
Why LLMs Perform Better in Italian, French, and Polish Than in English

New multilingual AI research shows that Romance and Slavic languages outperform English in long-context reasoning. Here's why fusional morphology and information-dense grammar give LLMs clearer signals and higher accuracy.

By Reuben Lopez · November 25, 2025 · 10 min read
[Figure: LLM language performance comparison]

Artificial intelligence researchers have been quietly uncovering a surprising truth about how large language models behave across different languages.

Despite being trained largely on English and Chinese, LLMs do not perform best in those languages.

New research from the ONERULER benchmark — a multilingual long-context evaluation created by the University of Maryland, Microsoft, and UMass — shows that Latin-based and Slavic languages actually outperform English, especially in tasks involving long sequences of text.

This includes:

  • Italian
  • French
  • Spanish
  • Portuguese
  • Polish
  • Russian

According to the ONERULER heatmaps (Figure 3, Page 5) and aggregated language rankings (Figure 4b, Pages 6–7), these languages consistently rank above English in retrieval accuracy, reasoning consistency, and context retention over 64,000–128,000 token windows.
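The long-context tasks behind these numbers are needle-in-a-haystack retrieval variants: a fact is buried deep inside filler text and the model is asked to recover it. As a rough illustration of how such a test works, here is a minimal sketch. The `ask_model` argument and the `fake_model` stub are stand-ins for a real LLM call, not part of the benchmark itself.

```python
def build_haystack(needle: str, filler: str, n_paragraphs: int, position: float) -> str:
    """Bury a 'needle' sentence inside repeated filler paragraphs.

    position is a fraction in [0, 1] controlling how deep the needle sits.
    """
    paragraphs = [filler] * n_paragraphs
    insert_at = int(position * n_paragraphs)
    paragraphs.insert(insert_at, needle)
    return "\n\n".join(paragraphs)

def score_retrieval(ask_model, needle_value: str, haystack: str, question: str) -> bool:
    """Return True if the model's answer contains the buried value."""
    answer = ask_model(f"{haystack}\n\n{question}")
    return needle_value in answer

# Toy run with a fake "model" that just searches the prompt text,
# standing in for a real LLM call.
needle = "The secret number for the report is 7421."
haystack = build_haystack(needle, "The quick brown fox jumps over the lazy dog.", 200, 0.5)

def fake_model(prompt: str) -> str:
    return "7421" if "7421" in prompt else "unknown"

print(score_retrieval(fake_model, "7421", haystack, "What is the secret number?"))  # True
```

Real long-context benchmarks vary the needle's depth and the haystack length, then plot accuracy per language and per context size, which is what the ONERULER heatmaps show.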

Below are the core insights that explain why this happens — and why it matters for anyone using AI tools today.


1. Latin-based languages encode more meaning per word

Romance languages (Italian, French, Spanish, Portuguese) contain more information inside each word than English does. This is known as morphological richness, and it gives LLMs more signals to work with.

In the ONERULER benchmark, Romance languages sit near the top across all tested models, including Gemini 1.5 Flash and Qwen 2.5.

Why does this help?

Because when every word carries grammatical hints — gender, number, verb tense, sentence role — the model doesn't have to guess.

English, by comparison, hides much of that information and relies heavily on word order and context, two things that become harder for an LLM to track in long passages.

A simple example

English:

"I saw my friend."

This sentence gives the model no clues about:

  • the friend's gender
  • the grammatical role of "friend," which is marked only by word order

Spanish:

"Vi a mi amigo."

or

"Vi a mi amiga."

Spanish communicates:

  • masculine or feminine
  • the object of the action (via "a mi…")
  • the subject ("vi" already encodes "I")

All in the same number of words.

The richer the signal, the fewer interpretations the LLM must juggle — which boosts accuracy.
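To make the contrast concrete, here is a toy lookup covering a few Spanish preterite forms of "ver" (to see). The point is that the ending alone pins down person and number, while English "saw" is the same string for every subject. This is a simplified illustration, not a real morphological analyzer.

```python
# Toy illustration: Spanish preterite forms of "ver" (to see),
# where the ending alone encodes person and number.
SPANISH_VER_PRETERITE = {
    "vi":     {"person": 1, "number": "singular"},  # I saw
    "viste":  {"person": 2, "number": "singular"},  # you saw
    "vio":    {"person": 3, "number": "singular"},  # he/she saw
    "vimos":  {"person": 1, "number": "plural"},    # we saw
    "vieron": {"person": 3, "number": "plural"},    # they saw
}

# English "saw" is identical for every subject, so the word form
# itself carries none of this information.
ENGLISH_SAW = {"person": None, "number": None}

def analyze(form: str) -> dict:
    """Return what the word form alone tells us (toy example)."""
    return SPANISH_VER_PRETERITE.get(form, ENGLISH_SAW)

print(analyze("vi"))   # {'person': 1, 'number': 'singular'}
print(analyze("saw"))  # {'person': None, 'number': None}
```

Every Spanish form resolves unambiguously; the English form resolves to nothing, leaving the model to recover subject and number from surrounding context.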


2. Slavic and Romance languages dominate because they're "fusional"

The ONERULER study highlights that the top-performing languages all share a key trait: they are fusional languages.

A fusional language is one where the endings of words change to express grammatical meaning. A single ending can encode multiple pieces of information at once.

What this means in practice

In languages like Polish, Russian, Italian, Spanish, and French, the end of a word can signal:

  • gender
  • case (role in the sentence)
  • number
  • tense
  • mood
  • sometimes even prepositional relationships

This structure does two huge things for LLMs:

1. It reduces ambiguity (less guessing)

English forces the model to infer meaning from context or strict word order.

Fusional languages embed meaning directly into the word.

2. It strengthens long-context memory

When a model reads a 64k–128k token passage, every word-ending acts like a breadcrumb trail that anchors relationships between nouns, verbs, and ideas.

This is why Polish ranks #1 in long-context accuracy across models, with Romance languages close behind.

Morphology becomes a built-in compass that keeps the model from drifting.
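The breadcrumb idea can be sketched with a single Polish noun, "kot" (cat). Real Polish declension has many noun classes; this toy covers just one word, to show that the ending marks the noun's role no matter where the word appears in the sentence.

```python
# Toy sketch: recovering a noun's role from its ending, for the
# single Polish noun "kot" (cat). Real declension is far richer.
KOT_FORMS = {
    "kot":   "subject (nominative)",
    "kota":  "object (accusative/genitive)",
    "kotu":  "recipient (dative)",
    "kotem": "instrument/companion (instrumental)",
}

def role_of(word: str) -> str:
    return KOT_FORMS.get(word.lower(), "unknown - need word order/context")

# The role survives reordering: both sentences mean "the dog sees the cat".
for sentence in ["Pies widzi kota.", "Kota widzi pies."]:
    words = [w.strip(".").lower() for w in sentence.split()]
    roles = {w: role_of(w) for w in words if w.startswith("kot")}
    print(sentence, "->", roles)
```

In English, "The dog sees the cat" and "The cat sees the dog" mean opposite things, because role lives entirely in word order; in the Polish toy above, "kota" stays the object wherever it sits, which is exactly the kind of anchor that helps over very long passages.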


3. The language you prompt in can significantly change model accuracy

Most people think prompt engineering is all about:

  • formatting
  • step-by-step instructions
  • chain-of-thought
  • rules and constraints

But ONERULER's cross-lingual tests reveal that the language you choose can change accuracy by up to 20%, even when the underlying content is identical.

(Figure 6a, Page 8)

Why?

Because the model:

  • assigns different weights to grammatical signals in each language
  • interprets structure differently
  • accesses different learned representations
  • encounters different ambiguity levels

This opens a new—and massively under-discussed—strategy:

Multilingual prompting.

If you want:

  • better reasoning → try Italian
  • better retrieval → try Polish
  • more stable context tracking → try Spanish or French
  • fewer hallucinations → many fusional languages outperform English

Then translate the answer back to English at the end.

It's a tiny shift that can produce disproportionately better results.
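The translate-ask-translate loop described above is simple to wire up. Here is a minimal sketch of the plumbing; `translate` and `ask_model` are deliberately left as stand-in callables for whatever translation and LLM services you actually use, and the fake implementations below exist only to demonstrate the flow.

```python
def multilingual_ask(task_en: str, work_lang: str, translate, ask_model) -> str:
    """Run an English task through another working language.

    translate(text, target_lang) and ask_model(prompt) are stand-ins
    for real translation and LLM services.
    """
    # 1. Translate the task into the working language (e.g. "pl", "it").
    task_local = translate(task_en, work_lang)
    # 2. Ask the model in that language.
    answer_local = ask_model(task_local)
    # 3. Translate the answer back to English.
    return translate(answer_local, "en")

# Minimal demo with fake services, just to show the plumbing.
def fake_translate(text: str, target: str) -> str:
    return f"[{target}] {text}"

def fake_model(prompt: str) -> str:
    return f"answer to: {prompt}"

print(multilingual_ask("Summarize the report.", "pl", fake_translate, fake_model))
```

Because both services are injected as parameters, swapping the fakes for real translation and chat endpoints changes nothing about the loop itself.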


Conclusion: An overlooked edge in AI productivity

The ONERULER benchmark makes one thing clear:

The choice of language is just as important as the structure of the prompt.

English may dominate the internet, business, and training corpora — but linguistically, it gives LLMs the least help.

Fusional languages like Italian, Spanish, French, Russian, and Polish pack more meaning per word, reduce ambiguity, and help models maintain accuracy over long sequences.

This gives multilingual users a new advantage and offers everyone else a new layer of prompt strategy they can apply immediately.

In an AI landscape where everyone has access to the same tools, small edges compound.

Multilingual prompting may become one of the simplest and most effective.

