Language of instruction
Instructions written in English often outperform instructions in the user's native language, likely because English dominates the training data mix. But: instructions in the target language reduce accidental English leaks in the output.
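A minimal sketch of the trade-off, not a recommended template. The function name `build_prompt`, the summarization task, and the exact instruction wording are all assumptions made for illustration. By default the instruction is in English with an explicit output-language constraint; passing a localized instruction swaps in the target-language variant.

```python
from typing import Optional


def build_prompt(user_text: str, target_language: str,
                 localized_instruction: Optional[str] = None) -> str:
    """Build a prompt whose instruction is English by default.

    Hypothetical helper: pass `localized_instruction` (already written in the
    target language) when English leaking into the output is the bigger risk.
    """
    if localized_instruction is not None:
        instruction = localized_instruction
    else:
        # English instruction, but the output language is pinned explicitly.
        instruction = (
            "Summarize the text below in two sentences. "
            f"Respond only in {target_language}."
        )
    return f"{instruction}\n\n{user_text}"


english_variant = build_prompt("Der Vertrag endet im Mai.", "German")
german_variant = build_prompt(
    "Der Vertrag endet im Mai.",
    "German",
    localized_instruction="Fasse den folgenden Text in zwei Sätzen zusammen.",
)
```

Running both variants against the same evaluation set is one way to see which failure mode (weaker instruction following vs. English leaks) matters more for a given model and language.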
Cross-lingual retrieval
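Multilingual embedding models (BGE-M3, e5-mistral) map text from different languages into a shared vector space, so a query in language A can retrieve docs in language B. For many language pairs, retrieving directly in that shared space beats translate-then-retrieve.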
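A sketch of the retrieval step, assuming an `embed` callable that returns one vector per string. The `toy_embed` function below is a made-up stand-in so the example runs without a model; in practice it would be replaced by a call to a real multilingual encoder. The names `retrieve`, `cosine`, and `CONCEPTS` are illustrative only.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str],
             embed: Callable[[str], Vector], k: int = 3) -> List[str]:
    """Rank docs by similarity to the query in the shared embedding space."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Toy stand-in for a multilingual encoder (assumption, not a real model):
# words with the same meaning in different languages share one axis.
CONCEPTS = {"cat": 0, "gato": 0, "katze": 0,
            "invoice": 1, "factura": 1, "rechnung": 1}


def toy_embed(text: str) -> List[float]:
    vec = [0.0, 0.0]
    for word in text.lower().split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


# Spanish query, German documents, no translation step.
top = retrieve("gato", ["die katze schläft", "rechnung anbei"], toy_embed, k=1)
```

The query is never translated: the ranking depends only on how well the encoder aligns the two languages.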
Translation as pivot
For complex reasoning: translate the input to English, reason in English, translate the answer back. The round trip loses nuance. Modern models (GPT-4, Claude) do fine without a pivot for most language pairs.
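The pivot flow can be sketched as a three-step pipeline. Both `translate` and `reason` are injected callables and purely hypothetical here; the stubs only tag and uppercase strings so the control flow is visible. In a real system each would wrap a model or translation service.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]
Reason = Callable[[str], str]


def answer_with_pivot(question: str, lang: str, translate: Translate,
                      reason: Reason, pivot: str = "en") -> str:
    """Translate to the pivot language, reason there, translate back."""
    if lang == pivot:
        # Already in the pivot language: skip both translation hops.
        return reason(question)
    pivoted = translate(question, lang, pivot)
    answer = reason(pivoted)
    return translate(answer, pivot, lang)


# Stubs for illustration only; they make each hop visible in the output.
def stub_translate(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


def stub_reason(text: str) -> str:
    return text.upper()


result = answer_with_pivot("hola", "es", stub_translate, stub_reason)
```

Each translation hop is a place where nuance can be lost, which is why skipping the pivot is worth testing first when the model handles the language pair directly.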
Right-to-left, non-Latin
Arabic, Hebrew, CJK: watch tokenization quality. Some models struggle with under-represented scripts and low-resource languages (e.g., Amharic, Burmese).
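One rough way to watch tokenization quality is to measure tokens per character on sample text in each script. The sketch below assumes a `tokenize` callable; `byte_tokenize` is a stand-in that treats every UTF-8 byte as a token, which approximates the worst case of a tokenizer falling back to raw bytes. Real tokenizers will give different numbers, and the sample strings are arbitrary.

```python
from typing import Callable, Dict, List


def tokens_per_char(text: str, tokenize: Callable[[str], List]) -> float:
    """Average tokens per Unicode code point; higher means costlier text."""
    return len(tokenize(text)) / len(text) if text else 0.0


def byte_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


samples: Dict[str, str] = {
    "English": "hello",
    "Hebrew": "שלום",
    "Amharic": "ሰላም",
}

ratios = {name: tokens_per_char(text, byte_tokenize)
          for name, text in samples.items()}
# With the byte-level stand-in: English 1.0, Hebrew 2.0, Amharic 3.0
```

Swapping `byte_tokenize` for a model's actual tokenizer and comparing ratios across languages gives a quick signal of which scripts consume more context window and cost per request.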