It's often useful to give context when doing translations. For example, "Turkey" might be translated as 七面鳥 (the bird) or as トルコ (the country), depending on the context.
I've found some evidence of people trying to give context to non-LLM translation models through punctuation and sentence structure. For example, if you translate "Togo" by itself with Google Translate, you get "持ち帰り" (basically "take out" or "to go", as in food orders). But if you translate "The country: Togo", you get "国: トーゴ", which is correct. People then build hacks on top of this, hoping that the output format stays consistent across slightly varied inputs so that they can use regular expressions or other string processing to pull out the "トーゴ" part.
This is fragile. For example, translating "The country: America" gives "国：アメリカ". This is also correct, but the colon is different: the second output uses a full-width (Japanese) colon instead of a half-width (Latin) colon. So a developer's first-draft code, written against the first example, would not work.
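A minimal sketch of the kind of post-processing described above, assuming the two translated strings quoted in this issue are the only inputs. The function names `extractNaive` and `extractTolerant` are hypothetical and exist only for illustration. The first-draft version splits on the half-width colon and silently fails on the full-width one; the tolerant version accepts either character, but it still hard-codes the "国" prefix and the colon, so it only moves the fragility somewhere else.

```javascript
// Hypothetical translator outputs, taken from the examples in this issue.
// U+FF1A is the full-width colon; ":" is the half-width (ASCII) colon.
const outputs = ["国: トーゴ", "国\uFF1Aアメリカ"];

// First-draft extraction: split on the half-width colon only.
// Returns null when the expected separator is missing.
function extractNaive(translated) {
  const parts = translated.split(":");
  return parts.length === 2 ? parts[1].trim() : null;
}

// Slightly more tolerant: accept either a half-width or a full-width colon,
// with optional whitespace around it. Still assumes the "国" prefix survives.
function extractTolerant(translated) {
  const match = translated.match(/^国\s*[:\uFF1A]\s*(.+)$/u);
  return match ? match[1] : null;
}

for (const out of outputs) {
  console.log(extractNaive(out), extractTolerant(out));
}
// prints:
// トーゴ トーゴ
// null アメリカ
```

Every new variation in the model's output (a different label, bracket style, or word order) requires another patch like this, which is the motivation for handling context in the translation API itself.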
It would be ideal if we could abstract over this process for developers, using something like:
```js
translator.translate("Togo", { context: "%s is a country name" });
```