Rationalising supported locales #13

Open
jbarrett opened this issue Jan 17, 2019 · 7 comments

@jbarrett
Contributor

jbarrett commented Jan 17, 2019

We seem to introduce some amount of interface clutter and overhead with our set of supported locales. There has already been some discussion on our Arabic dialects in #7 and I suspect this applies to other languages. For example, there are ten Spanish language locales configured. Are they really going to be so different from each other on a search results page?

The same applies for English, French, German (see #12), Dutch and others.

In my experience most sites have one (or maybe two) choices per language.

@ghost

ghost commented Jan 18, 2019

I think it depends. In the case of Swiss German/German, the problem is mainly defining how the language should be translated. We have different accents here, so we translated it to standardized German. If that is the approach, it doesn't really matter.

I think for de_CH we might consider translating it to Swiss German; otherwise we could also use de_DE as the standard. But it should be explicitly specified that we either translate it fully to Swiss German or use de_DE only. (The current state is a weird mixture of Swiss German and German.)

(Also, I can say that we Swiss really like seeing a specific Swiss option 😃 )

@preemeijer
Contributor

Hello John,
I did not notice that GitHub is already in production. Sorry about that.

I do not know exactly what is meant by Dutch in this case.
If you are referring to Belgium and the Netherlands, then I would say these are somewhat different languages and it is not possible to merge them. There is a list of differences for the two countries: https://nl.wikipedia.org/wiki/Lijst_van_verschillen_tussen_het_Nederlands_in_Belgi%C3%AB,_Nederland_en_Suriname
The list is incomplete (as the page tells you), but I think it is the reason not to merge these two.

Cheers, Paul

@jbarrett
Contributor Author

jbarrett commented Mar 8, 2019

Hey @preemeijer - good to see you here! 🙂

From this thread it seems like there's a strong preference for language + locale based translation. I think this is fine from a product perspective: your language should be configured from your browser language by default, so you shouldn't even need to think about it.

My concern is that the dialectal divergence between the options probably isn't enough to warrant re-translating all of the text - that's a chunk of repeated effort.

I've been thinking about this a little and it should be feasible to configure a common "fallback" language for each of those with multiple locale options. For example, we would create a 'nl' locale which would act as a filler for missing translations in nl_NL and nl_BE. Dialectal differences from 'nl' would go into nl_NL or nl_BE - these translations would override the fallback language. (Sorry if this isn't explained clearly).
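
To make the layering a bit more concrete, here is a minimal sketch, assuming translations are plain token-to-text maps. The token names and placeholder strings are invented for illustration and are not taken from the real translation data; `ChainMap` is just one convenient way to express "regional entries first, shared base second".

```python
from collections import ChainMap

# Placeholder strings; the real token names and texts will differ.
nl = {"search": "search (nl)", "settings": "settings (nl)"}  # shared base
nl_NL = {}                                                    # no overrides
nl_BE = {"settings": "settings (nl_BE)"}                      # one override

# ChainMap checks its maps left to right, so regional entries win
# and the shared base only fills in what is missing.
effective = {
    "nl_NL": ChainMap(nl_NL, nl),
    "nl_BE": ChainMap(nl_BE, nl),
}

print(effective["nl_BE"]["settings"])  # regional override
print(effective["nl_BE"]["search"])    # filled in from the base
print(effective["nl_NL"]["settings"])  # filled in from the base
```

Only the texts that really differ would need to live in nl_NL or nl_BE; everything else would be translated once in the base.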

@jbarrett
Contributor Author

jbarrett commented Mar 8, 2019

Though, if it isn't too controversial, I would prefer not to create new languages for this: nl_NL would be the fallback for nl_*, de_DE for de_*, and so on...

This won't work for some languages, but might for many.
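
A rough sketch of that convention, assuming a hand-maintained table of "canonical" locales per language. The `CANONICAL_LOCALE` table and the `default_fallback` helper are hypothetical names, and `es_MX` is only used as an example of a locale whose language has deliberately been left out of the table.

```python
# Assumed canonical locale per language; languages left out get no fallback.
CANONICAL_LOCALE = {
    "nl": "nl_NL",
    "de": "de_DE",
}


def default_fallback(locale):
    """Return the canonical locale for this locale's language, or None."""
    language = locale.split("_", 1)[0]  # "nl_BE" -> "nl"
    canonical = CANONICAL_LOCALE.get(language)
    if canonical is None or canonical == locale:
        return None  # no agreed fallback, or this locale is the fallback itself
    return canonical


for code in ("nl_BE", "de_CH", "nl_NL", "es_MX"):
    print(code, "->", default_fallback(code))
```

Languages where no single locale is an acceptable default would simply stay out of the table and keep their full translations.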

@preemeijer
Contributor

> I've been thinking about this a little and it should be feasible to configure a common "fallback" language for each of those with multiple locale options. For example, we would create a 'nl' locale which would act as a filler for missing translations in nl_NL and nl_BE. Dialectal differences from 'nl' would go into nl_NL or nl_BE - these translations would override the fallback language. (Sorry if this isn't explained clearly).

I let this sink in for a couple of hours and I think this could work out well.

NL is not very fragmented. I just cannot foresee the consequences for, for example, the Spanish translations.
Some questions and/or remarks I want to address here are:

  • Which "dialect" will have the upper hand, or will fill the fallback?
  • I cannot estimate whether some things will be "very" weird in the eventual translation, when all the tokens come together: one part of a sentence from the fallback and the other part from nl_NL, for example.

> Hey @preemeijer - good to see you here! 🙂

@jbarrett: Thanks! I'm also happy to interact with you again on translations 😄

Cheers, Paul

@jbarrett
Contributor Author

> I let this sink in for a couple of hours and I think this could work out well.
>
> NL is not very fragmented. I just cannot foresee the consequences for, for example, the Spanish translations.
> Some questions and/or remarks I want to address here are:
>
>   • Which "dialect" will have the upper hand, or will fill the fallback?

The idea is that the fallback fills in for missing translations: if you configure nl_BE with a fallback of nl_NL, the nl_NL translation is used only where an nl_BE translation is missing. It's just filling in - nl_BE translations have priority.

Also, none of the fallbacks would be automatic. This section...

   "nl_BE" : {
      "locale" : "nl_BE",
      "name_in_english" : "Dutch (Belgium)",
      "name_in_local" : "Nederlands (België)",
      "rtl" : 0
   },

...would become something like:

   "nl_BE" : {
      "locale" : "nl_BE",
      "fallback" : "nl_NL",
      "name_in_english" : "Dutch (Belgium)",
      "name_in_local" : "Nederlands (België)",
      "rtl" : 0
   },
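
As a rough sketch of how a lookup could honour that key, assuming the locale config is loaded as JSON and translations are plain token-to-text maps. The `fallback_chain` and `translate` helpers, the trimmed-down config, and the placeholder strings are all hypothetical; only the `fallback` key itself comes from the proposal above.

```python
import json

# Hypothetical, trimmed-down config; "fallback" is the proposed new key.
LOCALES = json.loads("""
{
  "nl_NL": {"locale": "nl_NL", "rtl": 0},
  "nl_BE": {"locale": "nl_BE", "fallback": "nl_NL", "rtl": 0}
}
""")

# Placeholder strings standing in for real translations.
TRANSLATIONS = {
    "nl_NL": {"search": "search (nl_NL)", "settings": "settings (nl_NL)"},
    "nl_BE": {"settings": "settings (nl_BE)"},
}


def fallback_chain(locale):
    """List the locale followed by its configured fallbacks, stopping on a loop."""
    chain = []
    while locale is not None and locale not in chain:
        chain.append(locale)
        locale = LOCALES.get(locale, {}).get("fallback")
    return chain


def translate(token, locale):
    """Use the first locale in the chain that has a translation for the token."""
    for candidate in fallback_chain(locale):
        text = TRANSLATIONS.get(candidate, {}).get(token)
        if text is not None:
            return text
    return token  # nothing translated anywhere: show the raw token


print(fallback_chain("nl_BE"))
print(translate("settings", "nl_BE"))  # nl_BE has its own text
print(translate("search", "nl_BE"))    # missing in nl_BE, filled in from nl_NL
```

Because the fallback is read from config rather than inferred, a locale without a `fallback` entry behaves exactly as it does today.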

> • I cannot estimate whether some things will be "very" weird in the eventual translation, when all the tokens come together: one part of a sentence from the fallback and the other part from nl_NL, for example.

This is a definite possibility but we have made some effort to reduce the number of fragmented texts. Support for retiring tokens was added to duck.co and we removed maybe 700-800 of those deprecated and fragmented tokens (though maybe the templates in https://github.com/duckduckgo/duckduckgo-publisher/ still retain some of them).

#2 is still open - if any fragmented sentences are hanging around, we should work to remove them.
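
One way to keep an eye on odd mixtures might be to list which tokens a regional locale inherits rather than overrides, so a translator can review them together. This is only a sketch: the `fallback_report` helper and the placeholder data are hypothetical, and it assumes translations are plain token-to-text maps.

```python
def fallback_report(regional, fallback):
    """Split the fallback's tokens by whether the regional locale overrides them."""
    overridden = sorted(token for token in fallback if token in regional)
    inherited = sorted(token for token in fallback if token not in regional)
    return {"overridden": overridden, "inherited": inherited}


# Placeholder data standing in for real translations.
nl_NL = {"search": "search (nl_NL)", "settings": "settings (nl_NL)"}
nl_BE = {"settings": "settings (nl_BE)"}

report = fallback_report(nl_BE, nl_NL)
print("inherited from nl_NL:", report["inherited"])
print("overridden in nl_BE:", report["overridden"])
```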

@preemeijer
Contributor

Clear on all the points.
Cheers, Paul
