Align on properly formed language tags #642

Open
carlosjeurissen opened this issue Jun 19, 2024 · 15 comments
Labels
i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response); inconsistency (Inconsistent behavior across browsers); needs-triage: chrome (Chrome needs to assess this issue for the first time); next manifest version (Consider for the next manifest version); supportive: firefox (Supportive from Firefox); supportive: safari (Supportive from Safari); topic: localization

Comments

@carlosjeurissen
Contributor

carlosjeurissen commented Jun 19, 2024

Introduction

Historically, browser extensions have been using language tags with two different syntaxes.

  1. Using a hyphen, e.g. en-US. This is the proper language tag format as defined in BCP47.
  2. Using an underscore, e.g. en_US. This is similar to POSIX, ICU and ISO/IEC 15897.

Both syntaxes are in use, and support for each varies per API and browser. This WECG issue covers those areas in an attempt to reach alignment.

_locales directory

In most documentation, locale directories are supposed to use the underscore variant.

Currently this is a requirement for Chrome while Firefox seems to also support the hyphen -.

Going forward, unless there is a clear reason why underscores should be used, my proposal would be to start adding support for the proper BCP47 tags and to disallow the use of underscores in mv4.

manifest.json default_locale

Following documentation, default_locale is supposed to be using the subdirectory name of _locales.

Currently Chrome requires the use of the underscore, while Firefox supports both a hyphen and an underscore.

Going forward I suggest we keep this documentation and follow the switch to BCP47 as mentioned in _locales above.

i18n.getUILanguage

Following documentation, this returns a BCP47 language tag. Historically before version 55, Firefox used to return the underscore variation. I suggest we keep this as is.

i18n.getMessage('@@ui_locale')

This is defined as returning the current locale. In Chrome it returns the underscore variant, while in Firefox, probably since version 56, it returns the BCP47 tag.

Going forward, I suggest this to be equal to the variation used in _locales as it could be used to fetch the messages.json file.
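As a rough illustration of that use case, the sketch below builds the path to messages.json from whatever @@ui_locale returns. `messagesPath` is a hypothetical helper, and the commented usage assumes the returned value matches a directory that actually exists in _locales, which is exactly what differs between browsers today:

```javascript
// Hypothetical helper: relative path of the messages file for a locale directory.
function messagesPath(localeDirectory) {
  return `_locales/${localeDirectory}/messages.json`;
}

// Usage inside an extension (not runnable outside a browser):
//   const locale = chrome.i18n.getMessage('@@ui_locale');
//   const url = chrome.runtime.getURL(messagesPath(locale));
//   const messages = await (await fetch(url)).json();
```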

Final words

Basically, the goal is to align the behaviour across browsers. My opinion would be to always use BCP47. This comes with a transitional cost, which I believe is worth it. Alternatively, we could agree to use the underscore only for files, for example. We would then still need to agree on default_locale and what is returned by @@ui_locale.

Related: #131

@carlosjeurissen carlosjeurissen added inconsistency Inconsistent behavior across browsers next manifest version Consider for the next manifest version topic: localization i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. labels Jun 19, 2024
@tophf

tophf commented Jun 19, 2024

It would break external tools like transifex that use the ISO/IEC 15897 standard for language codes (for example en_US).

@patrickkettner
Contributor

From Transifex's help page

When we add support for a language, we follow the BCP47 standard. The multiple language locales are based on region subtags.

@tophf

tophf commented Jun 19, 2024

From https://developers.transifex.com/docs/using-the-client#language-management

Transifex uses the ISO/IEC 15897 standard for language codes

@patrickkettner
Contributor

FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."

@tophf

tophf commented Jun 19, 2024

When we add support for a language, we follow the BCP47 standard.

This quote is about the internal UI/management in transifex, but I referred to the transifex tools that are specific to extension development and which process _locales directory.

FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."

Yeah, but that's a huge pain, multiplied by thousands of extensions and dozens of languages.

Either way, there may be hundreds of other utilities that use the classic _ syntax.

@patrickkettner
Contributor

Not arguing for or against it, just wanting to clarify that your specific point was not reflective of my use of their tool. Appreciate you taking the time to bring up issues!

@tophf

tophf commented Jun 19, 2024

Judging by Transifex example alone, using two different standards may be actually an established practice that's implemented in many such tools, i.e. BCP47 is used internally and ISO/IEC 15897 for the files.

@carlosjeurissen carlosjeurissen changed the title from "Align on properly formed language tags: switch to BCP47 everywhere" to "Align on properly formed language tags" Jun 19, 2024
@carlosjeurissen
Contributor Author

@ghostwords @tophf The goal here is to align this cross-browser. Updated the post to mention alternative paths we can take. We could, for example, always use the underscore just for file paths. However, we then still need to agree on default_locale and what is returned by @@ui_locale.

If tools like Transifex have some way of exporting data specific to extensions and thus use the underscore, they will likely update this if the extension system updates. However I very much see your point of the cost of switching this.

In general, using multiple standards seems very counterintuitive. However, if we use underscores for the file paths, we could simply use BCP47 with all the hyphens replaced with underscores.
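A minimal sketch of that mapping, assuming the hypothetical helper names `toDirectoryName` and `toLanguageTag` (neither is part of any extension API). It only swaps separators and does not validate the tag:

```javascript
// Hypothetical helpers, not part of any browser API.
// They only swap separators; they do not validate or canonicalize the tag.

// BCP47 tag -> name usable for a _locales subdirectory, e.g. "pt-BR" -> "pt_BR".
function toDirectoryName(languageTag) {
  return languageTag.replace(/-/g, '_');
}

// _locales subdirectory name -> hyphenated tag, e.g. "pt_BR" -> "pt-BR".
function toLanguageTag(directoryName) {
  return directoryName.replace(/_/g, '-');
}
```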

The reason to align on this is also motivated by proposals like #641.

@tophf

tophf commented Jun 19, 2024

Judging by the source code Chromium uses ICU for locales, which uses underscores. Since ICU is an industry standard, the same conventions are likely to be used by many tools for extension development.

@xeenon
Collaborator

xeenon commented Jun 20, 2024

Safari also requires underscores for _locales.

@xeenon xeenon added the supportive: safari Supportive from Safari label Jun 20, 2024
@carlosjeurissen
Contributor Author

carlosjeurissen commented Jun 20, 2024

@ghostwords @tophf This was discussed during the 2024-06-20 meeting.

It comes down to:

  • adding support for BCP47 in all areas of the extension
  • not removing support for underscores anytime soon for backwards compatibility and industry investment

As for i18n.getMessage('@@ui_locale'), Firefox returns a hyphen while Chrome returns an underscore. Since extensions already cannot rely on either format, changing it would generally not be a breaking change. So we either choose to use BCP47 like Firefox, or use the same format used in _locales (if feasible implementation-wise). In the latter case, if en_US is present while en-US is not, it would return en_US; otherwise it would return en-US.
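The second option could be sketched roughly as follows. `resolveUiLocale` is a hypothetical function written only to illustrate the rule; it is not an existing API and says nothing about how a browser would implement it:

```javascript
// Hypothetical sketch of the rule described above; not an existing API.
// uiLanguageTag: BCP47 tag such as "en-US".
// localeDirectories: names of the subdirectories found in _locales.
function resolveUiLocale(uiLanguageTag, localeDirectories) {
  const underscoreName = uiLanguageTag.replace(/-/g, '_');
  const hasHyphen = localeDirectories.includes(uiLanguageTag);
  const hasUnderscore = localeDirectories.includes(underscoreName);
  // Only return the underscore form when it is the sole match.
  return hasUnderscore && !hasHyphen ? underscoreName : uiLanguageTag;
}
```

With only an en_US directory present this yields en_US; in every other case it yields the BCP47 tag.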

I would still be in favour of requiring the use of BCP47 for default_locale for manifest version 4.

@tophf

tophf commented Jun 20, 2024

What about adding a new variable:

  • @@ui_locale_hyphen
  • @@ui_locale_web
  • @@ui-locale - not an identifier strictly speaking, but it's kinda self-explanatory once you know the difference (con: it's confusing if you don't), so it might be worth making an exception.

Or maybe switch behavior based on the second parameter like getMessage('@@ui_locale', ['-'])

@Rob--W Rob--W added the supportive: firefox Supportive from Firefox label Jun 20, 2024
@hanguokai
Member

At the JavaScript API level, most Web APIs currently use BCP47. I think everyone agrees with that. Of course, my proposal #641 uses it too.

This issue mainly involves some non-API areas. All developers hate breaking changes unless there is a huge benefit. It doesn't seem like there is a huge benefit here, so I would like to keep backwards compatibility as much as possible, and @Rob--W also said in today's meeting that it is important to keep backwards compatibility.

About the _locales directory and file name, I see that both the Chrome doc and the MDN doc are very clear. In fact, I have never heard of a developer complaining about this. So this is not a real problem for developers. In my opinion, there is no need to change it, or we can support both underscore (_) and hyphen (-).

About manifest default_locale, I think we can support both. It is not a real problem for developers. It is a manifest value, I never use it in JS code.

About i18n.getUILanguage(), both Chrome and Firefox return BCP47 now, so there is no problem.

About the predefined message @@ui_locale, I never use it myself. It is inconsistent between Chrome and Firefox at present, so whether Chrome makes changes or Firefox makes changes will result in breaking changes for their current developers. To avoid breaking changes, it may be necessary to introduce a new predefined message, like @@ui_locale_hyphen and keep @@ui_locale unchanged.

@carlosjeurissen
Contributor Author

@hanguokai as mentioned here: #642 (comment), there is agreement to keep support for the underscore. I am not advocating against that.

The problem is having to replace underscores with hyphens, and vice versa, throughout your extension code and the whole supply chain, while the web mostly uses BCP47. These conversions are also a potential source of bugs.
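To illustrate the kind of bug meant here: the web's Intl APIs only accept well-formed BCP47 tags, so passing an underscore value straight through throws a RangeError. A minimal sketch, where `isAcceptedByIntl` and `toIntlLocale` are hypothetical helpers:

```javascript
// Returns true when the Intl APIs accept the value as-is.
function isAcceptedByIntl(value) {
  try {
    Intl.getCanonicalLocales(value);
    return true;
  } catch (error) {
    // Malformed tags, such as ones using an underscore, raise a RangeError.
    if (error instanceof RangeError) return false;
    throw error;
  }
}

// Hypothetical helper: accept either separator and return a canonical BCP47 tag.
function toIntlLocale(tagOrDirectoryName) {
  const hyphenated = tagOrDirectoryName.replace(/_/g, '-');
  return Intl.getCanonicalLocales(hyphenated)[0];
}
```

Every place that hands a _locales-style name to an Intl API needs a conversion like this, which is the overhead being described.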

As for the predefined message @@ui_locale: instead of introducing something like @@ui_locale_hyphen, following your proposal #641, we can introduce a @@current_locale which returns the current locale as BCP47. It seems @@current_locale can fully replace the @@ui_locale use cases.

@hanguokai
Member

@carlosjeurissen Thanks for the suggestion. @@current_locale looks good to me for i18n.getCurrentLanguage(). And @@ui_locale still represents i18n.getUILanguage().

6 participants