Locale "sr-Latn" is not working: supporting script subtags #10

Open
747 opened this issue May 25, 2021 · 5 comments

Comments

@747

747 commented May 25, 2021

In r18n/locale.rb:

self.class.name.split('::').last.split(/([A-Z][a-z]+)/)[1, 2]

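This line seems to contain a simple logic error: when two capitalized parts are adjacent, the second element returned is the empty string between the matches, not the second part: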

$ irb
irb(main):001:1* module R18n
irb(main):002:2*   class RuFooBarBaz
irb(main):003:1*   end
irb(main):004:1* end
=> nil
irb(main):005:0> l = R18n::RuFooBarBaz.new
=> #<R18n::RuFooBarBaz:0x00007fffdeb58610>
irb(main):006:0> l.class.name.split('::').last.split(/([A-Z][a-z]+)/)[1, 2]
=> ["Ru", ""]
irb(main):007:0> l.class.name.split('::').last.split(/([A-Z][a-z]+)/)
=> ["", "Ru", "", "Foo", "", "Bar", "", "Baz"]

Perhaps the problem has been elusive because it was introduced with the parent locale function (2c88300), and no one tried to use different locales under the same parent locale at once.
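
For readers following along, here is a minimal sketch of why the quoted expression works for names like EnUS but not for SrLatn, next to one possible alternative based on `scan`. Both helper names are hypothetical and are not part of r18n; the `scan` pattern is only an assumption about what a fix could look like, not the maintainers' actual change.

```ruby
# The expression quoted from r18n/locale.rb, wrapped in a method for comparison.
# With a capture group, String#split also returns the text *between* matches,
# so two adjacent capitalized parts produce an empty string in between.
def split_based_parts(class_name)
  class_name.split('::').last.split(/([A-Z][a-z]+)/)[1, 2]
end

# Hypothetical alternative: collect the parts directly instead of splitting.
# Matches either a capitalized word ("Sr", "Latn") or an all-caps run ("US").
# Caveat: an all-caps part directly followed by a capitalized word would be
# mis-split, which this sketch does not try to handle.
def scan_based_parts(class_name)
  class_name.split('::').last.scan(/[A-Z][a-z]+|[A-Z]+/)
end

%w[R18n::EnUS R18n::SrLatn].each do |name|
  puts "#{name}: split=#{split_based_parts(name).inspect} " \
       "scan=#{scan_based_parts(name).inspect}"
end
# R18n::EnUS: split=["En", "US"] scan=["En", "US"]
# R18n::SrLatn: split=["Sr", ""] scan=["Sr", "Latn"]
```

The all-caps region in EnUS is never matched by `[A-Z][a-z]+`, so it survives as the leftover text after the first match; that is why the original expression happens to give the right answer for region-style names.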

@AlexWayfer
Contributor

Hello. Thank you for your report.

Can you please provide a more realistic example? We both know that RuFooBarBaz is outside the scope of R18n (real-world projects).

I'd be glad to try to understand it, cover it with tests, and fix it.

@AlexWayfer
Contributor

To be honest, region was introduced just a day before, as I see: 009cadf

And there were no reports for 2.5 years, so, I guess, it's not a big deal. 😅

Also we have tests for "different locales (regions) under the same parent locale" here: https://github.com/r18n/r18n-core/blob/28c1d46/spec/r18n_spec.rb#L195-L206

So… I can understand that there is a code error, but I want to know what would be best to test and how it affects projects.

@747
Author

747 commented Jul 1, 2021

Indeed, now I see most locale classes with a secondary element are named in a format like EnUS so that the behavior is "correct" for them.

What it does break are locales such as SrLatn among this repository's built-in locales.

require 'r18n-core'
R18n.set "en-us"
puts R18n.t.yes # => "Yes"
R18n.set "zh-tw"
puts R18n.t.yes # => "是"
R18n.set "sr-latn"
puts R18n.t.yes # => "Yes" <- falls back to English even though the .yml exists!

open('sr.yml', 'w:utf-8') do |sr|
  sr.puts "'yes': да"
end
open('sr-latn.yml', 'w:utf-8') do |srl|
  srl.puts "'yes': da"
end
R18n.default_places = '.'
R18n.set "sr-latn"
puts R18n.t.yes # => "да"

So maybe no one from Serbia has used this gem 🙄.


And while we're at it, what would you say to supporting script subtags? Besides sr-Latn and sr-Cyrl, kk-Latn is upcoming, and real-world examples such as zh-Hant-HK exist (because both Simplified and Traditional variants may be used in Hong Kong).

@747 747 changed the title from "Region is not actually working since 2018" to "Locale "sr-Latn" is not working: supporting script subtags" on Jul 1, 2021
@AlexWayfer
Contributor

It seems a lot more complicated than I thought.

For example: https://en.wikipedia.org/wiki/IETF_language_tag#Extension_U_(Unicode_Locale)

So, "locale" can have a lot of "tags". And the second one can be either region or script or anything else.

Meh.

Two ideas:

  1. We should get rid of script tags and not support them (do we really need them?).
  2. We should implement a more complicated system that does not break the existing one with regions but also supports script tags in the second and third positions (sr-SR-Latn is possible, I guess?).

@747
Author

747 commented Mar 3, 2023

I think the latter would be a well-balanced option. You should also support 3-letter language codes as in the standard.

(Note that script comes before region, so it must be sr-Latn-SR and not sr-SR-Latn in that case. Also, SR is confusingly the country code of Suriname, not Serbia, so "Serbian spoken in Serbia written in the Roman alphabet" will be sr-Latn-RS.)

> It seems a lot more complicated than I thought.
>
> For example: https://en.wikipedia.org/wiki/IETF_language_tag#Extension_U_(Unicode_Locale)

The IETF language tag system as a whole is indeed complex, but about half of it (including what you cited) exists for domain-specific or backward-compatibility purposes that are not immediately needed for user-facing locales.

Almost all cases can be covered with three elements: language-script-region. If you want to go one step further with relatively little effort, consider also accepting one variant in place of the region (that is, language-script-variant). This is good for sub-country official languages such as Scottish English en-scotland or Valencian ca-valencia (because IETF tags are not designed to handle ISO subdivision codes very well).
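
As a rough illustration of the language-script-region(-variant) idea, the sketch below classifies subtags by their shape, following the BCP 47 syntax (2 or 3 letters for language, 4 letters for script, 2 letters or 3 digits for region, 5 to 8 alphanumerics or a digit plus 3 alphanumerics for a variant). `LocaleTag`, `parse_locale_tag`, and `fallback_chain` are hypothetical names introduced here and are not part of r18n; the truncation-based fallback order is likewise an assumption, not how r18n resolves parent locales today.

```ruby
# Hypothetical value object for the parsed pieces (not an r18n class).
LocaleTag = Struct.new(:language, :script, :region, :variant)

# Parse a subset of BCP 47: language[-script][-region][-variant].
# Returns nil for anything outside that subset (extensions, wrong order, etc.).
def parse_locale_tag(code)
  subtags = code.to_s.tr('_', '-').split('-')

  language = subtags.shift
  return nil unless language&.match?(/\A[A-Za-z]{2,3}\z/)

  tag = LocaleTag.new(language.downcase)
  # Each optional subtag is recognised by shape, in the order the standard requires.
  tag.script  = subtags.shift.capitalize if subtags.first&.match?(/\A[A-Za-z]{4}\z/)
  tag.region  = subtags.shift.upcase     if subtags.first&.match?(/\A(?:[A-Za-z]{2}|\d{3})\z/)
  tag.variant = subtags.shift.downcase   if subtags.first&.match?(/\A(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3})\z/)

  # Leftover subtags mean the input was out of order or uses unsupported parts.
  subtags.empty? ? tag : nil
end

# Assumed fallback strategy: drop the last subtag until only the language is left.
def fallback_chain(code)
  tag = parse_locale_tag(code)
  return [] unless tag

  parts = tag.to_a.compact
  parts.length.downto(1).map { |n| parts.first(n).join('-') }
end

p parse_locale_tag('sr-latn').to_a      # ["sr", "Latn", nil, nil]
p parse_locale_tag('en-us').to_a        # ["en", nil, "US", nil]
p parse_locale_tag('ca-valencia').to_a  # ["ca", nil, nil, "valencia"]
p parse_locale_tag('sr-SR-Latn')        # nil (script must come before region)
p fallback_chain('zh-Hant-HK')          # ["zh-Hant-HK", "zh-Hant", "zh"]
```

Because a 4-letter subtag can only be a script and a 2-letter one can only be a region, existing codes such as en-us keep parsing as before, which is the "not breaking the existing one with regions" requirement from the second idea above.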
