No warning for invalid locales #476

Closed
hadley opened this issue Apr 14, 2022 · 5 comments

Comments

@hadley

hadley commented Apr 14, 2022

stringi::stri_sort("a", opts_collator = stringi::stri_opts_collator(locale = "doesntexist"))
#> [1] "a"

Created on 2022-04-14 by the reprex package (v2.0.1)

Originally filed in tidyverse/stringr#440

@gagolews
Owner

As far as I remember, ICU is quite tolerant about what it accepts as a valid locale id and tries hard to fall back to something closely approximating what the user needs (as per https://unicode-org.github.io/icu/userguide/locale/ and https://unicode-org.github.io/icu/userguide/locale/resources.html).

It might be a good idea to implement what you request based on what https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1Collator.html says about static Collator* icu::Collator::createInstance(const Locale& loc, UErrorCode& err):

The UErrorCode& err parameter is used to return status information to the user. To check whether the construction succeeded or not, you should check the value of U_SUCCESS(err). If you wish more detailed information, you can check for informational error results which still indicate success. U_USING_FALLBACK_ERROR indicates that a fall back locale was used. For example, 'de_CH' was requested, but nothing was found there, so 'de' was used. U_USING_DEFAULT_ERROR indicates that the default locale data was used; neither the requested locale nor any of its fall back locales could be found.
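
Because ICU falls back quietly, one caller-side way to catch a mistyped identifier is to compare it against the locales ICU reports as available. A minimal sketch, assuming stringi is installed; `is_known_locale` is a hypothetical helper, not part of stringi, and the check is stricter than ICU's own fallback logic (a valid identifier that is not listed verbatim would be rejected):

```r
library(stringi)

# Hypothetical helper (not part of stringi): TRUE if `locale` appears
# verbatim in the list of locales that ICU reports as available.
is_known_locale <- function(locale) {
  locale %in% stri_locale_list()
}

is_known_locale("doesntexist")  # the identifier from the original report
is_known_locale("pl")
```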

@gagolews
Owner

gagolews commented Apr 15, 2022

So the above would be an instance of U_USING_FALLBACK_WARNING or U_USING_DEFAULT_WARNING

https://unicode-org.github.io/icu/userguide/locale/resources.html

@hadley
Author

hadley commented Apr 15, 2022

I wonder if it's worth warning/messaging on U_USING_FALLBACK_ERROR and erroring on U_USING_DEFAULT_ERROR?

@gagolews
Owner

gagolews commented Nov 7, 2023

Requesting a Collator (and a few other services) with an explicitly set locale now triggers a warning when ICU reports U_USING_DEFAULT_WARNING, that is, when it ends up returning a resource bundle from the root locale:

> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="C")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="en")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="pl")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="sk")
[1] "a"  "ą"  "c"  "h"  "ch"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="unknown")
[1] "a"  "ą"  "c"  "ch" "h" 
Warning message:
In stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale = "unknown") :
  A resource bundle lookup returned a result either from the root or the default locale.
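
For callers who would rather fail than sort with root-locale rules, the warning can be promoted to an error on the R side. A rough sketch, assuming a stringi version that includes this change; `sort_strict` is a hypothetical wrapper, not a stringi function, and it converts every warning raised by `stri_sort`, not only the fallback one:

```r
library(stringi)

# Hypothetical wrapper (not part of stringi): sort `x` under `locale`,
# turning any warning raised by stri_sort() into an error.
sort_strict <- function(x, locale) {
  tryCatch(
    stri_sort(x, locale = locale),
    warning = function(w) {
      stop("locale '", locale, "' rejected: ", conditionMessage(w),
           call. = FALSE)
    }
  )
}

sort_strict(c("a", "c", "ch", "h"), locale = "sk")
# sort_strict(c("a", "c", "ch", "h"), locale = "unknown")  # expected to stop()
```

Using `tryCatch` abandons the sort as soon as the warning is signalled, so no fallback-ordered result is returned by accident.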

@hadley
Author

hadley commented Nov 7, 2023

Thanks!

gagolews added a commit that referenced this issue Nov 9, 2023
…ends up with ICU's returning a resource bundle from the root locale