[C++] "utf8_upper" kernel produces different result than Python's str.upper for "ẞ" #34599
Comments
Arrow uses the utf8proc library for this, and that library changed the upper case of "ß" from "SS" to "ẞ" a few years ago: JuliaStrings/utf8proc#130. There is some discussion about what the correct upper case should be; see, for example, https://bugs.openjdk.org/browse/JDK-8186073. The Unicode code chart (http://unicode.org/charts/PDF/U1E00.pdf) also carries a note on this.
https://www.fileformat.info/info/unicode/char/00df/index.htm mentions: "uppercase is "SS" (standard case mapping), alternatively U+1E9E".
So in the end, this is not something we can change in Arrow itself. If you want this to change, you will need to bring it up at https://github.com/JuliaStrings/utf8proc/ (but given that they changed this a few years back, they may be unlikely to change it again).
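For reference, a minimal sketch of the difference being discussed. The standard-library part shows what Python's `str.upper` does with "ß". The `arrow_upper` helper is a hypothetical name introduced here for illustration; it calls `pyarrow.compute.utf8_upper` when pyarrow is installed, and its result depends on the utf8proc version bundled with Arrow, so no output is asserted for it.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"    # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Python's str.upper applies the full case mapping, which maps ß to "SS".
python_upper = SMALL_SHARP_S.upper()


def arrow_upper(text):
    """Upper-case one string with Arrow's utf8_upper kernel.

    Hypothetical helper for this comparison; returns None when pyarrow
    is not installed.
    """
    try:
        import pyarrow as pa
        import pyarrow.compute as pc
    except ImportError:
        return None
    return pc.utf8_upper(pa.array([text]))[0].as_py()


if __name__ == "__main__":
    print("input:         ", describe(SMALL_SHARP_S))
    print("str.upper():   ", repr(python_upper))
    print("utf8_upper:    ", repr(arrow_upper(SMALL_SHARP_S)))
    print("capital form:  ", describe(CAPITAL_SHARP_S))
```

If the Python behaviour is needed on Arrow data, one possible workaround is to apply `str.upper` to the values on the Python side (for example via `to_pylist()`), at the cost of leaving the vectorized kernel.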
Describe the bug, including details regarding any error messages, version, and platform.
Component(s)
Python