Compiler regression on 1.14 when using unicode as variables names #11750
Comments
Hi @mrluc! This is a regression based on the Unicode Security changes. The µ character (micro sign) is no longer allowed, because it has an IdentifierType of "NOT_NFKC": under NFKC, the micro sign normalizes to the Greek letter mu. We only required NFC before, not NFKC.

Another interesting aspect is that µ is considered to be "Common", probably because its scientific usage applies to many scripts. Therefore, if we simply allowed µ as is, it could be mixed with the Greek version and cause confusable issues. However, it is still weird that it is forbidden because of its Greek counterpart. It would be similar if we forbade Latin's

A similar issue happens with all mathematical characters. They all translate to either Latin or Greek; here is the full list:
My suggestion would be for us to allow all characters that are normalized to Greek with a scriptset of "ALL - GREEK". This means 1D6A8..1D7CB plus the micro sign. WDYT?
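To make the NFC versus NFKC distinction above concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how the Elixir compiler implements the check). It uses the standard `unicodedata` module; the helper name `normalization_report` is made up for this example.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ MICRO SIGN
GREEK_MU = "\u03bc"    # μ GREEK SMALL LETTER MU


def normalization_report(ch):
    """Return the NFC and NFKC forms of a string (hypothetical helper)."""
    return {
        "nfc": unicodedata.normalize("NFC", ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


report = normalization_report(MICRO_SIGN)
print(report["nfc"] == MICRO_SIGN)  # NFC leaves the micro sign unchanged
print(report["nfkc"] == GREEK_MU)   # NFKC folds it into Greek mu

# U+1D6A8 is the first code point of the 1D6A8..1D7CB range mentioned above.
math_bold_alpha = "\U0001d6a8"
print(unicodedata.name(unicodedata.normalize("NFKC", math_bold_alpha)))
```

An identifier check that only requires NFC therefore accepts µ, while one that requires NFKC stability rejects it.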
Actually, ignore me. This is not a good proposal because we could have confusion between the mathematical symbol A and the ASCII letter A. So we would need to judge characters based on their similarity, which is a job done by the Unicode standard. Therefore I can see two options:
I am really inclined to go with 1 though, because micro is a recognized SI prefix and being unable to use it is very limiting IMO.
@josevalim that makes sense, and as highlighted in the PR, any mechanism for documenting exceptions is in the spirit of the Unicode standard. (Edit: The following relates to the 'Greek micro, Common micro distinction is annoying' question only, not to the actual error.) Data points on how it's handled in 2 other places that I saw, one of which we may want to imitate:
So, while we may feel a bit cheesy doing it 'by hand', we'd be in good company! 😄 We could normalize like that too -- maintain a map of characters that should just always be translated to a specific 'canonical' character as needed. How to get around 'Restricted'? Subtract the by-hand list of mapped codepoints from the list of Restricted codepoints.
How to prevent mixed-script issues with e.g. a Greek char in a Latin string? Canonicalize to the Common codepoint, not the Greek codepoint. Edit: i.e., the mapping would be from
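The by-hand approach described above could be sketched roughly as follows, in Python for illustration only. Everything here is hypothetical: the names `CANONICAL`, `RESTRICTED`, `effective_restricted`, and `canonicalize` are invented, the two sets contain placeholder entries rather than real Unicode data, and reading "mapped codepoints" as the canonical targets is an assumption.

```python
# Hypothetical hand-maintained map: code point to rewrite -> canonical code point.
CANONICAL = {
    "\u03bc": "\u00b5",  # GREEK SMALL LETTER MU -> MICRO SIGN (the Common one)
}

# Placeholder stand-in for the Restricted set; a real one would be
# derived from the Unicode data files, not written by hand.
RESTRICTED = {"\u00b5", "\u2126"}  # MICRO SIGN, OHM SIGN


def effective_restricted(restricted, canonical):
    """Restricted code points minus the hand-listed canonical targets."""
    return restricted - set(canonical.values())


def canonicalize(identifier, canonical):
    """Rewrite each mapped code point; leave every other character alone."""
    return "".join(canonical.get(ch, ch) for ch in identifier)


print(effective_restricted(RESTRICTED, CANONICAL) == {"\u2126"})
print(canonicalize("\u03bcs", CANONICAL) == "\u00b5s")
```

With this shape, an identifier typed with either mu ends up with the same Common code point, so it can no longer introduce a Greek character into an otherwise Latin identifier.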
@josevalim sorry for the conversation in 2 places -- I posted with a potential root cause in the PR.
Closing in favor of #11753.
Environment
Current behavior
Errors when compiling the current tesla release from hex (1.4.4). This code has since been removed from tesla, but they haven't made a new release yet. It was addressed in this PR and mentioned in this issue.
Expected behavior
This code works on 1.13 and should continue working on 1.14.