Unicode characters are not recognised as alphanumeric on some versions of macOS #746
Comments
Yes, Unicode identifiers should be supported, so this is a bug. On my Mac (High Sierra), with my compiler and Qt from MacPorts, Unicode works. My locale is:
However, the build from the build server does not. The problem is therefore probably linked to the default configuration of the C library, or to the Qt version used to compile Aseba on the build server.
Some locales apparently do not classify non-ASCII Unicode characters as alphanumeric. According to cppreference.com, the locale can be set explicitly, so forcing a UTF-8 locale might help. A potential hack is therefore to do:
in the Aseba compiler.
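A minimal sketch of the general idea, not the exact hack proposed here: assuming the compiler classifies characters with the C library's `std::iswalnum`, it switches only the `LC_CTYPE` category (character classification) to a UTF-8 locale. The helper names `forceUtf8Ctype` and `isIdentifierStartChar` are hypothetical, and the candidate locale names are assumptions: they are platform-dependent (for example `"UTF-8"` on macOS, `"C.UTF-8"` on many Linux systems).

```cpp
#include <clocale>
#include <cwctype>

// Hypothetical helper: switch only LC_CTYPE (character classification)
// to a UTF-8 locale. Locale names are platform-dependent; the candidates
// below are assumptions, not an exhaustive list.
bool forceUtf8Ctype()
{
    const char* const candidates[] = { "UTF-8", "C.UTF-8", "en_US.UTF-8" };
    for (const char* name : candidates)
    {
        // setlocale returns a null pointer if the locale is not installed.
        if (std::setlocale(LC_CTYPE, name) != nullptr)
            return true;
    }
    return false;  // no candidate available: classification is unchanged
}

// Hypothetical check in the style of the error message: '_' or a
// character that the current LC_CTYPE classifies as alphanumeric.
bool isIdentifierStartChar(wchar_t c)
{
    return c == L'_' || std::iswalnum(static_cast<std::wint_t>(c)) != 0;
}
```

Note that `std::setlocale` changes process-wide state and fails if none of the locales is installed, which is part of why relying on the environment is fragile.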
Should we move this to 1.6.1?
Yes.
The error message quoted in the issue has typos, which makes it difficult to find in the source code. The actual message is "Identifiers must begin with _ or an alphanumeric character, found unicode character 0x?? instead", where 0x?? stands for the hexadecimal Unicode code point for characters in the Basic Multilingual Plane (up to U+FFFF), and something else beyond it. It's for error

Suggestion: write your own Unicode handling code and don't depend on Qt, the C++ standard libraries, the OS, user settings, or any short-term hack.
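Whatever the final fix, the message would be easier to act on if the offending character were always reported unambiguously. A minimal sketch, using hypothetical helpers rather than Aseba's actual error-reporting code: it swaps the `0x??` form for the standard `U+XXXX` notation (at least four hex digits, more beyond the BMP), so U+00E9 and U+1F600 are printed the same way.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render a code point in standard Unicode notation,
// with at least four hex digits ("U+00E9") and more beyond the BMP ("U+1F600").
std::string formatCodePoint(char32_t cp)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(cp));
    return buf;
}

// Hypothetical message builder reusing the wording of the compiler's error.
std::string identifierStartError(char32_t cp)
{
    return "Identifiers must begin with _ or an alphanumeric character, found unicode character "
        + formatCodePoint(cp) + " instead";
}
```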
The underlying issue has been found: the tool indeed relies on the environment's locale settings (which it should not), and for some reason that environment may not be UTF-8 on OS X. However, we should definitely not reinvent the wheel. While locale support in the STL is somewhat lacking, it is top of the class in Qt when used properly.
What about tools that use the compiler without a GUI, such as the switch? With mine, compiled on macOS, I get the same error:
Unicode character categories and UTF-8 aren't rocket science. A platform-independent, Qt-independent, definitive solution would be nice.
Unicode is anything but simple, especially if we don't want a dependency on ICU (which Qt has on most systems).
While the full Unicode standard is large and implementing it is error-prone, the compiler and the other non-graphical Aseba tools don't need all of it. Classifying code points into those valid as the first character of an identifier, those valid as subsequent characters, and everything else would be enough. If by "Unicode is anything but simple" you mean normalization to canonical forms, that step is uncommon: JavaScript, for instance, explicitly doesn't do it when matching identifiers. I'm not impressed by Qt's Unicode support, which fails to reliably display "e acute" written with a combining acute accent (U+0065 U+0301) in Studio 1.6 on Mac. I'm not worried; I just hope the simple tools won't end up with dependencies that are hard to satisfy or huge.
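The three-way classification described above can be sketched without Qt or locales: a small UTF-8 decoder (rejecting overlong forms, surrogates and values above U+10FFFF, as RFC 3629 requires) plus start/continue predicates. All function names are hypothetical, and the ranges in `isIdentifierStart` are a hand-picked approximation (ASCII letters, Latin-1 and Latin Extended letters, basic Greek, Cyrillic), not Unicode's `ID_Start`/`ID_Continue` properties; whether Aseba should accept the combining marks allowed in `isIdentifierContinue` is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Malformed input (bad lead or continuation
// byte, overlong form, surrogate, > U+10FFFF, truncated sequence) becomes
// U+FFFD, which neither predicate below accepts.
std::vector<char32_t> decodeUtf8(const std::string& s)
{
    static const char32_t minForLength[] = { 0, 0, 0x80, 0x800, 0x10000 };
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + len > s.size()) { out.push_back(0xFFFD); break; }  // truncated

        bool valid = true;
        for (std::size_t k = 1; k < len && valid; ++k)
        {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            valid = (b & 0xC0) == 0x80;  // continuation bytes are 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!valid || cp < minForLength[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        {
            out.push_back(0xFFFD);
            ++i;  // resynchronise on the next byte
            continue;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Valid first character: '_' or a letter (approximate, hand-picked ranges).
bool isIdentifierStart(char32_t cp)
{
    if (cp == U'_') return true;
    if ((cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z')) return true;
    if (cp >= 0xC0 && cp <= 0x24F && cp != 0xD7 && cp != 0xF7) return true;  // Latin-1 letters, Latin Extended-A/B
    if (cp >= 0x391 && cp <= 0x3C9 && cp != 0x3A2) return true;  // basic Greek letters
    if (cp >= 0x400 && cp <= 0x4FF) return true;                 // Cyrillic block (approximate)
    return false;
}

// Valid subsequent character: any start character, an ASCII digit,
// or a combining diacritical mark such as U+0301.
bool isIdentifierContinue(char32_t cp)
{
    return isIdentifierStart(cp)
        || (cp >= U'0' && cp <= U'9')
        || (cp >= 0x300 && cp <= 0x36F);
}

bool isValidIdentifier(const std::string& utf8)
{
    const std::vector<char32_t> cps = decodeUtf8(utf8);
    if (cps.empty() || !isIdentifierStart(cps[0])) return false;
    for (std::size_t i = 1; i < cps.size(); ++i)
        if (!isIdentifierContinue(cps[i])) return false;
    return true;
}
```

No normalization is applied, so "é" (U+00E9) and "e" followed by U+0301 remain distinct identifiers, matching the JavaScript behaviour mentioned above. A complete version would generate the ranges from the Unicode Character Database instead of hard-coding them.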
See 09835e6#diff-0e1c4b3b848444324ad5dc75a312633cR195 for a temporary fix and a detailed explanation. Nothing is simple if you want to do things properly, and I'm not a huge fan of half-baked, wheel-reinventing solutions.
It has been merged into master.
Fixed.
Aseba on Windows (and probably Linux) accepts accented characters in the code.
On macOS, it produces the error "indentifier must begin with _ or an alpha numeric character, found unicode character instead".
The language description normally does not permit such characters, but code generated from Blockly could contain some. We should be consistent.