Unicode characters are not recognised as alphanumeric on some versions of macOS #746

mbonani · 2017-12-12T23:13:34Z

aseba on windows (probably linux) accept in the code accentuate character.
on macosx it make error " indentifier must begin with _ or an alpha numeric character, found unicode character instead"
In the description of the language it is normally not permit. But generated code coming from blockly could have some. We should be consistent.

stephanemagnenat · 2017-12-13T13:31:55Z

Yes unicode identifiers should be supported, so this is a bug. On my mac (High Sierra) with my compiler and Qt from MacPorts, unicode works. My locale is:

LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

However, with the version from the build server, it does not. So the problem is probably linked to the default setup in the C library or the Qt version used to compile Aseba from the build server.

stephanemagnenat · 2017-12-14T09:17:09Z

It seems some locales do not consider unicode characters to be alphanumeric. According to cpp reference.com, it could be forced. Maybe forcing the locale to "UTF8" could help.

Hence, a potential hack is to do:

std::locale utf8Locale("UTF8");
std::isalpha(c, utf8Locale);

in the Aseba compiler.

stephanemagnenat · 2018-01-09T15:50:41Z

Do we move that to 1.6.1?

mbonani · 2018-01-09T15:53:34Z

yes

ypiguet-epfl · 2018-02-23T10:34:56Z

The error message has typos which make it difficult to find in the source code. It's Identifiers must begin with _ or an alphanumeric character, found unicode character 0x?? instead where 0x?? stands for the hexadecimal unicode code point for the Basic Multilingual Plane (up to U+FFFF), something else beyond. It's for error ERROR_INVALID_IDENTIFIER thrown by method Aseba::Compiler::tokenize in aseba/compiler/lexer.cpp.

Suggestion: write your own unicode handling code and don't depend on Qt, c++ standard libraries, the OS, user settings, or any short-term hack.

cor3ntin · 2018-02-23T14:19:06Z

The underlying issue have been found, it's indeed because the tool rely on the underlying environment settings ( which it should not) and for some reason on OSX that environment may not be utf8.
Expect a fix in the coming weeks.

However, we should definitively not reinvent the wheel. Why locale support in the STL is somewhat lacking, it's top of the class in Qt, when used properly.

ypiguet-epfl · 2018-02-23T17:38:03Z

What about tools which use the compiler without a GUI, such as the switch? In mine, when compiled on macOS, I get the same error:

./vmshell --dis --src -
var é
Error at Line: 1 Col: 5 : Identifiers must begin with _ or an alphanumeric character, found unicode character 0xe9 instead
Compilation error.

Unicode character categories and utf-8 aren't rocket science. A platform-independent, Qt-independent, definitive solution would be nice.

cor3ntin · 2018-02-23T17:56:45Z

Unicode is anything but simple, especially if we don't want a dependency on icu ( which Qt has on most systems).
For this particular issue It should be doable to have a reasonably well behaving solution, don't worry.

ypiguet-epfl · 2018-02-24T10:28:58Z

While the whole unicode standard is large and prone to bugs, that isn't needed by the compiler or any Aseba non-graphical tool. Classifying code points between what's valid as first character in an identifier, what's valid as remaining characters in an identifier, and everything else would be enough. If by "unicode is anything but simple" you mean the text conversion to canonical forms, it's uncommon: Javascript explicitly doesn't do it for identifier matching, for instance. I'm not impressed by Qt unicode support, which fails to display reliably "e acute" with a combining acute accent (U+0065 U+0301) in Studio 1.6 on Mac.

I don't worry, I just hope there won't be dependencies difficult to fulfill or huge for the simple tools.

cor3ntin · 2018-02-24T11:13:41Z

See 09835e6#diff-0e1c4b3b848444324ad5dc75a312633cR195 for a temporary fix and a detailed explanation.

Nothing is simple if you want to do things properly, and I'm not a huge fan of half-backed, wheel reinventing solutions.

cor3ntin · 2018-03-07T16:27:39Z

It has been merge in master.
A PR for 1.6 has been made too #849

cor3ntin · 2018-03-07T17:25:00Z

Fixed

mbonani added the Mac OS specific label Dec 12, 2017

stephanemagnenat added the Bug label Dec 13, 2017

stephanemagnenat added this to the Version 1.6 milestone Dec 13, 2017

stephanemagnenat changed the title ~~caractarer like é ü are not accept to compile in macosx~~ Unicode characters are not recognised as alphanumeric on some versions of macOS Dec 13, 2017

stephanemagnenat modified the milestones: Version 1.6, Version 1.6.1 Jan 9, 2018

cor3ntin mentioned this issue Feb 24, 2018

Force utf8 locale while compiling #842

Closed

cor3ntin closed this as completed Mar 7, 2018

cor3ntin mentioned this issue Mar 9, 2018

Rework utf8 support #856

Open

MobsyaBot mentioned this issue Apr 17, 2018

Rework utf8 support Mobsya/aseba#81

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unicode characters are not recognised as alphanumeric on some versions of macOS #746

Unicode characters are not recognised as alphanumeric on some versions of macOS #746

mbonani commented Dec 12, 2017

stephanemagnenat commented Dec 13, 2017 •

edited

Loading

stephanemagnenat commented Dec 14, 2017 •

edited

Loading

stephanemagnenat commented Jan 9, 2018

mbonani commented Jan 9, 2018

ypiguet-epfl commented Feb 23, 2018

cor3ntin commented Feb 23, 2018

ypiguet-epfl commented Feb 23, 2018

cor3ntin commented Feb 23, 2018

ypiguet-epfl commented Feb 24, 2018

cor3ntin commented Feb 24, 2018

cor3ntin commented Mar 7, 2018

cor3ntin commented Mar 7, 2018

Unicode characters are not recognised as alphanumeric on some versions of macOS #746

Unicode characters are not recognised as alphanumeric on some versions of macOS #746

Comments

mbonani commented Dec 12, 2017

stephanemagnenat commented Dec 13, 2017 • edited Loading

stephanemagnenat commented Dec 14, 2017 • edited Loading

stephanemagnenat commented Jan 9, 2018

mbonani commented Jan 9, 2018

ypiguet-epfl commented Feb 23, 2018

cor3ntin commented Feb 23, 2018

ypiguet-epfl commented Feb 23, 2018

cor3ntin commented Feb 23, 2018

ypiguet-epfl commented Feb 24, 2018

cor3ntin commented Feb 24, 2018

cor3ntin commented Mar 7, 2018

cor3ntin commented Mar 7, 2018

stephanemagnenat commented Dec 13, 2017 •

edited

Loading

stephanemagnenat commented Dec 14, 2017 •

edited

Loading