Unicode lexing #14
Concrete examples:
The lexer does support Unicode, but these letters are not classified as uppercase or lowercase according to their Unicode category. I think it'd be fairly easy to copy what GHC does for the various unusual character classes. For reference, here's the full mapping:
UppercaseLetter -> upper
LowercaseLetter -> lower
TitlecaseLetter -> upper
ModifierLetter -> uniidchar -- see #10196
OtherLetter -> lower -- see #1103
NonSpacingMark -> uniidchar -- see #7650
SpacingCombiningMark -> other_graphic
EnclosingMark -> other_graphic
DecimalNumber -> digit
LetterNumber -> digit
OtherNumber -> digit -- see #4373
ConnectorPunctuation -> symbol
DashPunctuation -> symbol
OpenPunctuation -> other_graphic
ClosePunctuation -> other_graphic
InitialQuote -> other_graphic
FinalQuote -> other_graphic
OtherPunctuation -> symbol
MathSymbol -> symbol
CurrencySymbol -> symbol
ModifierSymbol -> symbol
OtherSymbol -> symbol
Space -> space
_other -> non_graphic
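As a rough illustration (not code from this repository or from GHC), the mapping above can be written as a total function over `Data.Char.generalCategory`. The `CharClass` type and the `classify` function are hypothetical names chosen for this sketch. A real lexer would typically also treat ASCII characters separately and feed these classes into its generated tables, which this sketch does not attempt.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical lexer character classes, named after the targets in the
-- mapping above (upper, lower, uniidchar, other_graphic, ...).
data CharClass
  = Upper
  | Lower
  | UniIdChar
  | OtherGraphic
  | Digit
  | Symbol
  | WhiteSpace
  | NonGraphic
  deriving (Eq, Show)

-- Classify a character by its Unicode general category, following the
-- GHC-style table quoted above. No special handling for ASCII here.
classify :: Char -> CharClass
classify c = case generalCategory c of
  UppercaseLetter      -> Upper
  LowercaseLetter      -> Lower
  TitlecaseLetter      -> Upper
  ModifierLetter       -> UniIdChar
  OtherLetter          -> Lower
  NonSpacingMark       -> UniIdChar
  SpacingCombiningMark -> OtherGraphic
  EnclosingMark        -> OtherGraphic
  DecimalNumber        -> Digit
  LetterNumber         -> Digit
  OtherNumber          -> Digit
  ConnectorPunctuation -> Symbol
  DashPunctuation      -> Symbol
  OpenPunctuation      -> OtherGraphic
  ClosePunctuation     -> OtherGraphic
  InitialQuote         -> OtherGraphic
  FinalQuote           -> OtherGraphic
  OtherPunctuation     -> Symbol
  MathSymbol           -> Symbol
  CurrencySymbol       -> Symbol
  ModifierSymbol       -> Symbol
  OtherSymbol          -> Symbol
  Space                -> WhiteSpace
  _other               -> NonGraphic  -- controls, separators, unassigned, ...
```

For example, `classify '\x3bb'` (Greek small letter lambda) gives `Lower`, `classify '\x4e2d'` (a CJK ideograph, category OtherLetter) also gives `Lower`, and `classify '\x2200'` (the "for all" sign) gives `Symbol`.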
I have the basics of this working locally, but I want to do a bit of cleanup in how the code is emitted. I'll make a PR later today or in the next few days.
Thanks, Iavor!
Ah, I just started working on this an hour ago and didn't see your comment in the meantime, a classic race condition 😅 I opened #15 with my approach; feel free to close it, of course 👍 I tested that it fixes phadej/cabal-extras#131 as expected.
Ah, no worries. I think your changes look good; I'll open separate tickets for the other changes I was thinking of.
@amesgen
phadej/cabal-extras#131 lists a few cases where Unicode characters cause lexing errors. Would it be possible to add proper support for this?