-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine vertical punctuation logic based on Unicode standard #3608
Conversation
Base the Unicode character blocks off of the official Unicode 9.0 character database. Refined the logic distinguishing upright, rotated, and neutral characters based on Unicode Technical Report 50 (with some simplifications). In particular, not everything in the General Punctuation block is treated as having neutral orientation; instead, the vertical punctuation table is consulted.
d62da3e
to
0dc9bad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing. Thank you @1ec5.
@@ -64,6 +65,7 @@ module.exports = { | |||
// 'Batak': (char) => char >= 0x1BC0 && char <= 0x1BFF, | |||
// 'Lepcha': (char) => char >= 0x1C00 && char <= 0x1C4F, | |||
// 'Ol Chiki': (char) => char >= 0x1C50 && char <= 0x1C7F, | |||
// 'Cyrillic Extended-C': (char) => char >= 0x1C80 && char <= 0x1C8F, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how I might've missed this and other blocks the first time around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is new to Unicode 9.0, whereas the file you consulted was based on Unicode 8.0. I've updated this file to point to the Unicode Character Database, which will make it easier for us to keep this file up-to-date in the future (by diffing between Unicode releases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
//if (isChar['CJK Unified Ideographs Extension C'](char)) return true; | ||
//if (isChar['CJK Unified Ideographs Extension D'](char)) return true; | ||
//if (isChar['CJK Unified Ideographs Extension E'](char)) return true; | ||
//if (isChar['CJK Compatibility Ideographs Supplement'](char)) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if this code block were copied into mapbox/DEPRECATED-mapbox-gl#29 than left commented out in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See mapbox/DEPRECATED-mapbox-gl#29 (comment) for an updated list.
The list of Unicode character blocks is now based on the official Unicode 9.0 character database, whereas the previous table was based on an older Unicode standard.
The logic distinguishing upright, rotated, and neutral characters has been refined based on Unicode Technical Report #50 (proposed revision 16, which corresponds with Unicode 9.0).
Here are some example effects of this PR:
To keep things simple, the Letterlike Symbols and Number Forms blocks as a whole are considered neutrally oriented. We can refine that portion further if this simplification turns out to be problematic.
Depends on mapbox/mapbox-gl-test-suite#169 (although technically no test suite change was required because the differences were so minor).
/cc @lucaswoj @nickidlugash