Refine vertical punctuation logic based on Unicode standard #3608

1ec5 · 2016-11-14T07:10:24Z

The list of Unicode character blocks is now based on the official Unicode 9.0 Character Database, whereas the previous table was based on an older Unicode standard (8.0).

The logic distinguishing upright, rotated, and neutral characters has been refined based on Unicode Technical Report #50 (proposed revision 16, which corresponds to Unicode 9.0).
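For readers unfamiliar with UTR #50, the three-way split can be pictured as a function from code point to orientation. This is a minimal sketch, not the code in this PR: `classifyOrientation` and the short range lists are hypothetical and cover only a few blocks. The neutral ranges mirror the Letterlike Symbols / Number Forms simplification described below.

```javascript
// Hypothetical sketch of a UTR #50-style classifier. The ranges are a small
// illustrative subset, NOT the tables used by mapbox-gl-js.

// Inclusive [start, end] ranges drawn upright in vertical text.
const UPRIGHT_RANGES = [
    [0x3040, 0x309F], // Hiragana
    [0x30A0, 0x30FF], // Katakana
    [0x4E00, 0x9FFF], // CJK Unified Ideographs
    [0xAC00, 0xD7AF], // Hangul Syllables
];

// Blocks treated as neutral as a whole (the simplification in this PR).
const NEUTRAL_RANGES = [
    [0x2100, 0x214F], // Letterlike Symbols
    [0x2150, 0x218F], // Number Forms
];

function inRanges(codePoint, ranges) {
    return ranges.some(([start, end]) => codePoint >= start && codePoint <= end);
}

/**
 * Classify one code point as 'upright', 'rotated', or 'neutral'.
 * How neutral characters are finally drawn is left to the caller here.
 * @param {number} codePoint
 * @returns {'upright' | 'rotated' | 'neutral'}
 */
function classifyOrientation(codePoint) {
    if (codePoint === 0x00A9) return 'upright'; // © is upright (Latin-1 Supplement exception)
    if (inRanges(codePoint, UPRIGHT_RANGES)) return 'upright';
    if (inRanges(codePoint, NEUTRAL_RANGES)) return 'neutral';
    return 'rotated'; // default, e.g. Latin letters are rotated sideways
}
```

Checking the specific exception (©) before the range tables keeps the block-level defaults simple while still allowing per-character overrides.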

Here are some example effects of this PR:

In a vertical “地圖©地圖盒子”, “©” is displayed upright instead of rotated.
In a vertical “地……圖”, “…” is displayed as an upright “︙” instead of a rotated “…”, so it is centered horizontally and no longer runs into the following CJK character.
In a vertical “地圖！”, “！” is displayed as an upright “︕”.
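The ellipsis and exclamation-mark examples come down to substituting a character from the Unicode Vertical Forms block (U+FE10..U+FE19). A minimal sketch, assuming a hypothetical `toVerticalForms` helper and a small hand-picked table (not necessarily the table this PR uses):

```javascript
// Hypothetical mapping from horizontal punctuation to its vertical
// presentation form; a subset chosen for illustration only.
const VERTICAL_FORMS = new Map([
    ['\u2026', '\uFE19'], // … HORIZONTAL ELLIPSIS -> ︙
    ['\uFF01', '\uFE15'], // ！ FULLWIDTH EXCLAMATION MARK -> ︕
    ['\uFF1A', '\uFE13'], // ： FULLWIDTH COLON -> ︓
    ['\uFF1B', '\uFE14'], // ； FULLWIDTH SEMICOLON -> ︔
    ['\uFF1F', '\uFE16'], // ？ FULLWIDTH QUESTION MARK -> ︖
    ['\u3001', '\uFE11'], // 、 IDEOGRAPHIC COMMA -> ︑
    ['\u3002', '\uFE12'], // 。 IDEOGRAPHIC FULL STOP -> ︒
]);

// Replace each mapped character; everything else passes through unchanged.
function toVerticalForms(text) {
    let result = '';
    for (const ch of text) { // for...of iterates by code point
        result += VERTICAL_FORMS.get(ch) ?? ch;
    }
    return result;
}
```

Because the vertical forms are already designed to be centered in an upright cell, substitution (rather than rotation) is what keeps “︙” from running into the next CJK character.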

To keep things simple, the Letterlike Symbols and Number Forms blocks as a whole are considered neutrally oriented. We can refine that portion further if this simplification turns out to be problematic.

Depends on mapbox/mapbox-gl-test-suite#169 (although technically no test suite change was required because the differences were so minor).

/cc @lucaswoj @nickidlugash

Base the Unicode character blocks on the official Unicode 9.0 Character Database. Refine the logic distinguishing upright, rotated, and neutral characters based on Unicode Technical Report #50 (with some simplifications). In particular, not everything in the General Punctuation block is treated as having neutral orientation; instead, the vertical punctuation table is consulted.
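The General Punctuation change amounts to "look up the vertical punctuation table before falling back to neutral". A sketch under assumptions: `generalPunctuationOrientation` and its table are hypothetical, and treating a table hit as upright is this sketch's reading of the commit message, not a quote of the implementation.

```javascript
// Hypothetical vertical punctuation table for code points inside
// General Punctuation (U+2000..U+206F), mapped to vertical presentation forms.
const VERTICAL_PUNCTUATION = new Map([
    [0x2026, 0xFE19], // … HORIZONTAL ELLIPSIS -> ︙
    [0x2025, 0xFE30], // ‥ TWO DOT LEADER -> ︰
    [0x2014, 0xFE31], // EM DASH -> vertical em dash ︱
    [0x2013, 0xFE32], // EN DASH -> vertical en dash ︲
]);

function generalPunctuationOrientation(codePoint) {
    if (codePoint < 0x2000 || codePoint > 0x206F) {
        throw new RangeError('code point is not in General Punctuation');
    }
    // A character with a vertical form is drawn upright using that form;
    // the rest of the block keeps the old neutral treatment.
    return VERTICAL_PUNCTUATION.has(codePoint) ? 'upright' : 'neutral';
}
```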

lucaswoj

This is amazing. Thank you @1ec5.

lucaswoj · 2016-11-14T17:32:44Z

js/util/is_char_in_unicode_block.js

@@ -64,6 +65,7 @@ module.exports = {
    // 'Batak': (char) => char >= 0x1BC0 && char <= 0x1BFF,
    // 'Lepcha': (char) => char >= 0x1C00 && char <= 0x1C4F,
    // 'Ol Chiki': (char) => char >= 0x1C50 && char <= 0x1C7F,
+    // 'Cyrillic Extended-C': (char) => char >= 0x1C80 && char <= 0x1C8F,


Any idea how I might've missed this and other blocks the first time around?

This block is new to Unicode 9.0, whereas the file you consulted was based on Unicode 8.0. I've updated this file to point to the Unicode Character Database, which will make it easier for us to keep this file up-to-date in the future (by diffing between Unicode releases).
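One way to do that diffing is to parse `Blocks.txt` from each UCD release and compare block names. This is a sketch only: `parseBlocks` and `addedBlocks` are hypothetical helpers, and the file contents are assumed to be already loaded as strings. Data lines in `Blocks.txt` have the form `1C80..1C8F; Cyrillic Extended-C`.

```javascript
// Parse Blocks.txt into { blockName: (char) => boolean } predicates,
// the same shape as the entries in is_char_in_unicode_block.js.
function parseBlocks(blocksTxt) {
    const blocks = {};
    for (const rawLine of blocksTxt.split('\n')) {
        const line = rawLine.replace(/#.*/, '').trim(); // strip comments and \r
        if (!line) continue;
        const match = /^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$/i.exec(line);
        if (!match) continue;
        const start = parseInt(match[1], 16);
        const end = parseInt(match[2], 16);
        blocks[match[3].trim()] = (char) => char >= start && char <= end;
    }
    return blocks;
}

// Block names present in the newer release but not the older one.
function addedBlocks(prev, next) {
    return Object.keys(next).filter((name) => !(name in prev));
}
```

Running `addedBlocks` over the 8.0 and 9.0 files would surface blocks such as Cyrillic Extended-C that a hand-maintained list can miss.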

lucaswoj · 2016-11-14T17:34:23Z

js/util/script_detection.js

+    //if (isChar['CJK Unified Ideographs Extension C'](char)) return true;
+    //if (isChar['CJK Unified Ideographs Extension D'](char)) return true;
+    //if (isChar['CJK Unified Ideographs Extension E'](char)) return true;
+    //if (isChar['CJK Compatibility Ideographs Supplement'](char)) return true;


I'd prefer that this code block be copied into mapbox/DEPRECATED-mapbox-gl#29 rather than left commented out in the codebase.

→ mapbox/DEPRECATED-mapbox-gl#29 (comment)

See mapbox/DEPRECATED-mapbox-gl#29 (comment) for an updated list.
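Whoever picks these entries back up should note that all four blocks lie above U+FFFF, so in a JavaScript string each character is a surrogate pair. The sketch below (hypothetical helper names; ranges are the Unicode 9.0 block boundaries) shows that a check fed `charCodeAt` values only ever sees the high surrogate, while `codePointAt` yields the real code point.

```javascript
// Unicode 9.0 boundaries of the supplementary-plane blocks removed above.
const SUPPLEMENTARY_CJK_BLOCKS = {
    'CJK Unified Ideographs Extension C': [0x2A700, 0x2B73F],
    'CJK Unified Ideographs Extension D': [0x2B740, 0x2B81F],
    'CJK Unified Ideographs Extension E': [0x2B820, 0x2CEAF],
    'CJK Compatibility Ideographs Supplement': [0x2F800, 0x2FA1F],
};

function isSupplementaryCJK(codePoint) {
    return Object.values(SUPPLEMENTARY_CJK_BLOCKS)
        .some(([start, end]) => codePoint >= start && codePoint <= end);
}

// Compare the two ways of reading the first character of a string.
function firstCharIsSupplementaryCJK(text) {
    return {
        byCodeUnit: isSupplementaryCJK(text.charCodeAt(0)),   // high surrogate only
        byCodePoint: isSupplementaryCJK(text.codePointAt(0)), // full code point
    };
}
```

For `String.fromCodePoint(0x2A700)`, only the `codePointAt` path recognizes the character.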

1ec5 added the bug 🐞 label Nov 14, 2016

1ec5 self-assigned this Nov 14, 2016

1ec5 force-pushed the 1ec5-vertical-punctuation-tr branch from d62da3e to 0dc9bad November 14, 2016 07:15

lucaswoj approved these changes Nov 14, 2016


Removed commented-out supplementary plane entries

0f523a7

See mapbox/DEPRECATED-mapbox-gl#29 (comment) for an updated list.

1ec5 mentioned this pull request Nov 14, 2016

Require vertical form of colon, ellipsis mapbox/mapbox-gl-test-suite#169 (merged)

1ec5 merged commit f1d1bf4 into master Nov 14, 2016

1ec5 deleted the 1ec5-vertical-punctuation-tr branch November 14, 2016 22:25

1ec5 mentioned this pull request Feb 10, 2017

Upright CJK characters in vertically-oriented labels mapbox/mapbox-gl-native#7114 (merged)
