Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add twitter-cldr for bidirectional text support #1841

Closed
wants to merge 2 commits into from

Conversation

mikemorris
Copy link
Contributor

This looks like it's currently inflating the size of the browserify'd package by over 1MB. I'm not sure how much of this could be avoidable by extracting the bidi logic from https://github.com/twitter/twitter-cldr-js (ugh, CoffeeScript) - right now this just pulls in the entire en.min.js localization file with datetime logic and such too. I THINK the bidi logic is templated and identical across all locales?

BUT, at least preliminarily, It Works™

Before
screen shot 2015-12-15 at 6 19 55 pm

After
screen shot 2015-12-15 at 6 19 08 pm

Refs mapbox/DEPRECATED-mapbox-gl#4

@@ -1,6 +1,8 @@
'use strict';

var resolveTokens = require('../util/token');
var TwitterCLDR = require('../../node_modules/twitter_cldr/min/en.min.js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't rely on node_modules path, you can't just do require('twitter_cldr/min/en.min.js')

@mourner
Copy link
Member

mourner commented Dec 16, 2015

In theory, if we write a simple bidi algorithm implementation, we could cut this down to just a few tens of kilobytes. All we need is the codepoint mapping to see if a particular character is RTL or not:

Total 14kb.

@mikemorris
Copy link
Contributor Author

Extracted the bidi logic from twitter-cldr-js in 0ed31d5 and dropped the uncompressed size down to ~25kb. 5d1d891 is supposedly faster, but balloons bidi_classes.json up from ~8kb to over 40kb - should we rewrite this to be smaller, or is there a compression tool in the browserify pipeline that will rewrite the verbose keys?

@mourner
Copy link
Member

mourner commented Dec 16, 2015

5d1d891 is supposedly faster, but balloons bidi_classes.json up from ~8kb to over 40kb

I think the structure rewrite can be easily done in JS (once), not statically.

@camertron
Copy link

@mourner The codepoint mapping is where most of the size of the bidi feature comes from. When I initially wrote the feature I struggled to compress the mapping into a reasonable size. I'm sure more could be done in this regard (prefix tree, perhaps?).

@mikemorris
Copy link
Contributor Author

Alright, finally got a chance to look at this again. Most of the weight appears to come from the "bidiClass", "min" and "max" key labels. I'm going to try writing a script to generate data from the Unicode source tables in a format like the following:

{
    "BN": [[0,8],[14,27]],
    "B": [[10,10],[13,13],[28,30]]
    "S": [[9,9],[11,11],[31,31]],
    "WS": [[12,12],[32,32]]
}

There are a few variations I want to benchmark for performance and size, one with single-entry arrays for the single-codepoint ranges, and another splitting out a big array for each bidiClass for the single-codepoint entries and a second array for the ranges. I think Variant B (below) might be the best option.

Variant A

{
    "BN": [[0,8],[14,27]],
    "B": [[10],[13],[28,30]]
    "S": [[9],[11],[31]],
    "WS": [[12],[32]]
}

for (var bidiClass in Object.keys(data)) {
    var group = data[bidiClass];
    for (var range in group) {
        if (range.length === 1 && codepoint === range[0]) {
            return bidiClass;
        } else {
            if (codepoint >= range[0] && codepoint <= range[1]) return bidiClass;
        }
    }
}

Variant B

{
    "BN": [[],[[0,8],[14,27]]],
    "B": [[10,13],[[28,30]]]
    "S": [[9,11,31],[]],
    "WS": [[12,32],[]]
}

for (var bidiClass in Object.keys(data)) {
    var group = data[bidiClass];
    var points = group[0];
    var ranges = group[1];

    if (points.indexOf(codepoint) !== -1) {
        return bidiClass;
    } else {
        for (var range in ranges) {
            if (codepoint >= range[0] && codepoint <= range[1]) return bidiClass;
        }
    }
}

@camertron I'm not actually sure what a prefix tree is, would love a little more detail if you think it's a worthwhile approach.

@mikemorris
Copy link
Contributor Author

@camertron
Copy link

@mikemorris Here's the code that generates them.

@lucaswoj
Copy link
Contributor

Have you considered formats other than JSON for storing this data? In particular, the StructArray class might enable a more compact representation.

@lucaswoj
Copy link
Contributor

I am closing this PR because it has been inactive for a long time. The branch isn't going anywhere so please keep working on this feature and re-open the PR when it is ready!

Let me know if you have any questions.

@lucaswoj lucaswoj closed this Jul 20, 2016
@jfirebaugh jfirebaugh deleted the twitter-cldr-bidi branch February 3, 2017 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants