Copyright 2017 Moddable Tech, Inc.
Revised: March 3, 2017
Warning: These notes are preliminary. Omissions and errors are likely. If you encounter problems, please ask for assistance.
When using JavaScript, the most obvious format to localize strings is a dictionary. Applications use common keys to access localized strings.
var en = {
	"I love you": "I love you",
	"Me neither": "Me neither",
};
var fr = {
	"I love you": "Je t'aime",
	"Me neither": "Moi non plus",
};
var language = fr;
function localize(it) {
	return language[it];
}
It is not always possible to use the English string as the key, because of homonyms, differing contexts, etc. However, when it is possible, it is recommended: the code is easier to read and obvious redundancies are avoided.
The trivial examples above are used to compare the solutions below. Of course, real applications use dictionaries with thousands of entries, with small and large keys and values, so multiply the sizes accordingly.
Since applications usually need only one language at a time, dictionaries can be JSON files, loaded and unloaded when the user selects a language.
{"I love you":"I love you","Me neither":"Me neither"}
{"I love you":"Je t'aime","Me neither":"Moi non plus"}
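As a minimal sketch of that approach, the code below swaps dictionaries when the language changes. The `files` object and the `selectLanguage` function are hypothetical stand-ins: a real application would read `en.json` and `fr.json` from storage rather than from an in-memory object.

```javascript
// Hypothetical stand-in for JSON files in storage (assumption for illustration).
const files = {
	"en.json": '{"I love you":"I love you","Me neither":"Me neither"}',
	"fr.json": '{"I love you":"Je t\'aime","Me neither":"Moi non plus"}',
};

let language = null;

// Parse the selected dictionary. The previous dictionary is no longer
// referenced, so the garbage collector can reclaim it.
function selectLanguage(code) {
	language = JSON.parse(files[code + ".json"]);
}

function localize(it) {
	return language[it];
}

selectLanguage("fr");
```

Only one parsed dictionary is alive at a time, but each parse still allocates the slots and chunks discussed next.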
Storing JSON files in ROM is wasteful since every dictionary has to define every key. For instance, the keys `I love you` and `Me neither` above are repeated in the English and French dictionaries.
ROM | Size |
---|---|
en.json | 53 bytes |
fr.json | 54 bytes |
Loading JSON files uses a lot of RAM. Firstly, the keys populate the XS symbols table. Secondly, a dictionary allocates one slot for itself, one slot per entry and one chunk per string.
RAM | Size |
---|---|
symbols | 64 bytes |
en.json | 80 bytes |
fr.json | 84 bytes |
Instead of JSON files, dictionaries could be JavaScript modules that XS can compile, link and preload in ROM.
export default {"I love you":"I love you","Me neither":"Me neither"}
export default {"I love you":"Je t'aime","Me neither":"Moi non plus"}
That would avoid redundant keys in ROM and would use no RAM. However, the process would still populate the XS symbols table with keys, use six slots per dictionary for the module, export and object, and use one slot per entry.
ROM | Size |
---|---|
symbols | 64 bytes |
en.js | 160 bytes |
fr.js | 164 bytes |
Moreover, since XS has no special case for such objects, the time to look up a string in ROM would be significant for dictionaries with a lot of entries.
The first optimization is to use strings tables instead of JSON files or JavaScript modules. Each table begins with the number of strings it contains, followed by the offset of each string from the start of the table. All numbers are little-endian 32-bit integers.
2 12 23 I love you Me neither
2 12 22 Je t'aime Moi non plus
Like most resources in Moddable applications, strings tables are never loaded into RAM. XS supports read-only strings, so JavaScript strings can refer directly to the strings in the strings tables.
ROM | Size |
---|---|
locals.en.mhr | 34 bytes |
locals.fr.mhr | 35 bytes |
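The layout can be sketched with an `ArrayBuffer` and a `DataView`. This is an illustration, not the actual tool: the function names `buildStringTable` and `getString` are hypothetical, and the NUL terminator after each string is an assumption inferred from the offsets and sizes of the two examples.

```javascript
// Build a strings table: count, offsets, then the strings.
// Assumption: each string is UTF-8 encoded and NUL-terminated.
function buildStringTable(strings) {
	const encoder = new TextEncoder();
	const encoded = strings.map(string => encoder.encode(string));
	const headerSize = 4 * (1 + strings.length);
	const size = encoded.reduce((sum, data) => sum + data.length + 1, headerSize);
	const buffer = new ArrayBuffer(size); // zero-filled, so terminators are already in place
	const view = new DataView(buffer);
	const bytes = new Uint8Array(buffer);
	view.setUint32(0, strings.length, true); // true selects little-endian
	let offset = headerSize;
	encoded.forEach((data, i) => {
		view.setUint32(4 * (i + 1), offset, true);
		bytes.set(data, offset);
		offset += data.length + 1;
	});
	return buffer;
}

// Read the string at the given index back from the table.
function getString(buffer, index) {
	const view = new DataView(buffer);
	const bytes = new Uint8Array(buffer);
	const start = view.getUint32(4 * (index + 1), true);
	let end = start;
	while (bytes[end] !== 0)
		end++;
	return new TextDecoder().decode(bytes.subarray(start, end));
}

const enTable = buildStringTable(["I love you", "Me neither"]);
const frTable = buildStringTable(["Je t'aime", "Moi non plus"]);
```

Under these assumptions the English table is 34 bytes and the French table is 35 bytes, which matches the sizes listed for the two `.mhr` files.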
A tool can generate the strings tables from the JSON files. Applications can get strings from the tables through a host function.
Of course, something is then necessary to map keys to indexes into the tables, so that `I love you` maps to `0`, `Me neither` maps to `1`, etc.
Again, a dictionary could be used; at least there would be only one dictionary for all languages.
var locals = {
	"I love you": 0,
	"Me neither": 1,
};
var en = new StringTable("locals.en.mhr");
var fr = new StringTable("locals.fr.mhr");
var language = fr;
function localize(it) {
	return language.get(locals[it]);
}
But such a dictionary would have the drawbacks already mentioned: populating the XS symbols table and taking time to look up an index.
When all keys and results are known, a perfect hash function can map keys to results without collisions. A practical solution is to use two hash functions and an intermediary table. The first hash function maps keys to seeds. The second hash function uses the seeds to map keys to results. The seeds table can be sparse, but both the seeds and results tables contain only one entry per result. Finding the seeds takes some time but is done by a tool at build time.
Here are some references:
- CMPH: a C library with a lot of algorithms and explanations.
- Steve Hanov's blog: a detailed presentation of the practical solution in PHP.
- mixu/perfect: a node.js port.
Here the results are indexes into the strings tables, so the results table does not need to be stored: it is only used to reorder the strings tables.
Only the seeds table needs to be stored. A negative seed signals that the second hash function can be skipped: the index is the absolute value of the seed, minus one.
2 -2 -1
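The two-hash scheme with negative seeds can be sketched as follows, along the lines of the approach presented in Steve Hanov's blog. The FNV-style `hash` function and the names `buildSeeds` and `lookup` are assumptions for illustration; the hash function used by the actual tool is not described here. The sketch expects a non-empty list of distinct keys.

```javascript
// FNV-style 32-bit hash seeded with d (assumption: not necessarily the real hash).
function hash(d, key) {
	let h = d === 0 ? 0x01000193 : d;
	for (let i = 0; i < key.length; i++)
		h = Math.imul(h, 0x01000193) ^ key.charCodeAt(i);
	return h >>> 0; // force an unsigned 32-bit result
}

// Build time: find the seeds, and the order in which strings must be stored.
function buildSeeds(keys) {
	const size = keys.length;
	const seeds = new Array(size).fill(0);
	const order = new Array(size).fill(null); // order[index] is the key stored at index
	const buckets = Array.from({ length: size }, () => []);
	for (const key of keys)
		buckets[hash(0, key) % size].push(key);
	buckets.sort((a, b) => b.length - a.length); // largest buckets first
	let b = 0;
	// Buckets with collisions: search for a seed that sends every key to a free index.
	for (; b < size && buckets[b].length > 1; b++) {
		const bucket = buckets[b];
		let d = 1, slots = [];
		while (slots.length < bucket.length) {
			const slot = hash(d, bucket[slots.length]) % size;
			if (order[slot] !== null || slots.includes(slot)) {
				d++; // this seed collides, try the next one from scratch
				slots = [];
			}
			else
				slots.push(slot);
		}
		seeds[hash(0, bucket[0]) % size] = d;
		slots.forEach((slot, i) => { order[slot] = bucket[i]; });
	}
	// Buckets with a single key: store a free index directly as a negative seed.
	const free = [];
	for (let i = 0; i < size; i++)
		if (order[i] === null)
			free.push(i);
	for (; b < size && buckets[b].length === 1; b++) {
		const slot = free.pop();
		seeds[hash(0, buckets[b][0]) % size] = -slot - 1;
		order[slot] = buckets[b][0];
	}
	return { seeds, order };
}

// Run time: map a valid key to its index using only the seeds table.
function lookup(seeds, key) {
	const d = seeds[hash(0, key) % seeds.length];
	return d < 0 ? -d - 1 : hash(d, key) % seeds.length;
}
```

The strings of every language would then be written in the order given by `order`, so that the index returned by `lookup` selects the matching translation in any strings table.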
The keys themselves do not need to be stored if applications use only valid keys. For the sake of debugging invalid keys, a debug table can also be generated. The debug table contains the keys ordered by the results table. When a key is mapped into an index, the debug table can be used to check if the key matches. The debug table is appended to the seeds table.
2 -2 -1 2 24 35 I love you Me neither
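To illustrate the combined layout, here is a sketch that writes the seeds table followed by the debug table and reads a key back for checking. The names `buildIndex`, `keyAt` and `checkKey` are hypothetical. The sketch assumes NUL-terminated keys and offsets measured from the start of the file, which is what the offsets 24 and 35 in the example suggest.

```javascript
// Write the seeds table, then the debug table (count, offsets, keys).
function buildIndex(seeds, keys) {
	const encoder = new TextEncoder();
	const encoded = keys.map(key => encoder.encode(key));
	const seedsSize = 4 * (1 + seeds.length);
	const stringsStart = seedsSize + 4 * (1 + keys.length);
	const size = encoded.reduce((sum, data) => sum + data.length + 1, stringsStart);
	const buffer = new ArrayBuffer(size);
	const view = new DataView(buffer);
	const bytes = new Uint8Array(buffer);
	view.setInt32(0, seeds.length, true);
	seeds.forEach((seed, i) => view.setInt32(4 * (i + 1), seed, true));
	view.setInt32(seedsSize, keys.length, true);
	let offset = stringsStart;
	encoded.forEach((data, i) => {
		view.setInt32(seedsSize + 4 * (i + 1), offset, true);
		bytes.set(data, offset);
		offset += data.length + 1; // keep the NUL terminator
	});
	return buffer;
}

// Read the key that the debug table stores for a given index.
function keyAt(buffer, index) {
	const view = new DataView(buffer);
	const bytes = new Uint8Array(buffer);
	const seedsSize = 4 * (1 + view.getInt32(0, true));
	const start = view.getInt32(seedsSize + 4 * (index + 1), true);
	let end = start;
	while (bytes[end] !== 0)
		end++;
	return new TextDecoder().decode(bytes.subarray(start, end));
}

// Debug check: does the key really map to this index?
function checkKey(buffer, index, key) {
	return keyAt(buffer, index) === key;
}

const index = buildIndex([-2, -1], ["I love you", "Me neither"]);
```

With the two example keys, the buffer holds 12 bytes of seeds followed by 34 bytes of debug table.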
With this technique:
- Tables are in ROM, but are smaller than JavaScript modules.
- Localization does not populate the XS symbols table.
- Lookups are significantly faster, independently of the size of the dictionaries.
ROM | Size |
---|---|
locals.mhi (release) | 12 bytes |
locals.mhi (debug) | 46 bytes |
mclocal is a command line tool that generates strings tables and seeds table from JSON files.
For instance
mclocal en.json fr.json
will generate the `locals.en.mhr`, `locals.fr.mhr` and `locals.mhi` files described above.
mclocal takes the union of the keys from all the JSON files and reports the keys that are missing from each JSON file.
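The check can be sketched as below. The function name `checkKeys` and the shape of the report are hypothetical; the actual messages printed by mclocal are not shown here.

```javascript
// dictionaries maps a file name to its parsed JSON content.
function checkKeys(dictionaries) {
	// Union of the keys from all the dictionaries, in order of first appearance.
	const allKeys = new Set();
	for (const dictionary of Object.values(dictionaries))
		for (const key of Object.keys(dictionary))
			allKeys.add(key);
	// Report every key that a dictionary does not define.
	const missing = [];
	for (const [file, dictionary] of Object.entries(dictionaries))
		for (const key of allKeys)
			if (!Object.prototype.hasOwnProperty.call(dictionary, key))
				missing.push({ file, key });
	return { keys: [...allKeys], missing };
}

const report = checkKeys({
	"en.json": { "I love you": "I love you", "Me neither": "Me neither" },
	"fr.json": { "I love you": "Je t'aime" },
});
```

Here `report.missing` contains one entry, for the key `Me neither` in `fr.json`.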
By convention, mcconfig generates a makefile with a rule that calls mclocal for all JSON files in a `strings` directory.
mclocal file+ [-d] [-o directory] [-r name]
- `file+`: one or more JSON files.
- `-d`: to generate the debug table.
- `-o directory`: the output directory. Defaults to the current directory.
- `-r name`: the name of the output file. Defaults to `locals`.
Piu defines a class, `Locals`, to get localized strings and to switch languages.
var locals = new Locals;
The constructor takes two arguments, `name` and `language`. The defaults are `locals` and `en`. Resources are accessed by combining `name`, `language` and the `.mhi` or `.mhr` extensions.
Applications switch the language with an accessor.
var what = locals.get("I love you"); // what == "I love you"
locals.language = "fr";
var quoi = locals.get("I love you"); // quoi == "Je t'aime"
For convenience, applications can define a global function to localize strings.
global.localize = function(it) {
	return locals.get(it);
}