unicorndecode is a port of the Perl Text::Unidecode library to Lua. It takes Unicode characters and attempts to represent them in ASCII, either by removing accents or by transliterating other scripts into Roman characters. This can occasionally work well and sometimes not so well!
unicorndecode works out of the box with Lua 5.2/5.3 and LuaJIT 2.0/2.1, and will work with Lua 5.1 if luabitop is installed. It is installed via LuaRocks:
luarocks install unicorndecode
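On Lua 5.1 it can help to confirm that a bitwise module is actually loadable before relying on the library. The snippet below is only a sketch and is not taken from unicorndecode: `find_bit_library` is a hypothetical helper, and it assumes the usual module names (`bit` for luabitop and LuaJIT, `bit32` for Lua 5.2). How unicorndecode itself picks a bitwise implementation is not covered here.

```lua
-- Hypothetical helper: report which bitwise module, if any, can be loaded.
-- Assumes the conventional module names 'bit' (luabitop, LuaJIT) and
-- 'bit32' (Lua 5.2); neither name comes from unicorndecode's own docs.
local function find_bit_library()
  for _, name in ipairs({ 'bit', 'bit32' }) do
    local ok, lib = pcall(require, name)
    if ok and type(lib) == 'table' then
      return name
    end
  end
  return nil -- newer Lua versions may rely on native operators instead
end

print(find_bit_library() or 'no bitwise module found')
```

If this prints `no bitwise module found` under Lua 5.1, installing luabitop first is likely the missing step.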
The decode function takes a string and returns two values: the unidecoded version of that string, and a boolean indicating whether the string includes UTF-8 characters.
Example unidecode:
local unicorndecode = require('unicorndecode')
local decodedString, isUTF8 = unicorndecode.decode('Brontë')
In this case, decodedString is Bronte and isUTF8 is true.
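One place both return values are useful is when building ASCII-only identifiers. The following is a minimal sketch rather than part of unicorndecode: `slugify` is a hypothetical helper, and it receives the decode function as an argument, assuming only that it behaves as described above (a string in, a transliterated string plus a boolean out).

```lua
-- Hypothetical helper (not part of unicorndecode): build an ASCII slug.
-- `decode` is assumed to behave like unicorndecode.decode: it takes a
-- string and returns the transliterated string plus a boolean flag.
local function slugify(decode, text)
  local ascii, hadUTF8 = decode(text)
  local slug = ascii:lower()
    :gsub('[^%w]+', '-') -- replace each run of non-alphanumerics with one dash
    :gsub('^%-+', '')    -- trim leading dashes
    :gsub('%-+$', '')    -- trim trailing dashes
  return slug, hadUTF8
end

-- Usage, assuming unicorndecode is installed:
--   local unicorndecode = require('unicorndecode')
--   local slug, hadUTF8 = slugify(unicorndecode.decode, 'Emily Brontë')
```

Passing the decode function in keeps the helper easy to try with a stand-in function when the library is not available.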
- The unidecode data comes from the Perl Text::Unidecode library. As such, this library has all of the same caveats that Text::Unidecode does. It is a good idea to read the Text::Unidecode documentation to understand when unicorndecode should be used.
- The unidecode_data.lua table is created from the JSON file generated in UnicodeConverter, passed through misc_scripts/convert_json_to_lua_table.lua.
This library is released under the MIT License.