This is the reduced-to-the-max re-write of liblouis in Rust.
Many if not most of the CVEs of liblouis are rooted in the manual memory management in the C version of liblouis.
Moving to Rust is of tremendous help not only for the solid memory management to avoid buffer overflow problems, but also to bring joy back into liblouis maintenance.
With the some small exceptions the parser is complete. Translation using only character and most translation opcodes basically works.
The original YAML test suite is supported and can be used to test the re-implementation.
Currently, the re-implementation passes 68% of the liblouis test suite successfully.
louis-rs is not a direct port of the liblouis C code to Rust. It uses the same tables and the same YAML tests but other than that it is a complete rewrite. It uses different data structures and does the translation using a different algorithm.
The goal is to be as compatible as possible with liblouis, when it makes sense.
Get help:
$ cargo run -- help
Translate some text:
$ cargo run -- translate ~/src/liblouis/tables/de-comp6.utb
> Guten Tag
⠈⠛⠥⠞⠑⠝⠀⠈⠞⠁⠛⠀
Test the parser:
$ cargo run -- parse
> nofor letter e 123-1
Letter { character: 'e', dots: Explicit([{Dot2, Dot1, Dot3}, {Dot1}]), constraints: EnumSet(Nofor) }
Run the tests in a YAML file:
$ LOUIS_TABLE_PATH=~/src/liblouis/tables cargo run -- check ~/src/liblouis/tests/braille-specs/de-de-comp8.yaml 2> /dev/null
================================================================================
8 tests run:
8 successes [100%]
0 failures [0%]
0 expected failures [0%]
0 unexpected successes [0%]
Run all YAML tests:
$ LOUIS_TABLE_PATH=~/src/liblouis/tables:~/src/liblouis ./target/release/louis check --summary ~/src/liblouis/tests/braille-specs/*.yaml ~/src/liblouis/tests/yaml/*.yaml 2> /dev/null
================================================================================
473240 tests run:
324695 successes [68.6%]
137803 failures [29.1%]
10319 expected failures [2.2%]
423 unexpected successes [0.1%]
Test the table query functionality:
$ LOUIS_TABLE_PATH=~/src/liblouis/tables cargo run -- query language=de,contraction=full
{"[...]/liblouis/tables/de-g2-detailed.ctb", "[...]/liblouis/tables/de-g2.ctb"}
- You need the Rust tool chain.
If you have any improvements or comments please feel free to file a pull request or an issue.
A lot of inspiration for the hand-rolled parser comes from the absolutely fantastic book Crafting Interpreters by Robert Nystrom. Surely Structure and Interpretation of Computer Programs has had some influence as must have the Compiler Construction classes with Niklaus Wirth (“as simple as possible but not simpler”).
The parser is built from the grammar used in tree-sitter-liblouis, which is a port of the EBNF grammar in rewrite-louis, which in turn is a just port of the Parsing expression grammar from louis-parser.
- [ ] Parse with context
- currently tables are parsed line by line. Opcodes have no idea whether a character or a class has been defined before
- Probably need to pass some context to the rule parser where character definitions and class names are kept
- this is solved with a two-pass compilation now. The first pass collects all relevant information and the second pass consequently uses that.
- [ ] (Emphasis and Caps) Indication
- presumably this could be done independently of translation, i.e. find indication locations and put them in the typeform array before even translating.
- [X] Add support for virtual dots
- Virtual dots are supported and are converted to Unicode Supplementary Private Use Area-A
- [ ] The correct, multipass and match opcodes
- [X] Currently the matching of input text against the rules is case
sensitive.
- [X] Make it case insensitive.
- [X] Now everything is case insensitive, even character definitions. This is probably not what we want. We might have to move the character definitions out of the trie into a separate structure.
- [X] Word boundaries so we could support beg- and endword.
- the unicode_segmentation crate would probably help. It has functions like split_word_bound_indices, that give you word bounds based on the Unicode standard.
- [X] Handle implicit braille definitions, i.e. ‘=’
- [ ] Typeforms
- [ ] Cursor handling
- [ ] Hyphenation
- will probably be delegated to the hyphenation crate
- [ ] Add an API so that the functionality can be used as a library
- [X] Table resolution based on metadata
- [ ] Display tables
- When testing the YAML files the display tables are used.
- However normal translation has currently no way to specify a display table
- [X] Handle undefined characters similarly to liblouis
- [ ] Use a well established FST or graph library as a bases
Copyright (C) 2023-2024 Swiss Library for the Blind, Visually Impaired and Print Disabled
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.