Skip to content

liblouis/louis-rs

Repository files navigation

louis-rs: a liblouis re-implementation in Rust

This is the reduced-to-the-max re-write of liblouis in Rust.

Rationale

Many if not most of the CVEs of liblouis are rooted in the manual memory management in the C version of liblouis.

Moving to Rust is of tremendous help not only for the solid memory management to avoid buffer overflow problems, but also to bring joy back into liblouis maintenance.

Status

With the some small exceptions the parser is complete. Translation using only character and most translation opcodes basically works.

The original YAML test suite is supported and can be used to test the re-implementation.

Currently, the re-implementation passes 68% of the liblouis test suite successfully.

Relation to liblouis

louis-rs is not a direct port of the liblouis C code to Rust. It uses the same tables and the same YAML tests but other than that it is a complete rewrite. It uses different data structures and does the translation using a different algorithm.

The goal is to be as compatible as possible with liblouis, when it makes sense.

Usage

Get help:

$ cargo run -- help

Translate some text:

$ cargo run -- translate ~/src/liblouis/tables/de-comp6.utb 
> Guten Tag
⠈⠛⠥⠞⠑⠝⠀⠈⠞⠁⠛⠀

Test the parser:

$ cargo run -- parse
> nofor letter e 123-1
Letter { character: 'e', dots: Explicit([{Dot2, Dot1, Dot3}, {Dot1}]), constraints: EnumSet(Nofor) }

Run the tests in a YAML file:

$ LOUIS_TABLE_PATH=~/src/liblouis/tables cargo run -- check ~/src/liblouis/tests/braille-specs/de-de-comp8.yaml 2> /dev/null
================================================================================
8 tests run:
8 successes [100%]
0 failures [0%]
0 expected failures [0%]
0 unexpected successes [0%]

Run all YAML tests:

$ LOUIS_TABLE_PATH=~/src/liblouis/tables:~/src/liblouis ./target/release/louis check --summary ~/src/liblouis/tests/braille-specs/*.yaml ~/src/liblouis/tests/yaml/*.yaml 2> /dev/null 
================================================================================
473240 tests run:
324695 successes [68.6%]
137803 failures [29.1%]
10319 expected failures [2.2%]
423 unexpected successes [0.1%]

Test the table query functionality:

$ LOUIS_TABLE_PATH=~/src/liblouis/tables cargo run -- query language=de,contraction=full
{"[...]/liblouis/tables/de-g2-detailed.ctb", "[...]/liblouis/tables/de-g2.ctb"}

Prerequisites

Contributing

If you have any improvements or comments please feel free to file a pull request or an issue.

Acknowledgments

A lot of inspiration for the hand-rolled parser comes from the absolutely fantastic book Crafting Interpreters by Robert Nystrom. Surely Structure and Interpretation of Computer Programs has had some influence as must have the Compiler Construction classes with Niklaus Wirth (“as simple as possible but not simpler”).

The parser is built from the grammar used in tree-sitter-liblouis, which is a port of the EBNF grammar in rewrite-louis, which in turn is a just port of the Parsing expression grammar from louis-parser.

Todo [6/15]

  • [ ] Parse with context
    • currently tables are parsed line by line. Opcodes have no idea whether a character or a class has been defined before
    • Probably need to pass some context to the rule parser where character definitions and class names are kept
    • this is solved with a two-pass compilation now. The first pass collects all relevant information and the second pass consequently uses that.
  • [ ] (Emphasis and Caps) Indication
    • presumably this could be done independently of translation, i.e. find indication locations and put them in the typeform array before even translating.
  • [X] Add support for virtual dots
    • Virtual dots are supported and are converted to Unicode Supplementary Private Use Area-A
  • [ ] The correct, multipass and match opcodes
  • [X] Currently the matching of input text against the rules is case sensitive.
    • [X] Make it case insensitive.
    • [X] Now everything is case insensitive, even character definitions. This is probably not what we want. We might have to move the character definitions out of the trie into a separate structure.
  • [X] Word boundaries so we could support beg- and endword.
  • [X] Handle implicit braille definitions, i.e. ‘=’
  • [ ] Typeforms
  • [ ] Cursor handling
  • [ ] Hyphenation
  • [ ] Add an API so that the functionality can be used as a library
    • end expose it as a C ABI so that it can be used from other languages (see also cbindgen or even better Diplomat)
  • [X] Table resolution based on metadata
  • [ ] Display tables
    • When testing the YAML files the display tables are used.
    • However normal translation has currently no way to specify a display table
  • [X] Handle undefined characters similarly to liblouis
  • [ ] Use a well established FST or graph library as a bases
    • currently regular expressions are implemented using a simple directed acyclic graph. It would surely be better to use a well established library for that task such as rustfst, petgraph or graph.

License

Copyright (C) 2023-2024 Swiss Library for the Blind, Visually Impaired and Print Disabled

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

About

An experiment at re-writing liblouis in Rust

Resources

License

Stars

Watchers

Forks

Releases

No releases published