nlprule

A fast, low-resource Natural Language Processing and Error Correction library written in Rust. nlprule implements a rule- and lookup-based approach to NLP using resources from LanguageTool.

Python Usage

Install: pip install nlprule

Use:

from nlprule import Tokenizer, Rules

tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)

rules.correct("He wants that you send him an email.")
# returns: 'He wants you to send him an email.'

rules.correct("I can due his homework.")
# returns: 'I can do his homework.'

for s in rules.suggest("She was not been here since Monday."):
    print(s.start, s.end, s.replacements, s.source, s.message)
# prints:
# 4 16 ['was not', 'has not been'] WAS_BEEN.1 Did you mean was not or has not been?

for sentence in tokenizer.pipe("A brief example is shown."):
    for token in sentence:
        print(
            repr(token.text).ljust(10),
            repr(token.span).ljust(10),
            repr(token.tags).ljust(24),
            repr(token.lemmas).ljust(24),
            repr(token.chunks).ljust(24),
        )
# prints:
# ''         (0, 0)     ['SENT_START']           []                       []                      
# 'A'        (0, 1)     ['DT']                   ['A', 'a']               ['B-NP-singular']       
# 'brief'    (2, 7)     ['JJ']                   ['brief']                ['I-NP-singular']       
# 'example'  (8, 15)    ['NN:UN']                ['example']              ['E-NP-singular']       
# 'is'       (16, 18)   ['VBZ']                  ['be', 'is']             ['B-VP']                
# 'shown'    (19, 24)   ['VBN']                  ['show', 'shown']        ['I-VP']                
# '.'        (24, 25)   ['.', 'PCT', 'SENT_END'] ['.']                    ['O']

# and every call here takes less than 1ms! (on an i5 8600k)

Rust Usage

Recommended setup:

Cargo.toml

[dependencies]
nlprule = "<version>"

[build-dependencies]
nlprule-build = "<version>" # must be the same as the nlprule version!

build.rs

fn main() {
    println!("cargo:rerun-if-changed=build.rs");

    nlprule_build::BinaryBuilder::new(
        &["en"],
        std::env::var("OUT_DIR").expect("OUT_DIR is set when build.rs is running"),
    )
    .build()
    .validate();
}

src/main.rs

use nlprule::{Rules, Tokenizer, tokenizer_filename, rules_filename};

fn main() {
    let mut tokenizer_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        tokenizer_filename!("en")
    ));
    let mut rules_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        rules_filename!("en")
    ));

    let tokenizer = Tokenizer::from_reader(&mut tokenizer_bytes).expect("tokenizer binary is valid");
    let rules = Rules::from_reader(&mut rules_bytes).expect("rules binary is valid");

    assert_eq!(
        rules.correct("She was not been here since Monday.", &tokenizer),
        String::from("She was not here since Monday.")
    );
}

nlprule and nlprule-build versions are kept in sync.

Main features

Rule-based Grammatical Error Correction through multiple thousand rules.
A text processing pipeline doing sentence segmentation, part-of-speech tagging, lemmatization, chunking and disambiguation.
Support for English, German and Spanish.
Spellchecking. (in progress)

Goals

A single place to apply spellchecking and grammatical error correction for a downstream task.
Fast, low-resource NLP suited for running:
1. as a pre- / postprocessing step for more sophisticated (i. e. ML) approaches.
2. in the background of another application with low overhead.
3. client-side in the browser via WebAssembly.
100% Rust code and dependencies.

Comparison to LanguageTool

	\|Disambiguation rules\|	\|Grammar rules\|	LT version	nlprule time	LanguageTool time
English	843 (100%)	3725 (~ 85%)	5.2	1	1.7 - 2.0
German	486 (100%)	2970 (~ 90%)	5.2	1	2.4 - 2.8
Spanish	Experimental support. Not fully tested yet.

See the benchmark issue for details.

Acknowledgements

All credit for the resources used in nlprule goes to LanguageTool who have made a Herculean effort to create high-quality resources for Grammatical Error Correction and broader NLP.

License

nlprule is licensed under the MIT license or Apache-2.0 license, at your option.

The nlprule binaries (*.bin) are derived from LanguageTool v5.2 and licensed under the LGPLv2.1 license. nlprule statically and dynamically links to these binaries. Under LGPLv2.1 §6(a) this does not have any implications on the license of nlprule itself.

Name		Name	Last commit message	Last commit date
Latest commit History 319 Commits
.github/workflows		.github/workflows
bench		bench
binaries		binaries
build		build
examples		examples
nlprule		nlprule
python		python
scripts		scripts
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

nlprule

Main features

Goals

Comparison to LanguageTool

Acknowledgements

License

About

Licenses found

Releases

Packages

Languages

License

Licenses found

zhang703652632/nlprule

Folders and files

Latest commit

History

Repository files navigation

nlprule

Main features

Goals

Comparison to LanguageTool

Acknowledgements

License

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages