Status: incomplete. Don't use yet.
You probably don't need a lemmatizer, but if you do, trefwurd's got you covered.
Trefwurd is:
- fast (20k unique tokens/s)
- lightweight (pure Python, zero dependencies)
- low memory footprint
- robust
- overridable, with custom exception lists
- easy to train
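To illustrate what "overridable, with custom exception lists" can mean in practice, here is a minimal sketch in which an exception dictionary is consulted before any suffix rule fires. It is not trefwurd's actual implementation: `SUFFIX_RULES`, `lemmatize_with_exceptions`, and the two toy rules are hypothetical names and data made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy rules as (suffix to strip, replacement) pairs.
# These are illustrative only, not the rules trefwurd learns or ships.
SUFFIX_RULES: List[Tuple[str, str]] = [("en", ""), ("s", "")]


def lemmatize_with_exceptions(token: str, exceptions: Dict[str, str]) -> str:
    """Return a lemma for `token`, checking `exceptions` before any rule."""
    key = token.lower()
    # An entry in the exception list always wins over the rules.
    if key in exceptions:
        return exceptions[key]
    # Otherwise apply the first suffix rule that matches.
    for suffix, replacement in SUFFIX_RULES:
        if key.endswith(suffix) and len(key) > len(suffix):
            return key[: len(key) - len(suffix)] + replacement
    # No rule matched: fall back to the lowercased token.
    return key


exceptions = {"koeien": "koe"}
print(lemmatize_with_exceptions("koeien", exceptions))  # handled by the exception list
print(lemmatize_with_exceptions("honden", exceptions))  # handled by the "en" rule
```

Checking exceptions first keeps irregular forms cheap to correct: adding one dictionary entry overrides a wrong rule output without retraining.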
Trefwurd is compatible with Python 3.6 and up, because type annotations and f-strings are beautiful.
$ pip install trefwurd
Download a pretrained lemmatization model by passing its ISO language code (for example, nl for Dutch).
$ python3 -m trefwurd download {iso-lang-code}
import trefwurd
lemmatizer = trefwurd.load("nl")
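# Lemmatize a single token with its part-of-speech tag.
lemmatizer.lemmatize("honden", "NOUN")
# Lemmatize a list of (token, tag) pairs.
lemmatizer.lemmatize([("honden", "NOUN"), ("eten", "VERB"), ("alles", "NOUN")])
# Lemmatize a list of tokens without tags.
lemmatizer.lemmatize(["honden", "eten", "alles"])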
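The three call forms above (a single token with a tag, a list of tagged pairs, and a list of bare tokens) could all be reduced to one internal shape before lemmatizing. The sketch below shows one way to do that. It is an assumption about how such dispatch might work, not trefwurd's code, and `normalize_input` is a hypothetical helper that does not exist in the library.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A single input item is either a bare token or a (token, tag) pair.
TokenInput = Union[str, Tuple[str, str]]


def normalize_input(
    tokens: Union[str, Sequence[TokenInput]],
    pos: Optional[str] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Convert any of the three call forms into a list of (token, tag) pairs."""
    # Form 1: a single token, optionally with a tag.
    if isinstance(tokens, str):
        return [(tokens, pos)]
    pairs: List[Tuple[str, Optional[str]]] = []
    for item in tokens:
        if isinstance(item, str):
            # Form 3: bare token, so the tag falls back to `pos` (None by default).
            pairs.append((item, pos))
        else:
            # Form 2: an explicit (token, tag) pair.
            token, tag = item
            pairs.append((token, tag))
    return pairs


print(normalize_input("honden", "NOUN"))
print(normalize_input([("honden", "NOUN"), ("eten", "VERB")]))
print(normalize_input(["honden", "eten", "alles"]))
```

With a single normalized shape, the lemmatization step itself only has to handle (token, tag) pairs where the tag may be missing.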
TODO: make table.
TODO: Um... Add tests.