Neural morphology
UralicNLP can handle out-of-vocabulary words thanks to its new neural fallback functionality.
The neural models require Natas, which you can install with pip:

```shell
pip install natas
```
To use the neural fallback, pass neural_fallback=True to functions such as generate, analyze and lemmatize:
```python
from uralicNLP import uralicApi

uralicApi.generate("koirailla+V+Act+Ind+Prs+Sg1", "fin", neural_fallback=True)
# >> [('koirailen', -0.0015927295899018645)]

uralicApi.analyze("hörpähdin", "fin", neural_fallback=True)
# >> [('hörpähtää+V+Act+Ind+Prt+Sg1', -0.27097199857234955)]

uralicApi.lemmatize("nirhautan", "fin", neural_fallback=True)
# >> ['nirhauttaa']
```
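The analyses above use a `lemma+Tag+Tag` string format, and `generate` takes the same format as input. The helper below is a minimal sketch and is not part of UralicNLP: the names `split_analysis` and `join_analysis` are hypothetical, and the code assumes the lemma is everything before the first `+`, which may not hold for every FST (for example around compound or derivation markers).

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split an analysis such as 'hörpähtää+V+Act+Ind+Prt+Sg1' into lemma and tags.

    Hypothetical helper: assumes the lemma is everything before the first '+'.
    """
    lemma, _, rest = analysis.partition("+")
    tags = rest.split("+") if rest else []
    return lemma, tags


def join_analysis(lemma: str, tags: List[str]) -> str:
    """Build the 'lemma+Tag+...' string that generate() takes as input."""
    return "+".join([lemma, *tags])


if __name__ == "__main__":
    # Output of uralicApi.analyze("hörpähdin", "fin", neural_fallback=True), copied from above
    results = [("hörpähtää+V+Act+Ind+Prt+Sg1", -0.27097199857234955)]
    for analysis, score in results:
        lemma, tags = split_analysis(analysis)
        print(lemma, tags, score)
        # Swap the past tense (Prt) for the present (Prs) to build a generation query
        print(join_analysis(lemma, ["V", "Act", "Ind", "Prs", "Sg1"]))
```

The joined string can then be passed to `uralicApi.generate(..., "fin", neural_fallback=True)` as in the examples above.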
These methods also accept an n_best parameter, which defaults to 1. Increasing it makes the neural model return more candidates, which can be useful for homonyms.
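With n_best greater than 1 you get several (analysis, score) pairs back, possibly for different lemmas. The following is a sketch of one way to post-process them; `best_per_lemma` is a hypothetical helper, and it assumes that a score closer to zero (as in the outputs above) marks a more likely candidate.

```python
from typing import Dict, List, Tuple


def best_per_lemma(candidates: List[Tuple[str, float]]) -> List[Tuple[str, str, float]]:
    """Keep the highest-scoring analysis for each lemma, best lemma first.

    Assumes each candidate is an (analysis, score) pair such as
    ('hörpähtää+V+Act+Ind+Prt+Sg1', -0.27) and that a higher score is better.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for analysis, score in candidates:
        # Assumption: the lemma is the part before the first '+'
        lemma = analysis.split("+", 1)[0]
        if lemma not in best or score > best[lemma][1]:
            best[lemma] = (analysis, score)
    ranked = [(lemma, analysis, score) for lemma, (analysis, score) in best.items()]
    # Sort by score, highest (most likely) first
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

You could feed it, for example, the result of `uralicApi.analyze("hörpähdin", "fin", neural_fallback=True, n_best=3)`; the actual candidates and scores depend on the model.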
If you are interested in training your own models, you can get all inflectional forms for a word by running the following:
```python
from uralicNLP import uralicApi

uralicApi.get_all_forms("kissa", "N", "fin")
```
Pass a lemma, its part of speech and the language code. Other optional arguments are descriptive=True (True selects the descriptive FST, False the normative one), limit_forms=-1 (the maximum number of forms to generate) and filter_out=["#", "+Der", "+Cmp", "+Err"] (the tags to exclude from the output).
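As a rough illustration of what filter_out and limit_forms mean, the stand-alone sketch below applies the same idea to a list of analysis strings. It is not UralicNLP's implementation: `filter_analyses` is a hypothetical name, and the code assumes that filtering is a plain substring match and that a negative limit_forms means no limit.

```python
from typing import Iterable, List, Sequence


def filter_analyses(
    analyses: Iterable[str],
    filter_out: Sequence[str] = ("#", "+Der", "+Cmp", "+Err"),
    limit_forms: int = -1,
) -> List[str]:
    """Drop analyses containing any unwanted marker, then apply an optional limit."""
    # Keep only analyses in which none of the unwanted substrings occur
    kept = [a for a in analyses if not any(marker in a for marker in filter_out)]
    # Assumption (hypothetical): a negative limit means "keep everything"
    if limit_forms < 0:
        return kept
    return kept[:limit_forms]
```

Here the compound boundary `#` and tags such as `+Der` or `+Err` are treated as substrings, so `kissa#talo+N+Sg+Nom` would be dropped while `kissa+N+Sg+Nom` would be kept.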
If you get an error, you will need to install hfst-dev:

```shell
pip install hfst-dev
```
Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021).
UralicNLP is an open-source Python library by Mika Hämäläinen.