
Neural morphology

Mika Hämäläinen edited this page Nov 27, 2024 · 9 revisions

UralicNLP can handle out-of-vocabulary words thanks to its new neural fallback functionality.

Requirements

The neural models require Natas:

pip install natas
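Before turning on the fallback, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the sketch assumes that the natas pip package installs a module of the same name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# uralicNLP is the module imported in the examples on this page; natas is
# assumed to be the module name installed by `pip install natas`.
missing = missing_modules(["uralicNLP", "natas"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("uralicNLP and natas are both available")
```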

How to use neural fallback

Pass neural_fallback=True to generate, analyze, or lemmatize:

from uralicNLP import uralicApi

# Generate a word form from a lemma and its morphological tags
uralicApi.generate("koirailla+V+Act+Ind+Prs+Sg1", "fin", neural_fallback=True)
# >> [('koirailen', -0.0015927295899018645)]

# Analyze a word form into a lemma and tags
uralicApi.analyze("hörpähdin", "fin", neural_fallback=True)
# >> [('hörpähtää+V+Act+Ind+Prt+Sg1', -0.27097199857234955)]

# Lemmatize a word form
uralicApi.lemmatize("nirhautan", "fin", neural_fallback=True)
# >> ['nirhauttaa']

These methods also accept an n_best parameter, which defaults to 1. Increasing it makes the neural model return more candidates, which can be useful for homonyms.
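When n_best is larger than 1, the candidates usually need to be ranked. The sketch below is one way to do that, under two assumptions that this page does not confirm: that a higher score means a better candidate, and that the scores are natural-log probabilities (they are negative and close to zero in the outputs above). The helper names rank_candidates, as_probability and analyze_n_best are hypothetical.

```python
import math


def rank_candidates(candidates):
    """Sort (text, score) pairs so the highest score comes first."""
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


def as_probability(score):
    """Convert a score to a probability, ASSUMING it is a natural-log probability."""
    return math.exp(score)


def analyze_n_best(word, language="fin", n_best=5):
    """Ask for several neural candidates and return them ranked."""
    # Imported lazily so the helpers above work without uralicNLP installed.
    from uralicNLP import uralicApi

    candidates = uralicApi.analyze(
        word, language, neural_fallback=True, n_best=n_best
    )
    return rank_candidates(candidates)


# Output copied from the analyze example above (n_best left at its default).
example = [("hörpähtää+V+Act+Ind+Prt+Sg1", -0.27097199857234955)]
for analysis, score in rank_candidates(example):
    print(analysis, round(as_probability(score), 3))
```

The ranking is meant for candidates produced by the neural model. If the word is covered by the FST, the numbers returned may be FST weights, which may follow a different convention.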

Data for training

If you are interested in training your own models, you can get all inflectional forms for a word by running the following:

from uralicNLP import uralicApi
uralicApi.get_all_forms("kissa", "N", "fin")

Pass a lemma, its part of speech, and the language code. The optional arguments are descriptive=True (whether to use the descriptive FST instead of the normative one), limit_forms=-1 (the maximum number of forms to generate) and filter_out=["#", "+Der", "+Cmp","+Err"] (the tags you do not want in the output).
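The optional arguments can be spelled out explicitly when collecting training data. The sketch below wraps get_all_forms with the parameter names and defaults listed above. The helper names form_request and all_forms are hypothetical, and the return value of get_all_forms is passed through untouched because its exact shape is not described on this page.

```python
DEFAULT_FILTER_OUT = ["#", "+Der", "+Cmp", "+Err"]


def form_request(lemma, pos, language, descriptive=True, limit_forms=-1,
                 filter_out=None):
    """Bundle positional and keyword arguments for uralicApi.get_all_forms."""
    # Copy the tag list so callers never share one mutable default.
    tags = list(DEFAULT_FILTER_OUT if filter_out is None else filter_out)
    kwargs = {
        "descriptive": descriptive,
        "limit_forms": limit_forms,
        "filter_out": tags,
    }
    return (lemma, pos, language), kwargs


def all_forms(lemma, pos, language, **options):
    """Call get_all_forms with explicit options and return its result as is."""
    # Imported lazily so form_request can be inspected without uralicNLP.
    from uralicNLP import uralicApi

    args, kwargs = form_request(lemma, pos, language, **options)
    return uralicApi.get_all_forms(*args, **kwargs)


# Inspect the arguments that would be sent for a capped request.
args, kwargs = form_request("kissa", "N", "fin", limit_forms=10)
print(args, kwargs)
```

Calling all_forms("kissa", "N", "fin", limit_forms=10) would then run the same request against the installed Finnish model.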

If you get an error, you will need to install hfst-dev with pip (pip install hfst-dev).

Cite

Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021).
