Repository of Spark NLP (sparknlp) resources for Catalan
Tested on the ANCORA_UD test set:
Total sentences to process: 1846
Correctly tokenized: 1539
Incorrectly tokenized (at least one token): 307
Percent correct: 0.8878656554712893
Average POS: 0.9409626950727363
Average NER: 0.9704804659904799
Average lemma: 0.8026376091041432
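The figures above do not state exactly how each metric is defined. As a hedged illustration only, the Python sketch below assumes that a sentence counts as correctly tokenized when its predicted token list equals the gold token list, and that an "average" tag score is the mean of per-sentence accuracies over sentences whose tokens line up. The helpers `tokenization_counts` and `average_tag_accuracy` and the toy sentences are hypothetical, not this repository's evaluation code.

```python
from typing import List, Tuple


def tokenization_counts(gold: List[List[str]], pred: List[List[str]]) -> Tuple[int, int]:
    """Return (correct, incorrect) sentence counts.

    Hypothetical criterion: a sentence is correct only if the predicted
    token list is identical to the gold token list.
    """
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct, len(gold) - correct


def average_tag_accuracy(gold_tags: List[List[str]], pred_tags: List[List[str]]) -> float:
    """Mean of per-sentence tag accuracy (e.g. POS, NER or lemma).

    Sentences that are empty or whose tag lists differ in length
    (i.e. were mis-tokenized) are skipped, since tags cannot be aligned.
    """
    scores = []
    for g, p in zip(gold_tags, pred_tags):
        if g and len(g) == len(p):
            hits = sum(1 for a, b in zip(g, p) if a == b)
            scores.append(hits / len(g))
    return sum(scores) / len(scores) if scores else 0.0


# Toy, invented sentences (not taken from ANCORA_UD).
gold = [["El", "gat", "dorm", "."], ["Mas-Colell", "parla", "."]]
pred = [["El", "gat", "dorm", "."], ["Mas", "-", "Colell", "parla", "."]]
print(tokenization_counts(gold, pred))  # (1, 1)

gold_pos = [["DET", "NOUN", "VERB", "PUNCT"]]
pred_pos = [["DET", "NOUN", "NOUN", "PUNCT"]]
print(average_tag_accuracy(gold_pos, pred_pos))  # 0.75
```

Averaging per sentence rather than per token weights short and long sentences equally; a token-level average would give a different number, so which one the reported scores use remains an open assumption here.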
Common tokenization errors (token pair, number of occurrences):
    (("'", "'."), 6)
    (('Bages-Berguedà', 'Bages'), 5)
    (('Pel', 'P'), 5)
    (("'", '.'), 5)
    (('Mas-Colell', 'Mas'), 5)
    (("'n", "'"), 5)
    (('PSC-CpC', 'PSC'), 5)
    (("'ls", "'"), 5)
    (('"', ','), 5)
    (('bar-club', 'bar'), 5)
    (("'", "',"), 4)
    (('Jean-Marie', 'Jean'), 4)
    (('5%', '5'), 4)
    (("'", ','), 3)
    (('Jordi-Joan', 'Jordi'), 3)
    (('nord-oest', 'nord'), 3)
    (('"', '".'), 3)
    (('pel', 'p'), 3)
    (("l'", "l'11"), 3)
    ((')', ').'), 3)
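The error list has the shape of `collections.Counter.most_common(20)` output: pairs of tokens with their counts, in descending order. Assuming (this is not confirmed by the repository) that each pair is the first position where the gold and predicted token sequences diverge, with the gold token first, a tally like it could be built as in this sketch. `first_mismatch`, `common_errors` and the sample sentences are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def first_mismatch(gold: List[str], pred: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first (gold, predicted) token pair that differs, or None.

    Hypothetical helper: tokens are compared position by position.
    """
    for g, p in zip(gold, pred):
        if g != p:
            return (g, p)
    n = min(len(gold), len(pred))
    if len(gold) != len(pred):
        # One sequence is a prefix of the other: pair the first extra token
        # with an empty string on the side that ran out.
        return (gold[n] if n < len(gold) else "", pred[n] if n < len(pred) else "")
    return None  # identical tokenization


def common_errors(gold_sents: List[List[str]], pred_sents: List[List[str]],
                  k: int = 20) -> List[Tuple[Tuple[str, str], int]]:
    """Count first-mismatch pairs over all sentences and return the top k."""
    counts: Counter = Counter()
    for gold, pred in zip(gold_sents, pred_sents):
        pair = first_mismatch(gold, pred)
        if pair is not None:
            counts[pair] += 1
    return counts.most_common(k)


# Toy, invented sentences (not taken from ANCORA_UD).
gold_sents = [
    ["Mas-Colell", "ho", "diu", "."],
    ["Mas-Colell", "hi", "és", "."],
    ["El", "5%", "."],
]
pred_sents = [
    ["Mas", "-", "Colell", "ho", "diu", "."],
    ["Mas", "-", "Colell", "hi", "és", "."],
    ["El", "5", "%", "."],
]
print(common_errors(gold_sents, pred_sents))
# [(('Mas-Colell', 'Mas'), 2), (('5%', '5'), 1)]
```

Recording only the first divergence per sentence keeps one error per mis-tokenized sentence, so the counts never exceed the number of incorrectly tokenized sentences; tallying every misaligned position would inflate counts after the first split.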