sparknlp_ca

Repository of Spark NLP resources for Catalan

Test results on the ANCORA_UD test set:

Total phrases to process: 1846

Correctly tokenized 1539

Incorrectly tokenized in at least one token: 307

Percent correct: 0.8878656554712893

average pos: 0.9409626950727363

average ner: 0.9704804659904799

average lemma: 0.8026376091041432
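The phrase counts above can be cross-checked with a short calculation. This is a minimal sketch using only the numbers reported here; the variable names are illustrative. Note that the ratio of correctly tokenized phrases comes out lower than the reported "Percent correct" value, so that figure is presumably computed on a different basis (for example per token), which this README does not state.

```python
# Counts copied from the results above.
total_phrases = 1846
correct = 1539
incorrect = 307

# The two categories should partition the test set.
assert correct + incorrect == total_phrases

# Share of phrases whose tokenization matched completely.
phrase_accuracy = correct / total_phrases
print(f"Phrase-level tokenization accuracy: {phrase_accuracy:.4f}")  # 0.8337
```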

Common errors in tokenization:

'''
[(("'", "'."), 6), (('Bages-Berguedà', 'Bages'), 5), (('Pel', 'P'), 5), (("'", '.'), 5),
 (('Mas-Colell', 'Mas'), 5), (("'n", "'"), 5), (('PSC-CpC', 'PSC'), 5), (("'ls", "'"), 5),
 (('"', ','), 5), (('bar-club', 'bar'), 5), (("'", "',"), 4), (('Jean-Marie', 'Jean'), 4),
 (('5%', '5'), 4), (("'", ','), 3), (('Jordi-Joan', 'Jordi'), 3), (('nord-oest', 'nord'), 3),
 (('"', '".'), 3), (('pel', 'p'), 3), (("l'", "l'11"), 3), ((')', ').'), 3)]
'''
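To see which kinds of tokens account for most of these errors, the list can be grouped by a simple rule on the first element of each pair. This is a rough sketch: the `category` helper and its three buckets are assumptions made for illustration, not part of the repository's scripts.

```python
from collections import Counter

# Error list copied from above; each entry is ((token_a, token_b), count).
errors = [
    (("'", "'."), 6), (('Bages-Berguedà', 'Bages'), 5), (('Pel', 'P'), 5),
    (("'", '.'), 5), (('Mas-Colell', 'Mas'), 5), (("'n", "'"), 5),
    (('PSC-CpC', 'PSC'), 5), (("'ls", "'"), 5), (('"', ','), 5),
    (('bar-club', 'bar'), 5), (("'", "',"), 4), (('Jean-Marie', 'Jean'), 4),
    (('5%', '5'), 4), (("'", ','), 3), (('Jordi-Joan', 'Jordi'), 3),
    (('nord-oest', 'nord'), 3), (('"', '".'), 3), (('pel', 'p'), 3),
    (("l'", "l'11"), 3), ((')', ').'), 3),
]


def category(pair):
    """Hypothetical bucketing rule based on the first token of the pair."""
    first, _ = pair
    if "-" in first:
        return "hyphenated compound"
    if "'" in first:
        return "apostrophe / clitic"
    return "other"


# Sum the error counts per bucket.
totals = Counter()
for pair, count in errors:
    totals[category(pair)] += count

for name, count in totals.most_common():
    print(f"{name}: {count}")
```

Under this rule, hyphenated compounds and apostrophe-related tokens together cover 61 of the 84 listed error occurrences, which suggests those two cases are the natural first targets for tokenizer rules.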

