Corpus cleanup: progress tracker
Clemens Neudecker edited this page Jun 9, 2016 · 11 revisions
This page tracks the progress of the corpus cleanup.
- enp_de.bio [L000001 - L002000]
- enp_fr.bio [L000001 - L082493]
- enp_de.bio [L000001 - L002000]
- enp_de.bio [L000001 - L002000]
- enp_de.bio [L000001 - L002000]
- enp_de.bio [L000001 - L002000]
- enp_fr.bio
- enp_de.bio [L000001 - L005000]
- fix use of B-/I- in compliance with CoNLL convention
- replace B/I-LIEU with B/I-LOC and B/I-PERS with B/I-PER in French corpus
- assemble basic metadata (titles, issues, dates)
- harmonize files and directories
- remove unnecessary 'POS' tags (reduces file size considerably)
- remove metadata noise and empty files (0 bytes)
- consistently use spaces instead of tabs
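Two of the cleanup tasks listed above, renaming the French labels (LIEU to LOC, PERS to PER) and using spaces instead of tabs, can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the cleanup: the function name `normalize_line` is hypothetical, and the sketch assumes each non-empty line of a `.bio` file holds whitespace-separated columns with the entity tag in the last column.

```python
import re

# Label mapping taken from the task list: French labels -> CoNLL-style labels.
TAG_MAP = {"LIEU": "LOC", "PERS": "PER"}


def normalize_line(line: str) -> str:
    """Normalize one line of a .bio file.

    Assumption: columns are separated by tabs or spaces and the
    entity tag (e.g. B-LIEU, I-PERS, O) is the last column.
    """
    stripped = line.rstrip("\n")
    if not stripped.strip():
        return ""  # keep blank (sentence-separating) lines empty

    columns = stripped.split()  # splits on tabs and spaces alike
    match = re.fullmatch(r"([BI])-(.+)", columns[-1])
    if match and match.group(2) in TAG_MAP:
        # Keep the B-/I- prefix, swap only the label.
        columns[-1] = f"{match.group(1)}-{TAG_MAP[match.group(2)]}"

    # Re-join with single spaces so tabs never survive.
    return " ".join(columns)
```

For example, `normalize_line("Paris\tB-LIEU\n")` returns `"Paris B-LOC"`, while lines tagged `O` or carrying other labels pass through with only their separators normalized.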