I am trying to train NER models on WikiNER in multiple NLP libraries, starting with flair. One thing I would like to share between all of them is an identical training corpus, and flair does a great job of splitting it into train, dev, and test sets. In addition, flair uses the BIOES format, which I am interested in comparing with IOB and IOB2.
Is there any way to easily save/export the loaded datasets to disk in CoNLL format?
I would like to save corpus.train, corpus.dev, and corpus.test in CoNLL format on disk and share the same dataset between multiple NLP libraries to compare the final performance.
Many thanks,
Maziyar
Hi @maziyarpanahi, we have no built-in method for this, but you can write a simple method to write out the column format you need. That is, you can iterate through all sentences in the three splits, then iterate over all tokens of each sentence and write the attributes you want to file.
Something like this:
```python
# go through each sentence
for sentence in corpus.dev:
    # go through each token of the sentence
    for token in sentence:
        # print what you need (text and NER value)
        print(f"{token.text}\t{token.get_tag('ner').value}")
    # print a newline at the end of each sentence
    print()
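To make this concrete, here is a minimal sketch of an export helper. It is not part of flair's API; `write_conll` is a hypothetical name, and it takes sentences as plain lists of `(text, tag)` pairs so it stays library-agnostic:

```python
def write_conll(sentences, path):
    """Write sentences (lists of (text, tag) pairs) to a two-column
    CoNLL-style file: one token per line, tab-separated, with a blank
    line between sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for text, tag in sentence:
                f.write(f"{text}\t{tag}\n")
            # blank line as the sentence separator
            f.write("\n")
```

You could then feed it from a flair split with something like `write_conll([[(t.text, t.get_tag('ner').value) for t in s] for s in corpus.dev], "dev.conll")`, and repeat for `corpus.train` and `corpus.test`.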
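Since the question also mentions comparing BIOES with IOB2: if the splits are loaded with BIOES tags, they can be mapped to IOB2 before writing them out. A minimal sketch (the function name is mine, not flair's), assuming the usual convention that `S-` marks a single-token entity and `E-` marks the last token of a multi-token entity:

```python
def bioes_to_iob2(tags):
    """Convert a sequence of BIOES tags to IOB2.
    S- becomes B-, E- becomes I-; B-, I-, and O pass through unchanged."""
    out = []
    for tag in tags:
        if tag.startswith("S-"):
            out.append("B-" + tag[2:])
        elif tag.startswith("E-"):
            out.append("I-" + tag[2:])
        else:
            out.append(tag)
    return out
```

This can be applied per sentence to the tag column before the export step.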