How to save/export flair's datasets corpus in CoNLL format #988

maziyarpanahi · 2019-08-10T13:01:47Z

Hi,

I am trying to train NER models on WikiNER in multiple NLP libraries, starting with flair. One thing I would like to share across all of them is an identical training corpus, and flair does a great job of creating train, dev, and test splits. In addition, it uses the BIOES format, which I would like to compare against IOB or IOB2.

Is there any way to easily save/export loaded datasets on disk in CoNLL format?

For instance:

import flair.datasets
corpus = flair.datasets.WIKINER_ENGLISH()

I would like to save corpus.train, corpus.dev, and corpus.test in CoNLL format on disk and share the same dataset between multiple NLP libraries to compare the final performance.

Many thanks,
Maziyar

alanakbik · 2019-08-12T07:33:42Z

Hi @maziyarpanahi, we have no built-in method for this, but you can write a simple function that writes out the column format you need: iterate through all sentences in the three splits, then over all tokens of each sentence, and write the attributes you want to file.

Something like this:

# go through each sentence
for sentence in corpus.dev:

    # go through each token of sentence
    for token in sentence:
        # print what you need (text and NER value)
        print(f"{token.text}\t{token.get_tag('ner').value}")

    # print newline at end of each sentence
    print()
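
To actually get files on disk for all three splits, the same loop can be wrapped in a small helper. This is only a sketch under a few assumptions: `write_column_file` and `bioes_to_iob2` are hypothetical names (not part of flair), the helper only relies on each token having a `.text` attribute, and the tag lookup is passed in as a callable so the `get_tag('ner').value` call from the snippet above can be reused as-is. The optional conversion maps BIOES tags to IOB2 (`S-` becomes `B-`, `E-` becomes `I-`), in case the tags come out in BIOES as described in the question.

```python
from typing import Callable, Iterable, Optional


def bioes_to_iob2(tag: str) -> str:
    """Map a BIOES tag to IOB2: S-X -> B-X, E-X -> I-X, everything else unchanged."""
    if tag.startswith("S-"):
        return "B-" + tag[2:]
    if tag.startswith("E-"):
        return "I-" + tag[2:]
    return tag


def write_column_file(
    sentences: Iterable,
    path: str,
    get_tag: Callable[[object], str],
    convert: Optional[Callable[[str], str]] = None,
    sep: str = "\t",
) -> int:
    """Write one token per line as "text<sep>tag", with a blank line after each sentence.

    Returns the number of sentences written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            for token in sentence:
                tag = get_tag(token)
                if convert is not None:
                    tag = convert(tag)
                f.write(f"{token.text}{sep}{tag}\n")
            # blank line marks the sentence boundary
            f.write("\n")
            count += 1
    return count


# Hypothetical usage with the corpus from the question (file names are made up):
# for name, split in [("train", corpus.train), ("dev", corpus.dev), ("test", corpus.test)]:
#     write_column_file(split, f"wikiner_{name}.txt",
#                       lambda t: t.get_tag("ner").value)
```

Passing `convert=bioes_to_iob2` would write IOB2 tags instead. Some tools expect a space rather than a tab between columns, which is why `sep` is a parameter.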

maziyarpanahi · 2019-08-13T22:42:46Z

This is just beautiful! Easy, clean, and it works great!

Thanks a lot mate 👍

maziyarpanahi added the question label Aug 10, 2019

maziyarpanahi closed this as completed Aug 13, 2019

chelseagzr mentioned this issue Jul 3, 2024

[Feature]: Function to write a ColumnCorpus instance to files #3488

Open
