A suggested improvement for NeuralParscit is to return a dictionary that groups the tokens sharing the same label. For example:
After parsing the reference:
"Calzolari, N. (1982) Towards the organization of lexical definitions on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, Charles University, Prague, pp.61-64."
the current output is:
'author author date title title title title title title title title title title editor editor editor editor booktitle booktitle booktitle editor institution location pages'
However, it could instead return a dictionary like this:
{'author': ['Calzolari,', 'N.'], 'date': ['(1982)'], 'title': ['Towards', 'the', 'organization', 'of', 'lexical', 'definitions', 'on', 'a', 'database', 'structure.'], 'editor': ['In', 'E.', 'Hajicova', '(Ed.),', 'Charles'], 'booktitle': ['COLING', "'82", 'Abstracts,'], 'institution': ['University,'], 'location': ['Prague,'], 'pages': ['pp.61-64.']}
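For illustration, here is a self-contained sketch of how the label string lines up one-to-one with the whitespace-delimited tokens, using the example reference and label output above (collections.defaultdict is used for brevity; the variable names are only illustrative, not a proposed API):

```python
from collections import defaultdict

reference = ("Calzolari, N. (1982) Towards the organization of lexical definitions on a "
             "database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, Charles "
             "University, Prague, pp.61-64.")
# label string as produced for the reference above (one label per whitespace token)
labels = ("author author date title title title title title title title title title title "
          "editor editor editor editor booktitle booktitle booktitle editor institution "
          "location pages")

# group each token under its predicted label
grouped = defaultdict(list)
for token, label in zip(reference.split(" "), labels.split(" ")):
    grouped[label].append(token)

# dict(grouped) == {'author': ['Calzolari,', 'N.'], 'date': ['(1982)'], ...}
```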
The dictionary above could later be used to detokenize the lists and obtain something like:
{'author': 'Calzolari, N.',
'date': '(1982)',
'title': 'Towards the organization of lexical definitions on a database structure.',
'editor': 'In E. Hajicova (Ed.), Charles',
'booktitle': "COLING '82 Abstracts,",
'institution': 'University,',
'location': 'Prague,',
'pages': 'pp.61-64.'}
The code to get something like the above would be:
```python
from sacremoses import MosesDetokenizer

# `neural_parscit` is an instance of the NeuralParscit model and `reference` is the citation string
md = MosesDetokenizer(lang="en")
reference_tokenized = reference.split(" ")  # one label is predicted per whitespace token

result_parsing = neural_parscit.predict_for_text(text=reference, show=False)
result_parsing = result_parsing.split(" ")

# group the tokens by their predicted label
result_dict = {}
for token, token_label in zip(reference_tokenized, result_parsing):
    if token_label not in result_dict:
        result_dict[token_label] = []
    result_dict[token_label].append(token)

# detokenize every list of tokens into a single string
result_dict = {k: md.detokenize(v) for k, v in result_dict.items()}
```
The detokenizer used is MosesDetokenizer, from the sacremoses library.
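For reference, a minimal usage sketch of the detokenization step on its own (tokens taken from the example above):

```python
from sacremoses import MosesDetokenizer

md = MosesDetokenizer(lang="en")
# rejoins a list of tokens into a natural string, e.g. 'Calzolari, N.'
print(md.detokenize(["Calzolari,", "N."]))
```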
walter-hernandez changed the title from "Feature improvement: Return dictionary with parse references" to "Suggestion for feature improvement: Return dictionary with parsed references" on Aug 7, 2020.
@walter-hernandez Thanks for the request. Are you suggesting the dictionary output in addition to the returned string, or as the only way to obtain output from `predict_for_text`?