`obonet` is incapable to parse the `definition` terms correctly in obo files #107

erikyao · 2021-07-20T20:59:33Z

Version

obonet==0.3.0

Related To

Priority

Low. Currently it's not an issue. Maybe an issue in the future.

Problem

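The def field of an obo ontology has the format "<def_string>" [<dbxref>], i.e. a quoted definition string followed by a bracketed, comma-separated dbxref list. See GO.format.obo-1_4.html#S.2.2.

obonet reads such a field as one whole string, without separating the definition text from the dbxref list. E.g.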

'"A ribonucleoprotein complex that contains an RNA molecule ..." [GOC:sgd_curators, PMID:10690410, PMID:14729943, PMID:7510714]'

However, the def fields in the current ChEBI obo file all have empty <dbxref> lists. Our current implementation trims the empty list from the string value of each def field. E.g.

 '"A macrocyclic lactone with a ring of twelve or more members derived from a polyketide." []'

will be trimmed to

'A macrocyclic lactone with a ring of twelve or more members derived from a polyketide.'

Note that the double quotes enclosing the definition text are also removed.

Our current implementation cannot handle any def field with a non-empty <dbxref> list.
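As a rough illustration of what handling a non-empty <dbxref> list could look like, here is a minimal sketch that splits the raw def string into the definition text and a list of dbxrefs. The helper name `split_def` is hypothetical (it is not part of obonet, pronto, or our current code), and the sketch assumes the simple shape shown above: it only unescapes `\"` in the definition text and does not handle the optional quoted descriptions that the OBO format allows on individual dbxrefs.

```python
import re

# Matches: "<def_string>" [<dbxref>, <dbxref>, ...]
# The quoted part may contain backslash-escaped characters such as \".
_DEF_PATTERN = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*\[(.*)\]\s*$', re.DOTALL)


def split_def(raw):
    """Split a raw OBO def value into (definition_text, dbxref_list).

    Hypothetical helper for illustration only.
    """
    match = _DEF_PATTERN.match(raw.strip())
    if match is None:
        # Unexpected shape: return the input unchanged with no dbxrefs.
        return raw, []
    text, xref_part = match.groups()
    # Only unescape escaped double quotes; other escapes are left as-is.
    text = text.replace('\\"', '"')
    # Split on commas that are not backslash-escaped; drop empty pieces
    # so that an empty list "[]" yields [].
    xrefs = [x.strip() for x in re.split(r'(?<!\\),', xref_part) if x.strip()]
    return text, xrefs


go_def = ('"A ribonucleoprotein complex that contains an RNA molecule ..." '
          '[GOC:sgd_curators, PMID:10690410, PMID:14729943, PMID:7510714]')
text, xrefs = split_def(go_def)
print(text)
print(xrefs)
```

With an empty list, as in the ChEBI example, the same function returns the unquoted definition text and an empty list, so it would cover the current trimming behaviour as well.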

Solution

pronto is another library for reading obo files. It is more heavyweight than obonet, yet lower-level. It has a clear class hierarchy, but it is not well documented. ProntoOntologyReader.py is an alternative implementation to the OntologyReader in chebi_parser.py.

Performance-wise:

- pronto is about 4 times slower than obonet.
  - E.g. with rel201/chebi_lite.obo, ProntoOntologyReader takes ~150 seconds to load the file and generate all 146,183 documents, while our implementation with obonet takes only ~30 seconds.
- pronto uses slightly more memory than obonet.
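For anyone wanting to repeat this kind of comparison, a small harness along these lines could be used. This is only a sketch: `measure` is a hypothetical helper, the loader passed to it stands in for whichever reader is being tested, and `tracemalloc` only sees allocations made through Python's allocator, so the memory figure is a rough indicator rather than the process's true footprint.

```python
import time
import tracemalloc


def measure(loader, *args, trace_memory=False):
    """Run loader(*args) and return (result, elapsed_seconds, peak_bytes).

    peak_bytes is None unless trace_memory is True. Tracing adds overhead,
    so time and memory are best measured in separate runs.
    """
    if trace_memory:
        tracemalloc.start()
    start = time.perf_counter()
    try:
        result = loader(*args)
        elapsed = time.perf_counter() - start
        peak = tracemalloc.get_traced_memory()[1] if trace_memory else None
    finally:
        if trace_memory:
            tracemalloc.stop()
    return result, elapsed, peak


# Stand-in loader for demonstration; replace with the reader under test.
def dummy_loader(n):
    return list(range(n))


_, seconds, _ = measure(dummy_loader, 100_000)
_, _, peak_bytes = measure(dummy_loader, 100_000, trace_memory=True)
print(f"elapsed: {seconds:.4f} s, peak traced memory: {peak_bytes} bytes")
```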

We can also watch for an obonet update that addresses this issue.

newgene · 2023-08-21T23:30:40Z

@DylanWelzel obonet is now on v1.0.0. Let's re-evaluate whether we still need to keep our local fix for this issue.

DylanWelzel · 2023-08-22T23:17:27Z

obonet v1.0.0 still does not correctly parse the def field. The local fix will stay but I've updated the obonet version to v1.0.0 in the requirements.

erikyao added the enhancement label Jul 20, 2021

erikyao self-assigned this Jul 20, 2021

erikyao changed the title ~~obonet is incapable to parse the definition terms in obo files~~ obonet is incapable to parse the definition terms correctly in obo files Jul 20, 2021

erikyao assigned newgene Jul 20, 2021

newgene assigned DylanWelzel Aug 21, 2023

newgene mentioned this issue Aug 21, 2023

load_obo parser needs some specific parsing on definition field biothings/pending.api#140

Closed
