
Correct label and biological process issues for model annotation loader, possible other improvements #253

Open
kltm opened this issue May 8, 2018 · 16 comments

@kltm
Member

kltm commented May 8, 2018

Looking at the output of a loader run: http://tomodachi.berkeleybop.org/amigo/search/model_annotation

The values in the return documents that drive this interface are like:

      {
        "document_category":"model_annotation",
        "id":"gomodel:581e072c00000062|http://model.geneontology.org/581e072c00000062/581e072c00000098|FB:FBgn0005672|GO:0048019|",
        "annotation_unit":"gomodel:581e072c00000062|http://model.geneontology.org/581e072c00000062/581e072c00000098|FB:FBgn0005672|GO:0048019|",
        "annotation_unit_label":"gomodel:581e072c00000062 spi Dmel GO:0048019",
        "annotation_unit_label_searchable":"gomodel:581e072c00000062 spi Dmel GO:0048019",
        "model":"gomodel:581e072c00000062",
        "model_label":"EGFR_SPI_DMEL",
        "model_label_searchable":"EGFR_SPI_DMEL",
        "model_label_searchable":"EGFR_SPI_DMEL",
        "model_url":"http://noctua.berkeleybop.org/editor/graph/581e072c00000062.ttl",
        "model_state":"development",
        "model_date":"2016-11-07",
        "model_date_searchable":"2016-11-07",
        "model_date_searchable":"2016-11-07",
        "enabled_by":"FB:FBgn0005672",
        "enabled_by_searchable":"FB:FBgn0005672",
        "enabled_by_label":"spi Dmel",
        "enabled_by_label_searchable":"spi Dmel",
        "enabled_by_label_searchable":"spi Dmel",
        "function_class":"GO:0048019",
        "function_class_label":"GO:0048019",
        "function_class_label_searchable":"GO:0048019",
        "function_class_label_searchable":"GO:0048019",
        "function_class_closure_map":"{GO:0048019=null}",
        "location_list_closure_map":"{GO:0005615=null}",
        "owl_blob_json":"JSON_BLOB_REMOVED_FOR_BREVITY",
        "evidence_type":"ECO:0000315",
        "evidence_type_label":"ECO:0000315",
        "evidence_type_label_searchable":"ECO:0000315",
        "evidence_type_label_searchable":"ECO:0000315",
        "evidence_type_closure_map":"{ECO:0000315=null}",
        "reference":["PMID:7833286","PMID:7833286\t"],
        "contributor":["http://orcid.org/0000-0003-3212-6364"],
        "evidence_type_closure":["ECO:0000315"],
        "reference_searchable":[
          "PMID:7833286",
          "PMID:7833286\t",
          "PMID:7833286",
          "PMID:7833286\t"],
        "contributor_searchable":[
          "http://orcid.org/0000-0003-3212-6364",
          "http://orcid.org/0000-0003-3212-6364"],
        "location_list_closure":["GO:0005615"],
        "function_class_closure":["GO:0048019"],
        "location_list":["GO:0005615"],
        "location_list_label":["GO:0005615"],
        "score":1.0
      }

There are several oddities in this load:

  • The id and annotation_unit are generated as a pipe-separated combination of model id, M, P, and C; it seems that one of those is missing as there is a trailing pipe
  • all *_searchable fields seem to be repeats; this may not be a "real" issue, practically speaking
  • reference and reference_searchable seem to contain tabs; they should not
  • all *_closure_map fields contain a "null", which would seem to indicate some issue with JSON deserialization or similar; the form should be id:label; note that this could merely be that there is something wrong with labels (see below)
  • for *_label*, in almost all cases, there are no labels, just ids; it could be that the ontology information needed to resolve labels is not loading properly as part of the command, or something deeper
  • there should be a field set process_class, just as there is function_class; no process fields exist

There are also a couple of issues that should be coordinated with other development, such as switching to names for contributor (or making an id/label map), adding groups, etc.

While these may end up as separate tickets, the tooling needed to investigate them is similar, so they are grouped here for now.

@kltm
Member Author

kltm commented May 8, 2018

The command line used is:

java -Xms1024M -DentityExpansionLimit=4086000 -Djava.awt.headless=true -Xmx192G -jar ./java/lib/owltools-runner-all.jar http://purl.obolibrary.org/obo/go/extensions/go-gaf.owl http://purl.obolibrary.org/obo/eco.owl http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl http://purl.obolibrary.org/obo/cl/cl-basic.owl http://purl.obolibrary.org/obo/go/extensions/gorel.owl http://purl.obolibrary.org/obo/pato.owl http://purl.obolibrary.org/obo/po.owl http://purl.obolibrary.org/obo/chebi.owl http://purl.obolibrary.org/obo/uberon/basic.owl http://purl.obolibrary.org/obo/wbbt.owl http://purl.obolibrary.org/obo/go/extensions/go-modules-annotations.owl http://purl.obolibrary.org/obo/go/extensions/go-taxon-subsets.owl --log-info --merge-support-ontologies --merge-imports-closure --remove-subset-entities upperlevel --remove-disjoints --silence-elk --reasoner elk --solr-taxon-subset-name amigo_grouping_subset --solr-eco-subset-name go_groupings --remove-equivalent-to-nothing-axioms --solr-url http://localhost:8080/solr/ --solr-log /tmp/golr_timestamp.log --read-model-folder /home/bbop/local/src/git/noctua-models/models/ --read-model-url-prefix http://noctua.berkeleybop.org/editor/graph/ --solr-load-models

This requires a checkout of geneontology/noctua-models; I suspect that --solr-url should be set to mock to prevent actually attempting a load against a server.

@kltm
Member Author

kltm commented May 8, 2018

Tagging @yy20716

@yy20716
Contributor

yy20716 commented May 9, 2018

I began checking one of the issues, i.e. that the field process_class is not generated. I suspect that it is also related to the namespace issue. Here is the trace:
- The main problem is that the process map here is always empty.
- This process map is generated by checking bpSet.contains(bpType) in this part of the code, but bpSet is always empty.
- bpSet is (supposed to be) filled by the getAspect method, which uses graph.getNamespace, but that always returns null right now.

@cmungall I think that this happens because the input graph (e.g. go-gaf.owl?) does not include namespace tags anymore. This would work again if we could disable or revise the logic that checks namespace, but please let me know if you have any other suggestions. Thank you.

@cmungall
Member

cmungall commented May 9, 2018 via email

@cmungall
Member

cmungall commented May 9, 2018

bpSet = reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))

@yy20716
Contributor

yy20716 commented May 10, 2018

@cmungall I tried your approach, but it does not work as written because getSubClasses does not accept an IRI object:

reasoner.getSubClassesOf(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"))

So I tried this instead:

OWLClass c = graph.getManager().getOWLDataFactory().getOWLClass(IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));

The problem is that the loaded ontologies do not contain the term GO_0008150, e.g.,

System.out.println("c: " + c + " num: " + model.getDeclarationAxioms(c).size());

The above code prints

c: <http://purl.obolibrary.org/obo/GO_0008150> num: 0

i.e., there is no such class declared in the ontology. Because of that, the reasoner just returns owl:Nothing, so bpSet is still empty. I guess that's why the previous code refers to the namespace tag instead of relying on the reasoner. Can you please advise what to do in this situation? I first thought that we could create and use a separate reasoner instance loaded with the full Gene Ontology, but maybe that is overkill.
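
For reference, here is a minimal sketch of the reasoner-based lookup under discussion, assuming the full GO tbox has been merged into (or is imported by) the ontology the reasoner was created over; the class and variable names are illustrative only, not the actual owltools code:

import java.util.Set;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLDataFactory;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

public class BpSetSketch {
    // Returns every (direct and indirect) subclass of GO:0008150 known to the reasoner.
    // This only yields a non-empty set if GO:0008150 is actually declared in the
    // ontology (or its imports) that the reasoner was built over.
    public static Set<OWLClass> biologicalProcessClasses(OWLReasoner reasoner,
                                                         OWLDataFactory factory) {
        OWLClass bp = factory.getOWLClass(
                IRI.create("http://purl.obolibrary.org/obo/GO_0008150"));
        // getSubClasses takes an OWLClassExpression, not a bare IRI; "false" asks
        // for the full subclass closure, flattened into a plain set of classes.
        return reasoner.getSubClasses(bp, false).getFlattened();
    }
}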

@cmungall
Member

cmungall commented May 10, 2018 via email

@yy20716
Contributor

yy20716 commented May 10, 2018

  • Uh, so it turns out that I didn't deep-copy all ontologies, only the ones in noctua-models, which is why the reasoning was not happening properly. I forgot that a single OWLOntologyManager can contain multiple ontologies.

  • I revised the code so that all loaded ontologies are deep-copied. I now see that labels are properly generated, and we do not experience memory leaks. I will soon make a pull request containing the revised code. Thank you so much for your patience.

@yy20716
Contributor

yy20716 commented May 11, 2018

  • FYI @cmungall, unfortunately I was not able to find the post I was talking about (it's possible that my memory is not correct, or it was just my imagination). However, I learned the usage of the copy method from this post, i.e. "OWLOntologyManager.copyOntology does not copy imports", which complains that deep-copying an ontology does not process import statements (that issue is now solved). It's possible that that user used the copyOntology method for a different purpose, but I thought it could be used to address our issue here.

  • Let me summarize the problem and the solution for you (and possibly others):

    • Using reasoners via the OWL API (particularly ELK) sometimes causes inferred or referenced axioms to remain in the ontology manager even after we call the flush and dispose methods, i.e. for some reason the GC cannot collect the memory used for such axioms even though they are no longer needed. This is not noticeable, or not a significant issue, if we run the reasoner only once or a few times: once the ontology manager is destroyed, all linked axioms and ontologies are automatically reclaimed by the GC, so no memory leak occurs.

    • However, the situation here may be a bit unique, i.e., we do not destroy the ontology manager but keep using it to avoid repeatedly loading the tbox. In other words, we import the tbox only once (e.g. go-gaf.owl) and then reason over the ttl files in noctua-models to generate the Solr indexes.

      • Essentially what happens here is that we repeatedly reason over abox + tbox, where (i) the tbox is fixed and (ii) a different set of aboxes is used each time. The number of abox sets is often quite large, e.g. approximately 2k ttl files in noctua's case. We observed that unreleased axioms pile up in the ontology manager with each reasoning session, even if we create and use a fresh reasoner instance for each reasoning step. This eventually causes an OOM and halts the execution of owltools.
    • It's possible that there are other ways to clean up such axioms, but my observation was that they are only released once the connected ontology manager object is destroyed. However, the challenge is that we don't want to destroy this manager, because we would then have to import all tbox statements again, which takes a lot of time. It's possible that this bug comes from ELK, but I am not fully sure about this.

    • One workaround I found is to deep-copy the ontology instances into a temporary manager and destroy the copies once we finish using them. The overall flow can be summarized as follows (see the sketch after this list):

      • As usual, we first import all tbox axioms into the ontology manager.
      • For each reasoning session, we do not create a reasoner directly over the tbox; instead we:
        • Create a temporary manager and deep-copy the ontology instances from the initial manager into it,
        • Create a new, temporary reasoner instance over the copied instances and do the reasoning,
        • Dispose of the temporary manager and reasoner.
    • Please note that this deep-copying method is available in the OWL API, i.e., the ontology copy creates a new OWLOntology instance with the same ontology annotations, same ID, and same axioms; the format and document IRI are also copied over. This does not take much time or memory because it is an in-memory operation. It cannot be achieved with external libraries such as Kryo or Google's cloning library, because reasoners are not serializable.
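
A minimal sketch of that flow, assuming OWL API 4.x; the mainManager, model, and reasonerFactory parameters and the Solr-generation step are placeholders, not the actual owltools implementation:

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.model.parameters.OntologyCopy;
import org.semanticweb.owlapi.reasoner.InferenceType;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

public class IsolatedReasoningSketch {
    // One reasoning session over a single model: deep-copy everything held by the
    // long-lived manager into a throwaway manager, reason over the copy, dispose.
    public static void reasonOverCopy(OWLOntologyManager mainManager,
                                      OWLOntology model,
                                      OWLReasonerFactory reasonerFactory)
            throws OWLOntologyCreationException {
        OWLOntologyManager tempManager = OWLManager.createOWLOntologyManager();
        OWLOntology modelCopy = null;
        for (OWLOntology ont : mainManager.getOntologies()) {
            // OntologyCopy.DEEP creates a new OWLOntology with the same ID,
            // annotations, and axioms inside tempManager.
            OWLOntology copy = tempManager.copyOntology(ont, OntologyCopy.DEEP);
            if (ont == model) {
                modelCopy = copy; // assumes the model is managed by mainManager
            }
        }
        OWLReasoner reasoner = reasonerFactory.createReasoner(modelCopy);
        try {
            reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
            // ... walk the inferences and emit the Solr documents here ...
        } finally {
            // Dispose the reasoner; once tempManager goes out of scope, anything
            // the reasoner or the copies retained becomes unreachable and can be GC'd.
            reasoner.dispose();
        }
    }
}

The original tbox is never copied out of mainManager permanently or reloaded, so the expensive import step happens only once.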

@balhoff it's a bit late to ask you this question, but we wonder whether you have experienced any similar issues before. If you have, could you please share your experience and approach? We thought that similar issues could arise in minerva, but maybe it was not as obvious as in this case. Thank you.

p.s. as requested, I will see if I can reproduce this issue using simple, dummy ontologies.

@cmungall
Member

@yy20716 thank you for the detailed analysis and write-up. @ignazio1977, is this something to bring up on the owlapi tracker? (Ignore the bulk of this ticket; just look at the comment above this one.)

@ignazio1977

ignazio1977 commented May 13, 2018

@cmungall Yes, it sounds like an issue for OWLAPI. Not sure how this happens, as managers do not refer to axioms directly - I'd guess there must be an ontology that doesn't get cleared when emptying the manager. An example, or a heap dump of a manager with the noctua-models loaded, would help.

@yy20716
Contributor

yy20716 commented May 14, 2018

  • I am beginning to think that this might not be an OWLAPI issue, but that something deep in owltools is causing it. I was working on an example case with the same loader code, but I was not able to reproduce the issue, i.e. no memory leaks, while I still see the memory leak with the old owltools code. I believe the culprits are the OWLGraphWrapper* classes, but I am not yet sure of the root cause. In any case, I will keep trying today and upload the test code soon.

  • In my opinion, deep-copying ontologies with a temporary manager is still a good workaround to avoid the memory leaks. If other memory leaks are observed in owltools (or in other tools such as minerva), the same workaround could be applied for a while. However, my suggestion is that we eventually simplify the code and build a minimal, dedicated loader, so that we can easily track down this kind of issue and prevent similar ones in the future.

Thank you so much for your patience and understanding.

@yy20716
Contributor

yy20716 commented May 14, 2018

Here is the repo that contains the test code I built: https://github.com/yy20716/owlapi-repeat-model-load

  • It loads four different noctua-models files using the same manager, but I was not able to reproduce the issue. I hope that someone else can continue investigating and nicely refactor owltools in the future. Thank you so much.

@cmungall
Member

Thanks @yy20716! We can continue from here.

@ignazio1977 we'll let you know if we can reproduce this purely in the context of the OWLAPI

@ignazio1977

Reading the whole issue, one thing comes to mind: some reasoners register themselves as ontology change listeners with the ontology manager for their root ontology. Letting the reasoner variable go out of scope does not remove all references to the reasoner (the reasoner is still referenced by the ontology manager in its listener collection). In turn, a reasoner would hold references to ontologies and possibly axiom lists.

Usually a reasoner should unregister itself when the dispose() method is called. You might want to verify if this is the issue here by checking the list of ontology change listeners associated with the manager that appears to be leaking memory.

What I'd do is to ensure dispose() is called when I'm done with a reasoner and ensure the list of listeners does not increase in size once I've unloaded the abox that I no longer need to use. If the list is increasing in size, there must be extra listeners not being removed - that would provide a clue as to what is holding onto objects that are no longer needed.

Dropping the manager altogether and working with a new manager - i.e., the workaround you already have in place - would fit with this analysis, as this would effectively also drop all the listeners attached to the old manager. Cleaning the listeners would be much faster than copying the ontologies, though.
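
A small sketch of the dispose pattern described above; the reasoner factory and ontology are placeholders, and the listener behaviour is as characterized in this comment, not something verified against a specific reasoner implementation:

import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.reasoner.InferenceType;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

public class ReasonerLifecycleSketch {
    public static void reasonOnce(OWLReasonerFactory factory, OWLOntology ontology) {
        // Some reasoners register an ontology change listener with the manager of
        // their root ontology when created; dispose() is what should remove it.
        OWLReasoner reasoner = factory.createReasoner(ontology);
        try {
            reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
            // ... query the reasoner here ...
        } finally {
            // Without this call the manager may keep referencing the reasoner (and,
            // through it, ontologies and axiom lists), matching the leak described above.
            reasoner.dispose();
        }
    }
}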

@fanavarro

fanavarro commented Sep 9, 2020

Hi everyone. I found a strange behavior when copying ontologies with imports, and I think it could be related to the issues you are discussing here. I've opened an issue in owlapi if you are interested.

Greetings!
