Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor code to handle provenance edge-attributes for "primary knowledge" apis #549

Closed
colleenXu opened this issue Jan 20, 2023 · 6 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@colleenXu
Copy link
Collaborator

colleenXu commented Jan 20, 2023

When @tokebe and I were reviewing the PR for issue #463 , we noticed some issues with older code.

This code was meant to address "situation C": non-TRAPI apis that are generating knowledge/inferred-associations from data and should therefore be labeled as "primary knowledge sources". They also do not provide their edge-attributes in TRAPI format (if the ydid, we could ingest their edge-attributes the same way we ingest edge-attributes from TRAPI APIs).

The desired refactoring is:

  • rather than using a hard-coded list of API names (which can change as the API names change), add a property to APIs listed in the config API_LIST variable that says "this is a primary knowledge source"
  • update the list of APIs fall under "situation C" (aka have the property in their API_LIST object mentioned above). There are only 3: LitVar, Multiomics EHR Risk, Multiomics Wellness
    • Note that some multiomics and text-mining APIs are NOT on this list because they provide their edge-attributes in TRAPI format: Multiomics BigGIM-DrugResponse, Multiomics ClinicalTrials, Text Mining Targeted
  • Change the behavior: the kgEdge.inforesCuries (is this the infores of the API?) should be put into the value of a "primary knowledge source" edge-attribute along with the kgEdge.sources. Then there's a separate "aggregator knowledge source" edge-attribute with infores:biothings-explorer as the value.

Related: we want to discuss with @andrewsu if a BioThings API of a primary knowledge source then also counts as a primary knowledge source (maybe it doesn't because of the parsing / organizing of the data that happens?).

Discussed with Andrew 2023-01-23. He said this is correct, that BioThings APIs usually are NOT primary knowledge sources because of the parsing / organizing of data that is done. So the small list above is correct (we only want those labeled as primary knowledge sources).

Many BioThings APIs may be in this situation, like
  • BioThings BindingDB
  • BioThings GTRx
  • BioThings Rhea
  • BioThings SEMMEDDB
  • BioThings DDInter
  • BioThings iDISK
  • BioThings pfocr
  • DISEASES
@colleenXu colleenXu added the good first issue Good for newcomers label Jan 23, 2023
@tokebe
Copy link
Member

tokebe commented Jan 24, 2023

The primary code changes would be in knowledge_graph.js, with some code in the query_graph_handler index.js to pass the API_LIST in.

@colleenXu
Copy link
Collaborator Author

@tokebe are you available to test Rohan's PRs to see if the issue is addressed?

@colleenXu
Copy link
Collaborator Author

This issue is related to the Translator priority of "having primary_knowledge_source set for each Edge". I'll review @rjawesome's PRs ASAP.

First, I'll list my review of the current main-branch behavior:

  • Multiomics Wellness KP API:
    • it's the only one that uses the current "situation C" code and still falls into the category of "situation C"
    • When I query it, I get Edges that are missing a primary_knowledge_source (see screenshot below). So I'll be checking if this is fixed with the PRs.
  • Litvar:
    • the current behavior is fine - it has a source property in its x-bte annotation that's assigned as the primary_knowledge_source (dbsnp) while litvar itself is listed as an aggregator_knowledge_source (see screenshot below). So I'll check the behavior with the PRs and probably remove the "primary knowledge source" tag for Litvar.
  • Multiomics EHR risk kp api (formerly clinical risk kp api):
    • because the name changed, BTE is incorrectly treating it as "not situation C". So when I query it, I get Edges that are missing a primary_knowledge_source (see screenshot below). So I'll be checking if this is fixed with the PRs.
current Multiomics Wellness KP Edge: missing primary

Screen Shot 2023-03-09 at 12 45 40 PM

current Litvar Edge

Screen Shot 2023-03-09 at 1 00 15 PM

current Multiomics EHR risk KP Edge: missing primary

Screen Shot 2023-03-09 at 1 02 53 PM

@tokebe
Copy link
Member

tokebe commented Mar 28, 2023

Deployed to prod 🚀

@tokebe tokebe closed this as completed Mar 28, 2023
@colleenXu
Copy link
Collaborator Author

colleenXu commented Apr 18, 2023

I'm not sure that primary_knowledge_source is marked on all Edges. These are missing it...

  • If I query CTD through BTE (SmallMolecule MESH:C006303 -> Gene), I see that BTE isn't assigning CTD as the primary source. This is happening for all instances + my local instance (all main branches). This is related to this PR as well
  • this is happening with Multiomics Wellness and Multiomics EHR Risk too: BTE isn't assigning it as primary source (the PR was previously tested and working before...)

@colleenXu
Copy link
Collaborator Author

Closing in favor of #627

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants