update BTE to biolink-model 2.4.8 #473

Closed
6 tasks done
colleenXu opened this issue Jul 27, 2022 · 10 comments

Comments

colleenXu commented Jul 27, 2022

The Multiomics EHR risk kp api needs BTE to be on biolink-model 2.4.8, since that release adds predicates the api needs to use.

Process to update biolink (mostly @tokebe, some me)

  • Dev makes PR in biolink-model repo
  • CX Reviews 2.4.8 (differences from current 2.2.13)
  • basic testing of PR
  • merging it to main branch + deploying it

Parallel process (mostly me)

  • Updating Multiomics EHR risk kp api x-bte annotation
  • Updating x-bte annotation if needed for the 2.4.8 change...

tokebe commented Jul 27, 2022

PR here

colleenXu commented Aug 1, 2022

Differences between 2.2.13 (current) and 2.4.8 (upgrading-to) that we might care about (mostly for CX to deal with)


Predicates:

  • deprecated
    • biolink:has_real_world_evidence_of_association_with (line 1733)
    • biolink:approved_to_treat (line 4822; so its inverse is deprecated too)?
  • added
    • biolink:associated_with_likelihood_of (starts line 1745) / biolink:likelihood_associated_with. Likely won't make any x-bte annotation to this; instead use the more specific children terms
    • biolink:associated_with_increased_likelihood_of (starts line 1758) / biolink:increased_likelihood_associated_with. For multiomics ehr risk kp api
    • biolink:associated_with_decreased_likelihood_of (starts line 1758) / biolink:decreased_likelihood_associated_with. For multiomics ehr risk kp api
    • biolink:target_for (starts line 1778) / biolink:has_target: specifically for a gene-disease relationship (A gene is a target of a disease when its products are druggable and when a drug interaction with the gene product could have a therapeutic effect)
      • could use with the edge-attribute "has_evidence" where the value is something in the druggable_gene_category_enum (aka DrugCentral / Pharos stuff; starts line 244) (starts line 10629)
    • biolink:assesses (starts line 2014) / biolink:is_assessed_by: like a chemical's effect was assessed in an assay on a protein/tissue/phenotype/organism/cell-line....
  • inverse added:
    • biolink:has_active_component (starts line 1807): inverse of active_in, used to relate a CellularComponent to a GeneOrGeneProduct
    • biolink:has_predisposing_factor (starts line 4587): inverse of predisposes
    • biolink:is_ameliorated_by (starts line 4717): inverse of ameliorates
    • biolink:is_exacerbated_by (starts line 4742): inverse of exacerbates
    • biolink:occurs_in_disease (starts line 6350)
  • changed spelling:
    • has_capability (was capability_of before...; line 5705)
  • usage:
    • use part_of to relate MolecularActivity (aka reaction) to Pathway it participates in

SEMMEDDB mappings:

  • predicates:
    • biolink:associated_with mapped to SEMMEDDB:associated_with (line 1522)
    • biolink:interacts_with mapped to SEMMEDDB:interacts_with (line 2049; was mapped before to physically_interacts_with)
    • biolink:diagnoses mapped to SEMMEDDB:diagnoses (line 2246; was mapped before to has_biomarker)
    • biolink:predisposes mapped to SEMMEDDB:predisposes (line 4585)
    • broad-mapping of biolink:exacerbates to SEMMEDDB:complicates (line 4740)
  • node categories / semantic types
    • narrow-mapping of ChemicalEntity to STY:T129 (aka imft / Immunologic Factor) (line 7997)
    • narrow-mapping of SmallMolecule to a bunch of things (starts line 8024): bacs / hops / horm / inch / orch / carb / eico / lipd / nsba / opco / strd / vitamin

Prefixes:

  • INXIGHT (https://drugs.ncats.io/)? ncats.drug: 'https://drugs.ncats.io/drug/' (line 84). Added as a prefix for ChemicalEntity (line 8005), MolecularMixture (line 8185), Drug (line 8396), Protein (line 9000)
  • bioplanet (already changed biothings bioplanet yamls to use this prefix): ncats.bioplanet: 'https://tripod.nih.gov/bioplanet/detail.jsp?pid=' (line 103). Added as a prefix for Pathway (line 8317)
  • CPT is a prefix for Procedure (line 7785)

Node or edge-attributes:

  • deprecates:
    • biolink:source (line 738)
    • biolink:original_knowledge_source (line 6770)
  • adds:
    • biolink:has_chemical_role (lines 1289-1294). But the spelling is confusing...line 7195 introduces a chemical_role as an attribute
    • biolink:routes_of_delivery (lines 1348-1353) but gives a specific set of values (drug_delivery_enum, in lines 278-283): inhalation, oral, absorbtion through the skin (TYPO), intravenous injection
    • multiomics ehr risk kp api attributes? (starts line 7046) "supporting study"
  • biolink:provided_by as a node-attribute to say which knowledge-provider created/assembled node and its attributes (line 6754)

Other (probably not important to us right now, but good to know)

  • Starting to add "opposite" relationships for predicates/attributes (node ones??) using biolink:opposite_of in the predicate's annotations section
  • CHEMBL.MECHANISM:substrate related-mapping to biolink:increases_degradation_of (line 3135). Hmm...
  • TYPO line 3169 value: increases degredation of
  • deprecates node categories / semantic types GeneOntologyClass (line 7297), UnclassifiedOntologyClass (line 7300), Nutrient (line 8415; and its children Macronutrient and Micronutrient), Vitamin (line 8436)
  • MISTAKE line 9756: original knowledge source added as a "slot" even though it was deprecated?
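
Part of a version-to-version comparison like the one above could be scripted rather than read off by hand. The following is only a minimal sketch: it assumes each release's `slots` section has already been loaded into a plain dict keyed by slot name, and that deprecation is marked with a truthy `deprecated` key. The helper name `diff_slots` and the toy entries are hypothetical, not the real file contents.

```python
def diff_slots(old_slots, new_slots):
    """Compare two {slot_name: definition} mappings from two model releases."""
    old_names, new_names = set(old_slots), set(new_slots)

    def is_deprecated(slots, name):
        # A definition may be empty (None) in YAML, so fall back to {}.
        return bool((slots[name] or {}).get("deprecated"))

    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "newly_deprecated": sorted(
            name
            for name in old_names & new_names
            if is_deprecated(new_slots, name) and not is_deprecated(old_slots, name)
        ),
    }


# Toy stand-ins for the two releases (illustrative values only).
old = {
    "approved to treat": {"is_a": "treats"},
    "capability of": {"is_a": "related to"},
}
new = {
    "approved to treat": {"is_a": "treats", "deprecated": True},
    "has capability": {"is_a": "related to"},
    "target for": {"is_a": "related to"},
}

result = diff_slots(old, new)
print(result)
```

A renamed slot (like capability_of becoming has_capability) shows up here as one removal plus one addition, so the output still needs a human pass to pair those up.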

colleenXu commented Aug 1, 2022

Questions for @tokebe:

  • how do we handle deprecated stuff (property under the predicate or node category/semantic-type, not removed from yaml yet). I notice that often the hierarchy (is_a) is removed, so that might lead to some "undefined" predicates??
  • does BTE strip any unintentional extra white-space from the x-bte annotations (think predicates and input/output IDs and types...)? sometimes I add an extra space by accident....and it's super tedious to go through all x-bte annotations to try to fix this if it's an issue... >.<
  • do we have code regarding edge-attributes and biolink:original_knowledge_source? I think we don't (see "update edge provenance info to comply with Translator Standard -- July 1", #208 (comment); CTRL-F "original" since the posts in that issue are lengthy)....but I think this is a quick check...

tokebe commented Aug 1, 2022

  • It would appear that BTE doesn't take into account the deprecated property in the biolink-model package, nor seemingly anywhere relating to the biolink model. I'm not sure I follow on how this might lead to undefined predicates, could you be more specific?
  • If you mean trailing whitespace specifically in the yamls...I'm reasonably sure that this kind of whitespace is trimmed when yaml is converted to JSON, which (if I understand the process) happens before BTE even retrieves the smartapi specs?
    • I understand that fixing every instance would require going back through every file, and isn't terribly necessary given it doesn't present an issue, however I do recommend an auto-formatter for general use to avoid this going forward.
  • I don't see any hardcoded mentions of biolink:original_knoweldge_source across the workspace.

@colleenXu
Collaborator Author

"Deprecated": Actually....oops this shouldn't be a problem. I think it's fine that we don't take into account deprecated. It's more for curators/x-bte annotation writers and editors....to know that we shouldn't use those predicates and node-category/semantic-types anymore...

  • My worry was that if a predicate or node-category/semantic-type is deprecated AND its is_a property is removed, BTE will put the predicate/node-category as "biolink:undefined" and may have buggy behavior.
    • I think this has happened in the past, but with predicates/node-category stuff that wasn't in the biolink-model at all....so a slightly different situation
    • I suspect this may also happen when the is_a property is removed, since then BTE may not know how to place the term in the hierarchy of predicates/node-categories
  • But now that I check:
    • has_real_world_evidence_of_association_with (line 1733) still has an is_a property
    • approved_to_treat (line 4822) still has an is_a property
    • Vitamin (line 8436) still has an is_a property
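
To make the worry concrete, here is a small sketch of how a term with no is_a link ends up outside the hierarchy. This is not BTE's actual code; the `ancestors` and `orphans` helpers, the toy `is_a` table, and the term `some_deprecated_term` are all made up for illustration.

```python
def ancestors(term, is_a):
    """Follow is_a links upward and return the parents, nearest first."""
    chain = []
    seen = {term}
    while is_a.get(term) is not None:
        term = is_a[term]
        if term in seen:  # guard against an accidental cycle
            break
        seen.add(term)
        chain.append(term)
    return chain


def orphans(is_a, roots):
    """Terms that are not roots and cannot be traced up to any root."""
    found = []
    for term in is_a:
        if term in roots:
            continue
        chain = ancestors(term, is_a)
        if not chain or chain[-1] not in roots:
            found.append(term)
    return sorted(found)


# Toy hierarchy: None means the definition has no is_a property.
is_a = {
    "related_to": None,
    "treats": "related_to",
    "approved_to_treat": "treats",
    "some_deprecated_term": None,  # hypothetical term whose is_a was removed
}

chain = ancestors("approved_to_treat", is_a)
lost = orphans(is_a, roots={"related_to"})
print(chain, lost)
```

In this toy setup a deprecated term that keeps its is_a (like approved_to_treat) still resolves to the root, while one that lost it cannot be placed anywhere.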

@colleenXu
Collaborator Author

2nd point on trailing white-space on yamls:

  • yep that was my worry. I don't see any stray whitespace just inside the quotes (a space right after an opening " or right before a closing ") in BTE's data/predicates.json. I do see some in smartapi_specs.json, but only in descriptions, not in x-bte annotations. Only a few of those are true formatting issues; the others are legit.
  • I'm not aware of how to use an auto-formatter for yaml....I only just found one for Markdown (github I think)...
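
The manual check described above could also be done with a short script. This is a rough sketch, assuming the file has been parsed with the standard `json` module; the function name `find_padded_strings` and the sample document (including its field names) are illustrative assumptions, not taken from the real files.

```python
import json


def find_padded_strings(node, path="$"):
    """Yield (path, value) for each string value with leading/trailing whitespace."""
    if isinstance(node, str):
        if node != node.strip():
            yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_padded_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_padded_strings(item, f"{path}[{index}]")


# Made-up sample standing in for a parsed annotation.
doc = json.loads(
    '{"predicate": "treats ",'
    ' "inputs": [{"id": "NCBIGene", "semantic": " Gene"}],'
    ' "description": "ok"}'
)

hits = list(find_padded_strings(doc))
for path, value in hits:
    print(path, repr(value))
```

Printing with `repr` keeps the stray spaces visible, and the path makes it easier to find the offending field in the original yaml.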

@colleenXu
Collaborator Author

And 3rd point, you misspelled biolink:original_knowledge_source....I presume you did search correctly :P

tokebe commented Aug 1, 2022

I actually searched for knowledge_source more generally because I recalled some instances of that, but none of them had original_. But yes, I just typed my answer quickly :)

colleenXu commented Aug 17, 2022

Done. Updated predicates for MyChem, MyGene, BioThings SEMMEDDB: see commits NCATS-Tangerine/translator-api-registry@d02f89a and NCATS-Tangerine/translator-api-registry@40000c3

and refreshed their registrations afterwards.
