phase 1: provenance refactor for edges from some multiomics KPs, text-mining KP, TRAPI KPs #617

Closed
colleenXu opened this issue Apr 12, 2023 · 10 comments

Comments

@colleenXu
Collaborator

colleenXu commented Apr 12, 2023

Background

Overview

Text-Mining KP and some Multiomics KPs

We will expect the BioThings KP APIs for Text-Mining KP and some Multiomics KPs (ClinicalTrials, BIGGIM-drug-response) to...

  • keep provenance in their edge-attributes data (to keep TRAPI 1.3 compliance; this can be removed once all instances have moved to TRAPI 1.4)
  • add TRAPI-1.4 compliant edge.sources data in a separate field association.sources of each record (like association.edge-attributes and edge-attribute data...)

Then we will...

  • make branches for those KPs' SmartAPI yamls -> there, add an x-bte-response-mapping entry that points to the TRAPI-1.4-provenance-data field. Something like: trapi_sources: association.sources
  • Make temporary SmartAPI overrides, have the dev instance use these overrides
    • during the deployment process to test/prod, we'll merge the changes to the SmartAPI yamls / remove the temporary overrides (like what we did for the biolink3 migration)
  • BTE uses the x-bte-response-mapping to ingest the TRAPI-1.4 provenance data. That data should be an array; BTE can then add an element to that array that references the KP API's infores...

example:

"edge_TMKP_1":
{
  "subject": "CHEBI:12345",
  "object": "MONDO:456",
  "predicate": "biolink:treats",
  "attributes": [ ... ],
  "sources": [
    { 
      "resource_id": "infores:biothings-explorer", 
      "resource_role": "aggregator_knowledge_source", 
      "upstream_resource_ids": [ "infores:text-mining-targeted" ] 
    },
    {....},  // other elements are the trapi-1.4 sources data text-mining-targeted provided
    {....}
  ]
}

TRAPI KPs

We will expect TRAPI KP edges to have a sources property on their edges already. We'll then add an element for BTE that references their KP API infores...

example:

"edge_automat_hetio":
{
  "subject": "thing1",
  "object": "thing2",
  "predicate": "biolink:affects",
  "attributes": [ ... ],
  "sources": [
    { 
      "resource_id": "infores:biothings-explorer", 
      "resource_role": "aggregator_knowledge_source", 
      "upstream_resource_ids": [ "infores:automat-hetio" ] 
    },
    {....},  // other elements are the trapi-1.4 sources data that automat-hetio provided
    {....}
  ]
}
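
The "add an element for BTE" step in both examples above could be sketched roughly as follows. This is only an illustration in Python, not BTE's actual code: the function name `add_aggregator_source` and the `infores:example-primary` entry are hypothetical, and the only fields used are the ones shown in the examples (`resource_id`, `resource_role`, `upstream_resource_ids`).

```python
from copy import deepcopy

BTE_INFORES = "infores:biothings-explorer"


def add_aggregator_source(sources, kp_infores, aggregator=BTE_INFORES):
    """Return a new sources list with an aggregator entry that points at the KP.

    Hypothetical helper: the input list is never modified in place.
    """
    updated = deepcopy(sources) if sources else []
    # Skip the insert if the aggregator is already listed (keeps the call idempotent).
    if any(entry.get("resource_id") == aggregator for entry in updated):
        return updated
    updated.insert(0, {
        "resource_id": aggregator,
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": [kp_infores],
    })
    return updated


# Placeholder for the sources data a KP might provide (assumed content).
kp_sources = [
    {
        "resource_id": "infores:automat-hetio",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:example-primary"],
    },
    {
        "resource_id": "infores:example-primary",
        "resource_role": "primary_knowledge_source",
    },
]

edge_sources = add_aggregator_source(kp_sources, "infores:automat-hetio")
```

The same call would apply to the BioThings case once the array has been read from the field named by `trapi_sources`.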
@colleenXu
Collaborator Author

Note that COHD's dev instance seems to be on TRAPI 1.4 (we can access it through the registration we currently use, but they also registered a separate yaml for TRAPI 1.4)

However, I haven't checked their /query responses to see if they are providing provenance as we expect, and whether we can use it to develop and test our code for this issue...

From my post here: #597 (comment)

@colleenXu
Collaborator Author

colleenXu commented Apr 15, 2023

For the multiomics / text-mining KP stuff:

I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA endpoints, but for the team-specific / API-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...

@colleenXu
Collaborator Author

Note that the post above is related to this, and it looks like we haven't done the infores:service-provider-trapi handling yet

@colleenXu
Collaborator Author

Pasting my Translator Slack message to Multiomics / Text-Mining KPs below

BTE's dev instance now has support for TRAPI 1.4 provenance (aka the sources section on Edges).

If your BioThings API includes TRAPI 1.4 sources data, the following is needed to hook this up with BTE:

  • make a branch or fork of the SmartAPI yaml registered for your API
  • edit each operation:
    • In parameters.fields: the JSON-notation paths listed here (a comma-delimited string) should cover the sources data field. This part of the query to BioThings APIs specifies which parts of the record to return in the response
    • Ex: if the sources data was in association.sources, this would work
        parameters:
          fields: object.MONDO,association.edge_attributes,association.sources
  • edit each entry in the x-bte-response-mapping section: add a key-value pair. The key is trapi_sources and the value is the JSON-notation-path to the sources data. BTE will recognize this key and handle the data in the field specified appropriately
    • Ex:
        x-bte-response-mapping:
          mondo-object:
            MONDO: object.MONDO
            edge-attributes: association.edge_attributes
            trapi_sources: association.sources
  • let me know. I'll be writing a SmartAPI-overrides file so BTE will use these TRAPI 1.4-specific files to retrieve the sources data and handle it appropriately. Once this SmartAPI-overrides file is deployed on BTE's dev instance, the changes will go live <=10 min later
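
As a quick sanity check before sending an edited yaml over, something like the Python sketch below could confirm that the `fields` string covers every path used in a response-mapping entry, and show how a dotted path such as `association.sources` resolves against a record. The helper names (`get_path`, `fields_cover`, `uncovered_paths`) and the sample record are made up for illustration; this is not how BTE itself does the lookup.

```python
def get_path(record, dotted_path):
    """Follow a dotted path such as 'association.sources' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def fields_cover(fields, dotted_path):
    """True if the comma-delimited fields string requests the path or one of its parents."""
    requested = [f.strip() for f in fields.split(",") if f.strip()]
    return any(dotted_path == f or dotted_path.startswith(f + ".") for f in requested)


def uncovered_paths(fields, response_mapping):
    """List response-mapping paths that the fields parameter would not return."""
    return [path for path in response_mapping.values() if not fields_cover(fields, path)]


# Values copied from the examples in this comment.
fields = "object.MONDO,association.edge_attributes,association.sources"
mapping = {
    "MONDO": "object.MONDO",
    "edge-attributes": "association.edge_attributes",
    "trapi_sources": "association.sources",
}

# Hypothetical record, only to show the path lookup.
record = {
    "association": {
        "sources": [
            {"resource_id": "infores:example-kp", "resource_role": "primary_knowledge_source"}
        ]
    }
}

missing = uncovered_paths(fields, mapping)
sources = get_path(record, mapping["trapi_sources"])
```

An empty `missing` list means every mapped path, including the one under `trapi_sources`, is requested by `fields`.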

@colleenXu
Collaborator Author

colleenXu commented May 10, 2023

Status of the multiomics / text-mining APIs

[Update in progress 2023-06-01 evening]

We'll be using temporary SmartAPI overrides (currently on main) to direct BTE to query for and ingest the TRAPI 1.4 sources data from Multiomics/Text-Mining KPs.

The override now contains links for all 4 KPs.

However, these KPs' x-bte annotations are in different states for their registered yamls (staying at TRAPI 1.3 and used by BTE prod) and their override yamls (with the changes for TRAPI 1.4 sources data, used by all other BTE instances).

both yamls working

  • text-mining targeted:
  • Multiomics EHR Risk:
    • as far as I know, the x-bte annotation in both yamls is working as-intended
    • next steps would be discussing the auto-generated x-bte annotation (from right before this commit. The diff may be useful here)
      • if need be, adding / removing operations in the working yamls, using it as reference
      • how to fix it
      • how to test it thoroughly
    • note: the override yaml is in a branch, no PR yet for when we deploy TRAPI 1.4 on all instances

1 yaml working

no yamls working

colleenXu referenced this issue May 11, 2023
text-mining targeted and multiomics clinicaltrials. to support trapi 1.4 sources data ingest
@colleenXu
Collaborator Author

colleenXu commented May 11, 2023

Recording info on hooking BTE up to TRAPI 1.4 KPs in #614 (comment)

@colleenXu
Collaborator Author

colleenXu commented May 24, 2023

Multiomics ClinicalTrials KP: NEEDED SmartAPI yaml edits

Edits for registered yaml (TRAPI 1.3 instances)

Main issue: The deployed change flipped the subject/object in records (now Treatments are subjects and Diseases are objects). This was a BREAKING change and now BTE is not properly querying this API

Addressing the main issue: Change this file, then ask me to update the registration. Once the registration is updated, it'll be <=10 min before BTE picks up this update.

  • line 58: 00065273_C0025362_C0009079 -> replace with 00065273_C0009079_C0025362
  • line 122: 00065273_C0025362_C0009079 -> replace with 00065273_C0009079_C0025362
  • line 123: 00065273_C0025362_C0171023 -> replace with 00065273_C0171023_C0025362
  • line 178, 279, 592, 622, 636: subject.UMLS -> replace with object.UMLS
  • line 598, 616, 633: object.UMLS -> replace with subject.UMLS
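
If doing these by hand is error-prone, the line-targeted edits above could be scripted. The Python sketch below is one possible approach, not an existing tool: `apply_line_edits` is a hypothetical helper, and it stops with an error if a listed line does not contain the expected text (for example, because the file has changed since the line numbers were written down).

```python
# Edits copied from the list above for the registered yaml:
# (1-indexed line numbers, old text, new text).
REGISTERED_YAML_EDITS = [
    ((58, 122), "00065273_C0025362_C0009079", "00065273_C0009079_C0025362"),
    ((123,), "00065273_C0025362_C0171023", "00065273_C0171023_C0025362"),
    ((178, 279, 592, 622, 636), "subject.UMLS", "object.UMLS"),
    ((598, 616, 633), "object.UMLS", "subject.UMLS"),
]


def apply_line_edits(lines, edits):
    """Return a copy of lines with each edit applied to its listed line numbers."""
    updated = list(lines)
    for line_numbers, old, new in edits:
        for number in line_numbers:
            # Always check against the original text so edits cannot undo each other.
            original = lines[number - 1]
            if old not in original:
                raise ValueError(f"line {number}: expected {old!r}, found {original!r}")
            updated[number - 1] = original.replace(old, new)
    return updated


# Toy demonstration on three made-up lines (not the real yaml).
toy_lines = ["fields: subject.UMLS", "name: example", "UMLS: object.UMLS"]
toy_edits = [
    ((1,), "subject.UMLS", "object.UMLS"),
    ((3,), "object.UMLS", "subject.UMLS"),
]
toy_result = apply_line_edits(toy_lines, toy_edits)
```

For the real file, the yaml would be read into a list of lines and passed in with `REGISTERED_YAML_EDITS`; the forked yaml needs its own list because some of its line numbers differ.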

Optional things to do:

  • add a comment explaining the versioning of the BioThings API (what is this the date of?) (add around line 16, comments start with #)
  • add an API-level tag "multiomics" (add around line 32)
Edits for forked yaml (TRAPI 1.4 instances)

Main issue: same as above

Addressing the main issue:

  • Change this file in fork. Once the changes are pushed, BTE will automatically take the parsed file in <=10 min.
  • same list of edits as above, except that some line numbers differ (originally shown in bold) for the edits below:
    • line 178, 279, 592, 623, 639: subject.UMLS -> replace with object.UMLS
    • line 598, 617, 635: object.UMLS -> replace with subject.UMLS

Optional things to do:

  • same as above

Other notes:

  • TRAPI-1.4 sources data: some entries could include the upstream_resource_ids field from the TRAPI spec
  • BTE will provide edges as both Treatment->Disease and Disease->Treatment
    • it DOES NOT force KG Edges to be a specific direction. Instead, the Edges match the direction of the corresponding QEdge.
    • the predicate used right now doesn't have a directionality to it, according to data-modeling team...
    • see my post here for more details

@colleenXu
Collaborator Author

colleenXu commented May 25, 2023

Multiomics EHR Risk KP: NEEDED SmartAPI yaml edits

Edits for registered yaml (TRAPI 1.3 instances)

Main issue

The deployed changes altered a node-category, how edge-attributes are formatted, and other formatting. These are BREAKING changes and now BTE is not properly querying this API or properly parsing responses.

Addressing the main issue

Change this file, then ask me to update the registration AND remove the primarySource tag here. Once the registration is updated, it'll be <=10 min before BTE picks up this update.

Note: There are 148 operations and 12 response-mapping entries.

  1. find this section of text (note the indent and commas! 148 instances)
old text
            association.provenance,
            association.auc_roc,association.p_values,
            association.feature_coefficient,association.odd_ratio,
            association.classifier,association.original_predicate,association.provided_date,

and replace it with this text (note the indent and comma!!!)

            association.edge_attributes,
  2. find the following and replace:
    a. "Disease" (note the quotation marks! 106 instances), replace with "biolink:Disease"
    b. "PhenotypicFeature" (114 instances), replace with "biolink:PhenotypicFeature"
    c. "Procedure" (28 instances), replace with "biolink:Procedure"
    d. ChemicalSubstance (80 instances), replace with ChemicalEntity. Then look for "ChemicalEntity" (note the quotation marks! 48 instances), replace with "biolink:ChemicalEntity"

  3. find the following and replace:
    a. {{ queryInputs | wrap (73 instances), replace with {{ queryInputs | rmPrefix() | wrap
    b. | replPrefix('NCIT') (38 instances), replace with | rmPrefix()
    c. | addPrefix("SNOMEDCT") (27 instances), replace with | rmPrefix()
    d. | addPrefix("UNII") (10 instances), replace with | rmPrefix()

  4. and last, find this section of text (there are 12 instances in the response-mapping section)

old text
      model_url: association.provenance
      classifier_used: association.classifier
      "biolink:original_predicate": association.original_predicate
      date_provided: association.provided_date
      auc_roc: association.auc_roc
      p-value: association.p_values
      feature_coefficient: association.feature_coefficient
      odds_ratio: association.odd_ratio

and replace it with this text

      edge-attributes: association.edge_attributes
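
The single-line find-and-replace pairs listed above (the category and prefix-filter ones) could also be applied with a short script, which has the side benefit of reporting how many instances of each pattern were found so they can be compared against the counts given. The Python below is a rough sketch under that assumption; `apply_replacements` is a hypothetical helper, and the two multi-line, indent-sensitive blocks are deliberately left for manual editing.

```python
# (old text, new text) pairs in the order given above. Order matters for
# ChemicalSubstance -> ChemicalEntity -> "biolink:ChemicalEntity".
REPLACEMENTS = [
    ('"Disease"', '"biolink:Disease"'),
    ('"PhenotypicFeature"', '"biolink:PhenotypicFeature"'),
    ('"Procedure"', '"biolink:Procedure"'),
    ('ChemicalSubstance', 'ChemicalEntity'),
    ('"ChemicalEntity"', '"biolink:ChemicalEntity"'),
    ('{{ queryInputs | wrap', '{{ queryInputs | rmPrefix() | wrap'),
    ("| replPrefix('NCIT')", '| rmPrefix()'),
    ('| addPrefix("SNOMEDCT")', '| rmPrefix()'),
    ('| addPrefix("UNII")', '| rmPrefix()'),
]


def apply_replacements(text, replacements=REPLACEMENTS):
    """Apply each pair in order; return the new text and the match count per pattern."""
    counts = {}
    for old, new in replacements:
        # Count is taken at the moment the pattern is applied, after earlier replacements.
        counts[old] = text.count(old)
        text = text.replace(old, new)
    return text, counts


# Toy input (not the real yaml) showing the two-step ChemicalSubstance case.
sample = '- "Disease"\n- "ChemicalSubstance"\n'
new_text, counts = apply_replacements(sample)
```

On the real yaml, a count that differs from the number quoted above would be a signal to stop and look at the file before saving the result.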

Later important tasks

  • get a list of meta-triples for this KP: unique combos of subject-prefix / subject-category / predicate / object-prefix / object-category (and qualifier-set, if applicable)
  • then edit the x-bte annotations to add / remove operations.
    • this may involve editing all 3 x-bte sections (the list under /query, the written-out operations in /components, and the response-mapping in /components)
    • These are the missing operations I found (but I only checked a fraction of the yaml). If you search these operation names, you'll find the notes and link-outs in the yaml on them (but the link-outs will need a find-replace fix, see the optional section below)
known operations to add
  • DiseaseNCIT_increased_DiseaseSNOMEDCT
  • DiseaseSNOMEDCT_increased_DiseaseMONDO
  • DiseaseSNOMEDCT_increased_DiseaseNCIT
  • DiseaseSNOMEDCT_increased_DiseaseSNOMEDCT
  • DiseaseNCIT_increased_PhenoHP
  • DiseaseNCIT_increased_PhenoNCIT
  • DiseaseNCIT_increased_PhenoSNOMEDCT
  • DiseaseSNOMEDCT_increased_PhenoNCIT
  • DiseaseSNOMEDCT_increased_PhenoSNOMEDCT
  • DiseaseSNOMEDCT_increased_ProcedureNCIT
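
For the meta-triple list mentioned under "Later important tasks", a minimal Python sketch is below. It assumes each record exposes a CURIE and a category for the subject and object plus a predicate; the field names used here (`subject.id`, `subject.type`, `association.predicate`, and so on), the IDs, and the predicate are placeholders and would need to be adjusted to the KP's real record layout. Qualifier-sets are not handled.

```python
def prefix_of(curie):
    """Return the ID prefix of a CURIE, e.g. 'MONDO' for 'MONDO:0000001'."""
    return curie.split(":", 1)[0]


def meta_triples(records):
    """Collect the unique (subject-prefix, subject-category, predicate,
    object-prefix, object-category) combinations across records."""
    triples = set()
    for record in records:
        triples.add((
            prefix_of(record["subject"]["id"]),
            record["subject"]["type"],
            record["association"]["predicate"],
            prefix_of(record["object"]["id"]),
            record["object"]["type"],
        ))
    return sorted(triples)


# Made-up records with placeholder IDs and predicate, for illustration only.
records = [
    {"subject": {"id": "NCIT:C0001", "type": "biolink:Disease"},
     "association": {"predicate": "biolink:related_to"},
     "object": {"id": "SNOMEDCT:1001", "type": "biolink:Disease"}},
    {"subject": {"id": "NCIT:C0002", "type": "biolink:Disease"},
     "association": {"predicate": "biolink:related_to"},
     "object": {"id": "SNOMEDCT:1002", "type": "biolink:Disease"}},
    {"subject": {"id": "SNOMEDCT:1003", "type": "biolink:Disease"},
     "association": {"predicate": "biolink:related_to"},
     "object": {"id": "HP:0000001", "type": "biolink:PhenotypicFeature"}},
]

triples = meta_triples(records)
```

Each resulting tuple would correspond to one candidate x-bte operation to check against what is already in the yaml.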

Optional things to do

  • add a comment explaining the versioning of the BioThings API (what is this the date of?) (add around line 12, comments start with #)
  • add an API-level tag "multiomics" (add around line 28)
  • update the commented link-outs to the BioThings API
    • .type:Disease -> .type:"biolink:Disease" (97 instances)
    • .type:PhenotypicFeature -> .type:"biolink:PhenotypicFeature" (97 instances)
    • .type:Procedure -> .type:"biolink:Procedure" (33 instances)
    • .type:ChemicalEntity -> .type:"biolink:ChemicalEntity" (29 instances)
  • update comments on number of records retrievable for each set of operations, the testExamples
What's needed for TRAPI 1.4 support

Once the needed fixes above are done for the registered yaml, follow the instructions here (Translator Slack link)

GitHubbit added a commit to GitHubbit/clinical_risk_kp that referenced this issue May 31, 2023
…s_explorer#617 instructions from Colleen on how to amend yaml for TRAPI 1.4 migration
GitHubbit added a commit to GitHubbit/clinical_risk_kp that referenced this issue May 31, 2023
GitHubbit added a commit to GitHubbit/clinical_risk_kp that referenced this issue May 31, 2023
biothings/biothings_explorer#617 (comment)
Made meta-triples in EHR_Risk_parser.ipynb and outputted relevant x-bte
sections that go in the yaml to ehr_risk_yaml_xbte_portions.txt. Copied
them from the text file to the yaml
This effort auto-generates the x-bte operations
GitHubbit added a commit to GitHubbit/translator-api-registry that referenced this issue Jun 2, 2023
@colleenXu
Collaborator Author

colleenXu commented Jun 28, 2023

[as of 2023-07-13 evening]

We're preparing to move all BTE instances to TRAPI 1.4, which will involve removing the SmartAPI overrides and updating the registered yamls.

ready for migration

@tokebe
Member

tokebe commented Aug 3, 2023

Closing as complete. Any future problems with provenance (further/alternate support for KPs, bugs, etc.) can be tracked in future issues.
