phase 1: provenance refactor for x-bte operations #604

colleenXu · 2023-03-29T07:11:26Z

Overview

TRAPI 1.4 refactors provenance, moving it out of attributes and into its own edge property, sources. The entries in an edge's sources can also reference each other. For example, they can express that BTE got its information from MyChem, which got it from CHEBI (this x-bte operation).
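
To make the "sources reference each other" idea concrete, here is a minimal sketch that follows upstream_resource_ids from BTE back to the primary source. The function name trace_upstream is hypothetical, and the sketch assumes each source lists at most one upstream resource, which holds for the examples in this issue but is not required by TRAPI.

```python
def trace_upstream(sources, start_id):
    """Return resource IDs from start_id back to the primary source, in order."""
    by_id = {s["resource_id"]: s for s in sources}
    chain = []
    current_id = start_id
    while current_id is not None:
        if current_id in chain:
            raise ValueError(f"cycle in upstream references at {current_id}")
        chain.append(current_id)
        upstream = by_id[current_id].get("upstream_resource_ids", [])
        # Simplifying assumption: follow only the first upstream resource.
        current_id = upstream[0] if upstream else None
    return chain


sources = [
    {"resource_id": "infores:chebi", "resource_role": "primary_knowledge_source"},
    {
        "resource_id": "infores:mychem-info",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:chebi"],
    },
    {
        "resource_id": "infores:biothings-explorer",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:mychem-info"],
    },
]

chain = trace_upstream(sources, "infores:biothings-explorer")
print(" <- ".join(chain))
```

An upstream ID that has no matching entry in sources raises a KeyError here, which is one way to notice an incomplete chain.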

This issue is specifically for handling edges from x-bte operations, but it is a good reference for other issues that involve provenance.

TRAPI spec references

migration guide (currently in a PR)
bullets 8 and 12 of the changelog:

Enhance encoding of EPC retrieval sources by adding Edge.sources as list of RetrievalSource items (required, minItems: 1) https://github.com/NCATSTranslator/ReasonerAPI/pull/393/files

Change RetrievalSource.resource and upstream_resources to RetrievalSource.resource_id and upstream_resource_ids for consistency: https://github.com/NCATSTranslator/ReasonerAPI/pull/418/files
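
As a rough illustration of those two changelog bullets, the sketch below renames the older keys to the final ones and checks the "required, minItems: 1" constraint on Edge.sources. The helper names rename_source_keys and check_edge_sources are hypothetical and are not part of the TRAPI spec or of BTE.

```python
# Key renames described in the second changelog bullet.
RENAMED_KEYS = {
    "resource": "resource_id",
    "upstream_resources": "upstream_resource_ids",
}


def rename_source_keys(source):
    """Return a copy of a RetrievalSource dict using the final key names."""
    return {RENAMED_KEYS.get(key, key): value for key, value in source.items()}


def check_edge_sources(edge):
    """Raise ValueError unless edge["sources"] is a list with at least one item."""
    sources = edge.get("sources")
    if not isinstance(sources, list) or len(sources) < 1:
        raise ValueError("Edge.sources must be a list with at least one item")


old_style = {
    "resource": "infores:mychem-info",
    "resource_role": "aggregator_knowledge_source",
    "upstream_resources": ["infores:chebi"],
}
new_style = rename_source_keys(old_style)
check_edge_sources({"sources": [new_style]})
```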

Details

At the moment, this seems straightforward: create a sources structure that represents:
"source" from x-bte operation -> KP API -> BTE

(implementation musing: modify the TRAPI output only?)

It would be more complicated if...

we were merging edges when they came from different underlying KP APIs....but we haven't thought about doing this...
we wanted to process source info from certain fields of the raw API response (with post-processing) and put that into the edge.sources. An idea for the future maybe...
we wanted to construct sources.source_record_urls from certain fields of the raw API response (maybe with post-processing). An idea for the future maybe...

underlying "KP" API isn't a primary knowledge source

Most APIs ingested through x-bte operations are not "primary knowledge sources", so they're not tagged primarySource: true in the API_LIST.

Walking through it

The TRAPI edge has a sources property at the same level as subject, predicate, object, and attributes. Its value is an array with 3 elements:
1 element for the "source" property in the operation. It's a "primary knowledge source": { "resource_id": "infores:chebi", "resource_role": "primary_knowledge_source" }
1 element for the API (use its infores). It's an "aggregator" and should reference the "source" property: { "resource_id": "infores:mychem-info", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:chebi" ] }
1 element for BTE (use its infores). It's an "aggregator" and should reference the API infores: { "resource_id": "infores:biothings-explorer", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:mychem-info" ] }

Putting it together:

"edge_1":
{
  "subject": "CHEBI:12345",
  "object": "REACT:456",
  "predicate": "biolink:participates_in",
  "attributes": [],
  "sources": [
    {
      "resource_id": "infores:chebi", 
      "resource_role": "primary_knowledge_source"
    },
    {
      "resource_id": "infores:mychem-info",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:chebi" ]
    },
    {
      "resource_id": "infores:biothings-explorer",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:mychem-info" ]
    }
  ]
}
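
One possible way to generate this array is from an ordered list of infores IDs, primary source first. This is only a sketch: sources_from_chain is a hypothetical helper, not BTE's actual implementation, and it assumes a simple linear chain where each aggregator has exactly one upstream resource.

```python
def sources_from_chain(chain):
    """Build a TRAPI sources array from infores IDs ordered primary-first."""
    if not chain:
        raise ValueError("chain needs at least one resource ID")
    sources = []
    for position, resource_id in enumerate(chain):
        if position == 0:
            # The first ID is the primary knowledge source and has no upstream.
            sources.append({
                "resource_id": resource_id,
                "resource_role": "primary_knowledge_source",
            })
        else:
            # Every later ID is an aggregator that points at the previous ID.
            sources.append({
                "resource_id": resource_id,
                "resource_role": "aggregator_knowledge_source",
                "upstream_resource_ids": [chain[position - 1]],
            })
    return sources


edge_1_sources = sources_from_chain(
    ["infores:chebi", "infores:mychem-info", "infores:biothings-explorer"]
)
```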

underlying "KP" API IS a primary knowledge source

A few APIs ingested through x-bte operations are "primary knowledge sources", so they're tagged primarySource: true in the API_LIST AND they lack the "source" property in their x-bte operations (since there is no deeper source). We'll use the CTD API as an example.

Walking through it

The sources array has 2 elements:
1 element for the API (use its infores). It's a "primary knowledge source": { "resource_id": "infores:ctd", "resource_role": "primary_knowledge_source" }
1 element for BTE (use its infores). It's an "aggregator" and should reference the API infores: { "resource_id": "infores:biothings-explorer", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:ctd" ] }

Putting it together:

"edge_2":
{
  "subject": "OMIM:615075",
  "object": "NCBIGene:1499",
  "predicate": "biolink:related_to",
  "attributes": [],
  "sources": [
    {
      "resource_id": "infores:ctd", 
      "resource_role": "primary_knowledge_source"
    },
    {
      "resource_id": "infores:biothings-explorer",
      "resource_role": "aggregator_knowledge_source",
      "upstream_resource_ids": [ "infores:ctd" ]
    }
  ]
}
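
The difference between the two scenarios comes down to whether the x-bte operation carries a "source" property. A minimal sketch of that branch follows. It assumes the operation is available as a plain dict with an optional "source" key (a hypothetical representation), and it does not consult the primarySource flag in the API_LIST. It returns the ordered infores IDs, where the first is the primary knowledge source and each later one is an aggregator whose upstream is the ID before it.

```python
BTE_INFORES = "infores:biothings-explorer"


def provenance_chain(api_infores, operation):
    """Return infores IDs ordered from primary source up to BTE."""
    source = operation.get("source")
    if source is None:
        # No deeper source: the API itself is the primary knowledge source.
        return [api_infores, BTE_INFORES]
    # Otherwise the operation's source is primary and the API aggregates it.
    return [source, api_infores, BTE_INFORES]


ctd_chain = provenance_chain("infores:ctd", {})
mychem_chain = provenance_chain("infores:mychem-info", {"source": "infores:chebi"})
```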

tokebe · 2023-04-14T15:24:33Z

@colleenXu My understanding is that this will require BTE code changes to utilize. Could I have a bare-minimum example yaml to test against?

colleenXu · 2023-04-15T01:58:40Z

I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA-endpoints. but for the team-specific / api-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...
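
A sketch of what that choice could look like. The endpoint kind labels "ara", "team", and "api" are hypothetical names used only for illustration, and the function is not taken from BTE's code.

```python
def bte_resource_id(endpoint_kind):
    """Pick the infores ID to report for the service answering the query."""
    if endpoint_kind in ("team", "api"):
        # Team-specific and API-specific endpoints.
        return "infores:service-provider-trapi"
    # ARA endpoints, and the default for anything else.
    return "infores:biothings-explorer"


ara_id = bte_resource_id("ara")
team_id = bte_resource_id("team")
```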

colleenXu · 2023-04-15T06:43:58Z

@tokebe I'm not sure what you mean. I don't think the SmartAPI yaml or API_LIST config file need adjusting....so I guess you may need example queries to test that the desired TRAPI output is made?

In the opening post's section "underlying "KP" API isn't a primary knowledge source", I'm using this x-bte operation as an example. So for testing, you could query MyChem through BTE in a one-hop query from SmallMolecule "CHEBI:15724" -> MolecularActivity (as specified in the testExamples section of that operation that's commented out).

In the other scenario "underlying "KP" API IS a primary knowledge source", I'm using this x-bte operation as an example. So for testing, you could query CTD through BTE in a one-hop query from Disease "OMIM:615075" -> Gene (as specified in the testExamples section of that operation that's commented out).
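
For reference, the two one-hop test queries could be built roughly like this. The helper one_hop_query is hypothetical, the node and edge keys n0, n1, and e0 are arbitrary labels, and the endpoint to send the query to is deliberately left out.

```python
import json


def one_hop_query(subject_id, subject_category, object_category):
    """Build a TRAPI one-hop query from a known subject to an object category."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_id], "categories": [subject_category]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


mychem_query = one_hop_query(
    "CHEBI:15724", "biolink:SmallMolecule", "biolink:MolecularActivity"
)
ctd_query = one_hop_query("OMIM:615075", "biolink:Disease", "biolink:Gene")
print(json.dumps(ctd_query, indent=2))
```

Each edge in the response's knowledge graph can then be checked for a sources array shaped like the examples in the opening post.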

tokebe · 2023-04-17T15:14:57Z

> I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA-endpoints. but for the team-specific / api-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...

I agree, that makes sense. I'll work on that at a lower priority vs. just getting it working.

RE: what I meant

I understand now, I think I got turned around somewhere reading the issue and thought there would be x-bte changes or something. Those examples will work, thanks!

colleenXu · 2023-04-18T02:01:15Z

@tokebe okay! yeah, I don't think the fundamental behavior needs any x-bte changes. (The "more complicated" bullet points I put in the "details" section would need x-bte changes, but that's my musings on potential future work...)

I agree that the infores:biothings-explorer vs infores:service-provider-trapi is a minor thing to do after it's working.

colleenXu · 2023-04-20T23:25:11Z

Note that the hidden comments above discuss "keeping TRAPI 1.3 provenance edge-attributes", which is for BioThings APIs where we ingest their data in TRAPI-format (some Multiomics APIs, Text-Mining Targeted). So it IS NOT RELEVANT TO THIS ISSUE. It is tangential to #617

colleenXu · 2023-05-03T23:34:56Z

Note that it looks like we haven't done the infores:service-provider-trapi handling yet

tokebe · 2023-08-03T17:18:15Z

The above-mentioned last part of the implementation is now tracked in #678

colleenXu · 2024-02-08T20:52:44Z

Documentation on this provenance/TRAPI edge sources format: https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/retrieval_provenance_specification.md

colleenXu added the trapi 1.4 label Mar 29, 2023

colleenXu changed the title ~~changes to provenance format~~ phase 1: provenance refactor for x-bte operations Apr 12, 2023

colleenXu mentioned this issue Apr 12, 2023

phase 1: provenance refactor for edges from some multiomics KPs, text-mining KP, TRAPI KPs #617

Closed

This comment was marked as off-topic.

This was referenced Apr 27, 2023

TRAPI 1.4: Provenance refactor biothings/bte_trapi_query_graph_handler#149

Merged

TRAPI 1.4: Provenance refactor biothings/smartapi-kg.js#79

Merged

TRAPI 1.4: Provenance refactor biothings/api-respone-transform.js#49

Merged

This was referenced Jul 26, 2023

for team-specific/api-specific endpoints, add the source as infores:service-provider-trapi #678

Closed

BTE potentially not providing primary_knowledge_source on all edges #627

Closed

tokebe closed this as completed Aug 3, 2023
