phase 1: provenance refactor for x-bte operations #604

Closed
colleenXu opened this issue Mar 29, 2023 · 12 comments

@colleenXu
Collaborator

colleenXu commented Mar 29, 2023

Overview

TRAPI 1.4 includes a refactor that moves provenance into its own section, sources (rather than keeping it inside attributes). The entries in an edge's sources can also reference each other. For example, they can state that BTE got info from MyChem, which got its info from CHEBI (this x-bte operation).

This issue is specifically for handling edges from x-bte operations, but it is a good reference for other issues that involve provenance.

TRAPI spec references

Details

At the moment, this seems straightforward: create a sources structure that represents:
"source" from x-bte operation -> KP API -> BTE

(implementation musing: modify the TRAPI output only?)

It would be more complicated if...

  • we were merging edges when they came from different underlying KP APIs....but we haven't thought about doing this...
  • we wanted to process source info from certain fields of the raw API response (with post-processing) and put that into the edge.sources. An idea for the future maybe...
  • we wanted to construct sources.source_record_urls from certain fields of the raw API response (maybe with post-processing). An idea for the future maybe...

underlying "KP" API isn't a primary knowledge source

Most APIs ingested through x-bte operations are not "primary knowledge sources", so they're not tagged primarySource: true in the API_LIST.

Walking through it
  • The TRAPI edge has a sources property at the same level as subject, predicate, object, and attributes. Its value is an array with 3 elements:
    • 1 element for the "source" property in the operation. It's a "primary knowledge source": { "resource_id": "infores:chebi", "resource_role": "primary_knowledge_source" }
    • 1 element for the API (use its infores). It's an "aggregator" and should reference the "source" property: { "resource_id": "infores:mychem-info", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:chebi" ] }
    • 1 element for BTE (use its infores). It's an "aggregator" and should reference the API infores: { "resource_id": "infores:biothings-explorer", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:mychem-info" ] }

Putting it together:

"edge_1":
{
  "subject": "CHEBI:12345",
  "object": "REACT:456",
  "predicate": "biolink:participates_in",
  "attributes": [],
  "sources": [
    {
      "resource_id": "infores:chebi", 
      "resource_role": "primary_knowledge_source"
    },
    { 
      "resource_id": "infores:mychem-info", 
      "resource_role": "aggregator_knowledge_source", 
      "upstream_resource_ids": [ "infores:chebi" ]
    },
    { 
      "resource_id": "infores:biothings-explorer", 
      "resource_role": "aggregator_knowledge_source", 
      "upstream_resource_ids": [ "infores:mychem-info" ]
    }
  ]
}
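
A minimal sketch of how the three-element array above could be assembled from an ordered chain of infores IDs. It is written in Python purely for illustration and is not BTE code. The helper name build_source_chain and its argument are hypothetical; only the field names (resource_id, resource_role, upstream_resource_ids) and the role values come from the example above.

```python
def build_source_chain(infores_chain):
    """Build a sources list from infores IDs ordered primary -> last aggregator.

    Hypothetical helper for illustration only, not part of BTE.
    """
    if not infores_chain:
        raise ValueError("need at least one infores ID")

    # The first ID in the chain is treated as the primary knowledge source.
    sources = [
        {
            "resource_id": infores_chain[0],
            "resource_role": "primary_knowledge_source",
        }
    ]

    # Every later ID is an aggregator that points at the ID just before it.
    for upstream, current in zip(infores_chain, infores_chain[1:]):
        sources.append(
            {
                "resource_id": current,
                "resource_role": "aggregator_knowledge_source",
                "upstream_resource_ids": [upstream],
            }
        )
    return sources


edge_1_sources = build_source_chain(
    ["infores:chebi", "infores:mychem-info", "infores:biothings-explorer"]
)
```

Deriving each aggregator's upstream_resource_ids from its position in the chain keeps the references from drifting out of sync with the resource_id values.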

underlying "KP" API IS a primary knowledge source

A few APIs ingested through x-bte operations are "primary knowledge sources", so they're tagged primarySource: true in the API_LIST AND they lack the "source" property in their x-bte operations (since there is no deeper source). We'll use the CTD API as an example.

Walking through it
  • The sources array has 2 elements:
    • 1 element for the API (use its infores). It's a "primary knowledge source": { "resource_id": "infores:ctd", "resource_role": "primary_knowledge_source" }
    • 1 element for BTE (use its infores). It's an "aggregator" and should reference the API infores: { "resource_id": "infores:biothings-explorer", "resource_role": "aggregator_knowledge_source", "upstream_resource_ids": [ "infores:ctd" ] }

Putting it together:

"edge_2":
{
  "subject": "OMIM:615075",
  "object": "NCBIGene:1499",
  "predicate": "biolink:related_to",
  "attributes": [],
  "sources": [
    {
      "resource_id": "infores:ctd", 
      "resource_role": "primary_knowledge_source"
    },
    { 
      "resource_id": "infores:biothings-explorer", 
      "resource_role": "aggregator_knowledge_source", 
      "upstream_resource_ids": [ "infores:ctd" ]
    }
  ]
}
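
The two scenarios in this post differ only in whether the API is the primary source. A rough sketch of that branching follows, in Python for illustration only (not BTE code). The function sources_for_edge and its parameter names are hypothetical: api_is_primary stands in for the primarySource flag in the API_LIST, and operation_source stands in for the "source" property of the x-bte operation.

```python
BTE_INFORES = "infores:biothings-explorer"


def sources_for_edge(api_infores, api_is_primary, operation_source=None):
    """Return a sources list for one edge (hypothetical helper, not BTE code)."""
    if api_is_primary:
        # The API itself is the primary source, so no deeper source is used.
        chain = [api_infores, BTE_INFORES]
    else:
        if operation_source is None:
            raise ValueError("a non-primary API needs the operation's source")
        chain = [operation_source, api_infores, BTE_INFORES]

    sources = []
    for position, resource_id in enumerate(chain):
        if position == 0:
            entry = {
                "resource_id": resource_id,
                "resource_role": "primary_knowledge_source",
            }
        else:
            # Aggregators reference the resource immediately before them.
            entry = {
                "resource_id": resource_id,
                "resource_role": "aggregator_knowledge_source",
                "upstream_resource_ids": [chain[position - 1]],
            }
        sources.append(entry)
    return sources


edge_2_sources = sources_for_edge("infores:ctd", api_is_primary=True)
```

Calling sources_for_edge("infores:mychem-info", False, "infores:chebi") would give the three-element array from the first scenario.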
@tokebe
Member

tokebe commented Apr 14, 2023

@colleenXu My understanding is that this will require BTE code changes to utilize. Could I have a bare-minimum example yaml to test against?

@colleenXu
Collaborator Author

I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA-endpoints. but for the team-specific / api-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...
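
To make the idea concrete, here is a small sketch of choosing the final aggregator's resource ID by endpoint type. It is Python for illustration only, not BTE code. The function name and the endpoint_kind labels ("ara", "team", "api") are assumptions made up for this example; only the two infores IDs come from the comment above.

```python
def aggregator_infores(endpoint_kind):
    """Pick the infores ID for the last aggregator entry in sources.

    endpoint_kind is a hypothetical label, not a real BTE setting.
    """
    if endpoint_kind == "ara":
        return "infores:biothings-explorer"
    if endpoint_kind in ("team", "api"):
        # Team-specific and API-specific endpoints.
        return "infores:service-provider-trapi"
    raise ValueError(f"unknown endpoint kind: {endpoint_kind!r}")


ara_id = aggregator_infores("ara")
team_id = aggregator_infores("team")
```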

@colleenXu
Collaborator Author

@tokebe I'm not sure what you mean. I don't think the SmartAPI yaml or API_LIST config file need adjusting....so I guess you may need example queries to test that the desired TRAPI output is made?


In the opening post's section "underlying "KP" API isn't a primary knowledge source", I'm using this x-bte operation as an example. So for testing, you could query MyChem through BTE in a one-hop query from SmallMolecule "CHEBI:15724" -> MolecularActivity (as specified in the testExamples section of that operation that's commented out).

In the other scenario "underlying "KP" API IS a primary knowledge source", I'm using this x-bte operation as an example. So for testing, you could query CTD through BTE in a one-hop query from Disease "OMIM:615075" -> Gene (as specified in the testExamples section of that operation that's commented out).

@tokebe
Member

tokebe commented Apr 17, 2023

I'm wondering, do we always want to add the resource ID as infores:biothings-explorer? That makes sense for the ARA-endpoints. but for the team-specific / api-specific endpoints, maybe it makes sense to add the resource ID as infores:service-provider-trapi...

I agree, that makes sense. I'll work on that at a lower priority vs. just getting it working.

RE: what I meant

I understand now, I think I got turned around somewhere reading the issue and thought there would be x-bte changes or something. Those examples will work, thanks!

@colleenXu
Collaborator Author

@tokebe okay! yeah, I don't think the fundamental behavior needs any x-bte changes. (The "more complicated" bullet points I put in the "details" section would need x-bte changes, but that's my musings on potential future work...)

I agree that the infores:biothings-explorer vs infores:service-provider-trapi is a minor thing to do after it's working.

@gloriachin

This comment was marked as off-topic.

@colleenXu

This comment was marked as off-topic.

@gloriachin

This comment was marked as off-topic.

@colleenXu
Collaborator Author

Note that the hidden comments above discuss "keeping TRAPI 1.3 provenance edge-attributes", which is for BioThings APIs where we ingest their data in TRAPI-format (some Multiomics APIs, Text-Mining Targeted). So it IS NOT RELEVANT TO THIS ISSUE. It is tangential to #617

@colleenXu
Collaborator Author

Note that it looks like we haven't done the infores:service-provider-trapi handling yet

@tokebe
Member

tokebe commented Aug 3, 2023

The above-mentioned last part of the implementation is now tracked in #678

tokebe closed this as completed Aug 3, 2023