Capturing linkouts to source urls #6

mbrush · 2022-10-13T17:35:52Z

The UI team has requested a clear and consistent way to capture a url that links out to the information resource that is the source of an edge.

Ideally, this would be a specific page/record within the resource where the knowledge expressed in the edge can be found and further explored, e.g. https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/.

Lacking this, a url for the general landing page of the information resource would suffice, e.g. https://www.ebi.ac.uk/chembl/.

Several proposals have been made for how to capture this information:

Proposal 1: Use the existing Attribute.value_url field in the Attribute holding a primary source to hold a high level landing page url for that source - and a nested Attribute to hold a source record url if available.

The rationale here is that this field is meant to capture a url describing the value of an Attribute object - which in the case of a primary source Attribute is the infores of the resource that originally provided the edge.
If this field were reserved for a high level url of the resource, a url for a specific record within this resource that actually contains the reported knowledge would need to live elsewhere.
Given that we can now nest Attributes, one reasonable place would be a nested Attribute keyed on a new biolink edge property such as record_url.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1" (a fictitious example)

{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl",
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider",
          "attributes": [
            {
              "attribute_type_id": "biolink:has_subpage",
              "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
              "attribute_source": "infores:molecular_data_provider"
            }
          ]
        }
      ]
    }
  ]
}
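
To illustrate how a client such as the UI might consume the Proposal 1 layout, here is a minimal Python sketch. It assumes the nested record url is keyed on "biolink:has_subpage" as in the example above (the final property name was not settled in this ticket), and the helper name linkout_for_source is hypothetical. It returns the nested record url when one is present and otherwise falls back to the landing page in value_url.

```python
from typing import Optional

# Assumption: the nested record-url Attribute is keyed as in the example above.
# The thread also floats "record_url" as a possible property name.
RECORD_URL_KEY = "biolink:has_subpage"


def linkout_for_source(source_attr: dict, record_key: str = RECORD_URL_KEY) -> Optional[str]:
    """Return the most specific url available for one source Attribute."""
    # Look for a nested Attribute that carries a specific record url.
    for nested in source_attr.get("attributes") or []:
        if nested.get("attribute_type_id") == record_key and nested.get("value"):
            return nested["value"]
    # No record url was provided: fall back to the landing page, if any.
    return source_attr.get("value_url")


chembl_attr = {
    "attribute_type_id": "biolink:aggregator_knowledge_source",
    "value": "infores:chembl",
    "value_url": "https://www.ebi.ac.uk/chembl",
    "attributes": [
        {
            "attribute_type_id": "biolink:has_subpage",
            "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        }
    ],
}

print(linkout_for_source(chembl_attr))
```

Because each source Attribute carries its own nested record url, the same helper can be applied to every source in a chain without any extra matching step.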

Proposal 2: Use the existing Attribute.value_url field in the Attribute holding a knowledge source to hold the most specific url available

Here, if only a high level landing page url for the source is provided, it goes in the value_url field. If a more specific record url is provided, it goes in value_url instead (and the general landing page is not explicitly provided).
This approach is simpler (it requires no nesting), and provides the end user with the information they would want.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1"

{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:primary_knowledge_source",
          "value": "infores:clinical-trials-gov",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.clinicaltrials.gov",      # for this source, only the general landing page url is provided
          "description": "ClinicalTrials.gov is...",
          "attribute_source": "infores:chembl"
        },
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",   # for this source, we get a record url
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider"
        }
      ]
    }
  ]
}
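
Under Proposal 2 a client only has to read value_url from each source Attribute. The Python sketch below shows that, assuming the two source attribute types used in the example; the function name linkouts_by_source is hypothetical.

```python
# Assumption: these are the attribute types that mark a source Attribute,
# taken from the example above.
SOURCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def linkouts_by_source(edge: dict) -> dict:
    """Map each source infores id to whatever url its Attribute provides."""
    result = {}
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") in SOURCE_TYPES and attr.get("value_url"):
            result[attr["value"]] = attr["value_url"]
    return result


edge = {
    "id": "Association001",
    "attributes": [
        {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": "infores:clinical-trials-gov",
            "value_url": "https://www.clinicaltrials.gov",
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:chembl",
            "value_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        },
    ],
}

for source, url in linkouts_by_source(edge).items():
    print(source, url)
```

The lookup is trivial, but nothing in the structure tells the client whether a given url is a landing page or a specific record, since both kinds share the same field.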

Proposal 3: Use the existing Attribute.value_url field in the Attribute holding a knowledge source to hold a high level landing page url for that source - and an entirely separate top level Attribute to hold available source record urls.

Here we would follow Proposal 1 in reserving the Attribute.value_url field in an object holding a source for the homepage url of the source.
But instead of nesting a specific record url, we would place this in a separate top level Attribute object keyed on an edge property like source_record_urls.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1"


{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl",
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider"
        },
        {
          "attribute_type_id": "biolink:source_record_urls",
          "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",   # . . . could contain multiple record urls, from different sources, if these are available
          "attribute_source": "infores:molecular_data_provider"
        }
      ]
    }
  ]
}
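
For comparison, here is a minimal Python sketch of reading the Proposal 3 layout. It assumes the separate Attribute is keyed on "biolink:source_record_urls" as in the example, and that its value may be either a single url string or a list of urls (the example only hints at multiple values). The function name split_linkouts is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumptions: attribute type ids as used in the example above.
SOURCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}
RECORD_URLS_KEY = "biolink:source_record_urls"


def split_linkouts(edge: dict) -> Tuple[Dict[str, str], List[str]]:
    """Return (landing pages keyed by source id, flat list of record urls)."""
    landing_pages: Dict[str, str] = {}
    record_urls: List[str] = []
    for attr in edge.get("attributes") or []:
        type_id = attr.get("attribute_type_id")
        if type_id in SOURCE_TYPES and attr.get("value_url"):
            landing_pages[attr["value"]] = attr["value_url"]
        elif type_id == RECORD_URLS_KEY and attr.get("value"):
            value = attr["value"]
            # Accept a single url string or a list of url strings.
            record_urls.extend([value] if isinstance(value, str) else list(value))
    return landing_pages, record_urls


edge = {
    "id": "Association001",
    "attributes": [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:chembl",
            "value_url": "https://www.ebi.ac.uk/chembl",
        },
        {
            "attribute_type_id": "biolink:source_record_urls",
            "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        },
    ],
}

landing_pages, record_urls = split_linkouts(edge)
print(landing_pages)
print(record_urls)
```

Note that in this layout the record urls are not structurally tied to a particular source Attribute, so a client that wants to pair them up would need some extra convention, which this sketch does not attempt.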

edeutsch · 2022-10-25T05:13:22Z

I favor either proposal 1 or 3, not 2. After pondering a bit more, I'm thinking that 1 is a bit better than 3, since it is more elegant in the case where there are multiple sources in the chain and they each have nice landing pages. Proposal 1 captures that elegantly without the confusing shorthand of proposal 2.

What is the distinction between "biolink:record_url" as used in proposal 1 versus "biolink:source_record_urls" as used in proposal 3?

mbrush · 2022-10-25T16:57:25Z

Thanks @edeutsch re:

What is the distinction between "biolink:record_url" as used in proposal 1 versus "biolink:source_record_urls" as used in proposal 3?

Those were just the names I thought would be best for the edge property if we choose proposal 1 vs 3, where the semantics for the value urls are slightly different given the context in which they are found. But I have no strong preferences for what we name the property we choose to define, as long as it makes sense in the context in which it will be found in a TRAPI message.

mbrush · 2023-01-26T22:00:50Z

Given that we have settled on how retrieval provenance metadata will be refactored into dedicated 'RetrievalSource' objects (see NCATSTranslator/ReasonerAPI#386), we also need to refactor the proposals above to illustrate how they would work in this context.

In NCATSTranslator/ReasonerAPI#392, a proposal is made to add additional fields to the initial RetrievalSource object that would support capture of source record urls in a way analogous to the preferred proposal in this ticket (Proposal 1).
If this seems acceptable, we can move ahead and close this issue.

If there are concerns, we can draft alternate proposals that mirror the other approaches put forth in this ticket.

mbrush mentioned this issue Jan 26, 2023

Add properties to the RetrievalSource object NCATSTranslator/ReasonerAPI#392

Closed
