Capturing linkouts to source urls #6

Open
mbrush opened this issue Oct 13, 2022 · 3 comments

@mbrush
Collaborator

mbrush commented Oct 13, 2022

The UI team has requested a clear and consistent way to capture a url that links out to the information resource that is the source of an edge.

Ideally, this would be a specific page/record within the resource where the knowledge expressed in the edge can be found, and further explored. e.g. https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/.

But lacking this, a url for the general landing page for the information resource would suffice. e.g. https://www.ebi.ac.uk/chembl/.

Several proposals have been made for how to capture this information:


Proposal 1: Use the existing Attribute.value_url field in the Attribute holding a primary source to hold a high level landing page url for that source - and a nested Attribute to hold a source record url if available.

  • The rationale here is that this field is meant to capture a url describing the value of an Attribute object - which in the case of a primary source Attribute is the infores of the resource that originally provided the edge.
  • If this field were reserved for a high-level url of the resource, a url for a specific record within this resource that actually contains the reported knowledge would need to live elsewhere.
  • Given that we can now nest Attributes, one reasonable place would be a nested Attribute keyed on a new biolink edge property such as record_url.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1" (a fictitious example)

{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl",
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider",
          "attributes": [
            {
              "attribute_type_id": "biolink:record_url",
              "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
              "attribute_source": "infores:molecular_data_provider"
            }
          ]
        }
      ]
    }
  ]
}
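
To make the consumer side of Proposal 1 concrete, here is a minimal Python sketch of how a UI might read one source Attribute. It assumes the nested Attribute is keyed on `biolink:record_url` (the property name floated in the bullets above, not an existing biolink term); the helper name `source_links` is hypothetical.

```python
# Hypothetical property name; Proposal 1 only suggests "something like record_url".
RECORD_URL_PROPERTY = "biolink:record_url"


def source_links(attribute: dict, record_property: str = RECORD_URL_PROPERTY) -> dict:
    """Return the landing page and any nested record urls for one source Attribute."""
    nested = attribute.get("attributes") or []  # nested list may be absent or null
    record_urls = [
        item["value"]
        for item in nested
        if item.get("attribute_type_id") == record_property
    ]
    return {
        "source": attribute.get("value"),
        "landing_page": attribute.get("value_url"),
        "record_urls": record_urls,
    }


chembl_attribute = {
    "attribute_type_id": "biolink:aggregator_knowledge_source",
    "value": "infores:chembl",
    "value_url": "https://www.ebi.ac.uk/chembl",
    "attributes": [
        {
            "attribute_type_id": RECORD_URL_PROPERTY,
            "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        }
    ],
}

links = source_links(chembl_attribute)
print(links)
```

Under this layout the landing page and the record url never compete for the same field, so the UI can show both without guessing which one it was given.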

Proposal 2: Use the existing Attribute.value_url field in the Attribute holding a knowledge source to hold the most specific url available

  • Here, if only a high level landing page url for the source is provided, it goes in the value_url field. If a more specific record url is provided, it would go here (and the general landing page would not be explicitly provided).
  • This approach is simpler (requires no nesting), and provides the end user with the information they would want.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1"

{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:primary_knowledge_source",
          "value": "infores:clinical-trials-gov",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.clinicaltrials.gov",      # for this source, only the general landing page url is provided
          "description": "ClinicalTrials.gov is...",
          "attribute_source": "infores:chembl"
        },
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",   # for this source, we get a record url
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider"
        }
      ]
    }
  ]
}
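
A rough Python sketch of what reading Proposal 2 would look like for a consumer. The function name `linkouts` is hypothetical, and the edge is a trimmed copy of the example above with the inline annotations dropped.

```python
SOURCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def linkouts(edge: dict) -> dict:
    """Map each knowledge source on the edge to whatever url its value_url holds."""
    return {
        attr["value"]: attr.get("value_url")
        for attr in edge.get("attributes") or []
        if attr.get("attribute_type_id") in SOURCE_TYPES
    }


edge = {
    "id": "Association001",
    "attributes": [
        {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": "infores:clinical-trials-gov",
            "value_url": "https://www.clinicaltrials.gov",
        },
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:chembl",
            "value_url": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        },
    ],
}

urls = linkouts(edge)
print(urls)
```

The lookup is a one-liner, which is the appeal of this proposal. The trade-off is that nothing in the data says whether a given value_url is a landing page or a specific record, so a consumer that cares about the difference has to infer it.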

Proposal 3: Use the existing Attribute.value_url field in the Attribute holding a knowledge source to hold a high level landing page url for that source - and an entirely separate top level Attribute to hold available source record urls.

  • Here we would follow Proposal 1 in reserving the Attribute.value_url field in an object holding a source for the homepage url of the source.
  • But instead of nesting a specific record url, we would place this in a separate top level Attribute object keyed on an edge property like source_record_urls.

Example: Source URL representation for statement that "Bupivacaine physically interacts with LEF1"


{
  "edges": [
    {
      "id": "Association001",
      "subject": "CHEBI:3215",
      "predicate": "biolink:interacts_with",
      "object": "NCBIGene:51176",
      "attributes": [
        {
          "attribute_type_id": "biolink:aggregator_knowledge_source",
          "value": "infores:chembl",
          "value_type_id": "biolink:InformationResource",
          "value_url": "https://www.ebi.ac.uk/chembl",
          "description": "ChEMBL is a manually curated database of bioactive molecules...",
          "attribute_source": "infores:molecular_data_provider"
        },
        {
          "attribute_type_id": "biolink:source_record_urls",      
          "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/", . . .  #  could contain multiple record urls, from different sources, if these are available 
          "attribute_source": "infores:molecular_data_provider"
        }
      ]
    }
  ]
}
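
For comparison, a minimal Python sketch of consuming the Proposal 3 layout. It assumes the separate Attribute is keyed on `biolink:source_record_urls` and that its value may be either a single url or a list of urls. The prefix-matching step in `guess_source` is purely an illustrative heuristic, not part of the proposal; both function names are hypothetical.

```python
def split_links(edge: dict, record_property: str = "biolink:source_record_urls"):
    """Separate per-source landing pages from the flat list of record urls."""
    landing_pages = {}
    record_urls = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == record_property:
            value = attr.get("value")
            if value is not None:
                # Accept a single url or a list of urls.
                record_urls.extend(value if isinstance(value, list) else [value])
        elif attr.get("value_type_id") == "biolink:InformationResource":
            landing_pages[attr["value"]] = attr.get("value_url")
    return landing_pages, record_urls


def guess_source(record_url: str, landing_pages: dict):
    """Heuristic only: attribute a record url to the source whose landing page prefixes it."""
    for source, page in landing_pages.items():
        if page and record_url.startswith(page):
            return source
    return None


edge = {
    "attributes": [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": "infores:chembl",
            "value_type_id": "biolink:InformationResource",
            "value_url": "https://www.ebi.ac.uk/chembl",
        },
        {
            "attribute_type_id": "biolink:source_record_urls",
            "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        },
    ]
}

landing_pages, record_urls = split_links(edge)
owner = guess_source(record_urls[0], landing_pages)
print(landing_pages, record_urls, owner)
```

Because the record urls sit apart from the sources they came from, the link between the two has to be recovered somehow when an edge has several sources. The prefix match here works for this example but would not hold for sources whose records live on a different host.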
@edeutsch

edeutsch commented Oct 25, 2022

I favor either proposal 1 or 3, not 2. After pondering a bit more, I'm thinking that 1 is a bit better than 3, since it is more elegant in the case where there are multiple sources in the chain and they each have nice landing pages. Proposal 1 captures that elegantly without the confusing shorthand of proposal 2.

What is the distinction between "biolink:record_url" as used in proposal 1 versus "biolink:source_record_urls" as used in proposal 3?

@mbrush
Collaborator Author

mbrush commented Oct 25, 2022

Thanks @edeutsch re:

What is the distinction between "biolink:record_url" as used in proposal 1 versus "biolink:source_record_urls" as used in proposal 3?

Those were just the names I thought would be best for the edge property if we choose proposal 1 vs 3, where the semantics for the value urls are slightly different given the context in which they are found. But I have no strong preferences for what we name the property we choose to define, as long as it makes sense in the context in which it will be found in a TRAPI message.

@mbrush
Collaborator Author

mbrush commented Jan 26, 2023

Given that we have settled on how retrieval provenance metadata will be refactored into dedicated 'RetrievalSource' objects (see NCATSTranslator/ReasonerAPI#386) - we also need to refactor the proposals above to illustrate how they would work in this context.

In NCATSTranslator/ReasonerAPI#392, a proposal is made to add additional fields to the initial RetrievalSource object that would support capture of source record urls in a way analogous to the preferred proposal in this ticket (Proposal 1).
If this seems acceptable, we can move ahead and close this issue.

If there are concerns, we can draft alternate proposals that mirror the other approaches put forth in this ticket.
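
As a rough illustration of what that refactoring could look like, the Python sketch below folds a Proposal 1 style Attribute into a RetrievalSource-like dict. The field names `resource_id`, `resource_role` and `source_record_urls`, the `biolink:record_url` property, and the helper `to_retrieval_source` are all assumptions made for this example; the linked ReasonerAPI issues are the place to check the actual schema.

```python
def to_retrieval_source(attribute: dict, record_property: str = "biolink:record_url") -> dict:
    """Fold a source Attribute and its nested record urls into one flat object."""
    # "biolink:aggregator_knowledge_source" -> "aggregator_knowledge_source"
    role = attribute["attribute_type_id"].split(":", 1)[1]
    nested = attribute.get("attributes") or []
    return {
        "resource_id": attribute["value"],  # assumed field name
        "resource_role": role,  # assumed field name
        "source_record_urls": [  # assumed field name
            item["value"]
            for item in nested
            if item.get("attribute_type_id") == record_property
        ],
    }


proposal_1_attribute = {
    "attribute_type_id": "biolink:aggregator_knowledge_source",
    "value": "infores:chembl",
    "value_url": "https://www.ebi.ac.uk/chembl",
    "attributes": [
        {
            "attribute_type_id": "biolink:record_url",
            "value": "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1098/",
        }
    ],
}

retrieval_source = to_retrieval_source(proposal_1_attribute)
print(retrieval_source)
```

This keeps the Proposal 1 property that record urls stay attached to the source that supplied them, just without the nested Attribute.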
