investigate TRAPI validation errors showing in ARAX UI #587

Closed
andrewsu opened this issue Mar 14, 2023 · 25 comments
Labels: external (Requires fixes to an external service), trapi 1.4


@andrewsu
Member

andrewsu commented Mar 14, 2023

An updated reasoner-validator has been deployed to the beta version of the ARAX UI. BTE (and most teams) are showing errors. Example links are below, more info can be found by mousing over and clicking on the TRAPI 1.3 column:

Example 1: https://arax.ncats.io/beta/?r=30fa40cd-e74d-4532-ada1-9914c87db20a

TRAPI validator error

BTE response: https://arax.ncats.io/api/arax/v1.3/response/cba4c3db-1318-4937-b755-1ba8655fe773

{
  "errors": {
    "error.knowledge_graph.edge.attribute.type_id.not_curie": {
      "PUBCHEM.COMPOUND:84029--biolink:treats->MONDO:0003085": [
        {
          "attribute_type_id": "original_subject_name"
        },
        {
          "attribute_type_id": "original_object_name"
        }
      ]
    },
    "error.knowledge_graph.edge.attribute.type_id.unknown": {
      "biolink:has_evidence_count": null,
      "biolink:original_knowledge_source": null,
      "biolink:supporting_document": null,
      "biolink:supporting_study_result": null,
      "biolink:tmkp_confidence_score": null
    },
    "error.knowledge_graph.edge.provenance.missing_primary": {
      "PUBCHEM.COMPOUND:12035--biolink:affects->NCBIGene:6935": null,
      "PUBCHEM.COMPOUND:3033--biolink:treats->MONDO:0018102": null,
      "PUBCHEM.COMPOUND:34755--biolink:treats->MONDO:0003085": null,
      "PUBCHEM.COMPOUND:445154--biolink:affects->NCBIGene:6925": null,
      "PUBCHEM.COMPOUND:5202--biolink:affects->NCBIGene:6925": null,
      "PUBCHEM.COMPOUND:5329102--biolink:affects->NCBIGene:6935": null,
      "PUBCHEM.COMPOUND:5988--biolink:affects->NCBIGene:6925": null,
      "PUBCHEM.COMPOUND:9427--biolink:affects->NCBIGene:6935": null
    },
    "error.knowledge_graph.edge.qualifiers.qualifier.invalid": {
      "PUBCHEM.COMPOUND:12035--biolink:affects->NCBIGene:6935": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
      "PUBCHEM.COMPOUND:445154--biolink:affects->NCBIGene:6925": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
      "PUBCHEM.COMPOUND:5202--biolink:affects->NCBIGene:6925": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
      "PUBCHEM.COMPOUND:5329102--biolink:affects->NCBIGene:6935": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
      "PUBCHEM.COMPOUND:5988--biolink:affects->NCBIGene:6925": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
      "PUBCHEM.COMPOUND:9427--biolink:affects->NCBIGene:6935": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ]
    }
  },
  "information": {},
  "warnings": {}
}

Example 2: https://arax.ncats.io/beta/?r=61f8c292-39d3-4f8c-9831-ad26c6a793b6

[Screenshot: validation overview shown when hovering over the TRAPI 1.3 column]

More info on the validation errors is available at https://ncatstranslator.github.io/reasoner-validator/validation_codes_dictionary.html
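When a report gets long, it can help to tally it per validation code. Below is a minimal Python sketch. It assumes the report has the shape shown above: sections `errors` / `warnings` / `information`, each mapping a validation code to a dict keyed by the flagged edge, attribute, or node. `summarize_report` is a hypothetical helper, not part of reasoner-validator.

```python
# Hypothetical helper: tally a reasoner-validator report (shaped like the JSON above).
from typing import Dict


def summarize_report(report: dict) -> Dict[str, int]:
    """Return {validation_code: number_of_flagged_entries} across all sections.

    Assumes each section maps a validation code to a dict keyed by the flagged
    identifier (edge label, attribute type_id, node id, ...).
    """
    summary = {}
    for section in ("errors", "warnings", "information"):
        for code, entries in report.get(section, {}).items():
            summary[code] = len(entries or {})  # entries may be null in some reports
    return summary


example = {
    "errors": {
        "error.knowledge_graph.edge.attribute.type_id.unknown": {
            "biolink:has_evidence_count": None,
            "biolink:tmkp_confidence_score": None,
        }
    },
    "information": {},
    "warnings": {},
}
for code, count in sorted(summarize_report(example).items()):
    print(f"{count:4d}  {code}")
```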

@colleenXu
Collaborator

colleenXu commented Mar 15, 2023

How to use the TRAPI validator:

  • to run the TRAPI validator locally (in python), see my (updated) notebook
  • send TRAPI queries to the ARS, then look them up in the ARAX-beta UI website
    • Sending to the ARS: POST request to https://ars-prod.transltr.io/ars/api/submit (other URLs listed here)
    • Look at the request's response, and copy the value of the pk field
    • Go to https://arax.ncats.io/beta/, click import (middle top of page). Paste the pk value and click load.
    • Hover over the value in the TRAPI 1.3? column to see the overview (Andrew pasted a screenshot of what this looks like in example 2)
    • Click on the value to get a pop-up with the full message (Andrew pasted the contents for example 1 in a collapsed section)
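The ARS steps above could be scripted roughly like this. It is a sketch using only the Python standard library. It assumes the ARS reply is JSON with a top-level `pk` field (as described above). It also assumes the `?r=<pk>` link form seen in Andrew's example links opens the same result as the import box. All helper names are hypothetical.

```python
# Sketch of the ARS -> ARAX-beta workflow, stdlib only. Helper names are hypothetical.
import json
import urllib.request

ARS_SUBMIT_URL = "https://ars-prod.transltr.io/ars/api/submit"  # from the steps above
ARAX_BETA_URL = "https://arax.ncats.io/beta/"


def submit_to_ars(trapi_query: dict, url: str = ARS_SUBMIT_URL, timeout: float = 60.0) -> dict:
    """POST a TRAPI query (a dict with a "message" key) to the ARS; return the parsed JSON reply."""
    body = json.dumps(trapi_query).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_pk(ars_reply: dict) -> str:
    """Return the "pk" value to paste into the ARAX-beta import box."""
    return ars_reply["pk"]


def arax_beta_link(pk: str) -> str:
    """ASSUMPTION: the ?r=<pk> form seen in the example links loads the same result as import."""
    return f"{ARAX_BETA_URL}?r={pk}"
```

Usage would be something like `pk = extract_pk(submit_to_ars(trapi_query))`, then paste `pk` into the import box (or try opening `arax_beta_link(pk)`).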

@colleenXu
Collaborator

colleenXu commented Mar 15, 2023

In Andrew's examples, there are 4 categories of errors raised by the validator. I'm posting a set of 3 comments that cover these error categories.


Investigation Part 1: "invalid qualifiers"

My guess is that the validator is flagging edges where the biolink:qualified_predicate is biolink:causes. However, the error message "invalid qualifier" isn't specific enough to tell exactly what the problem is.

Still, it looks like we are modeling this exactly as Sierra told us to: put the biolink prefix on qualified-predicate values and no prefix on the enum values (Translator Slack discussion, notes, and related Translator Slack discussion)

From Example 1 (this is JSON)
    "error.knowledge_graph.edge.qualifiers.qualifier.invalid": {
      "PUBCHEM.COMPOUND:12035--biolink:affects->NCBIGene:6935": [
        {
          "qualifier_type_id": "biolink:qualified_predicate",
          "qualifier_value": "biolink:causes"
        },
        {
          "qualifier_type_id": "biolink:object_aspect_qualifier",
          "qualifier_value": "activity_or_abundance"
        }
      ],
From the notebook (this is a printed Python dict)

https://github.com/colleenXu/RegistryMetadataDev/blob/main/ValidatingBTE_Response.ipynb

  'error.knowledge_graph.edge.qualifiers.qualifier.invalid': {'PUBCHEM.COMPOUND:214350--biolink:affects->NCBIGene:3785': [{'qualifier_type_id': 'biolink:qualified_predicate',
     'qualifier_value': 'biolink:causes'},
    {'qualifier_type_id': 'biolink:object_aspect_qualifier',
     'qualifier_value': 'activity'},
    {'qualifier_type_id': 'biolink:object_direction_qualifier',
     'qualifier_value': 'increased'},
    {'qualifier_type_id': 'biolink:causal_mechanism_qualifier',
     'qualifier_value': 'activation'}],
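To double-check that claim mechanically, here is a minimal sketch (hypothetical helper, plain Python). It only tests the prefix convention: `biolink:qualified_predicate` values are prefixed, and other qualifier values are unprefixed. That second half assumes every non-predicate qualifier is enum-valued, which holds for the qualifiers in these examples. It does not check enum membership or whether a qualifier is allowed for a given predicate. An empty result therefore does not explain the validator's error; it only suggests the problem lies elsewhere.

```python
# Hypothetical check of the prefix convention only (not reasoner-validator's logic).
from typing import List


def convention_problems(qualifiers: List[dict]) -> List[str]:
    problems = []
    for q in qualifiers:
        type_id, value = q["qualifier_type_id"], q["qualifier_value"]
        has_prefix = value.startswith("biolink:")
        if type_id == "biolink:qualified_predicate" and not has_prefix:
            problems.append(f"{type_id}={value!r}: expected a biolink: prefix")
        elif type_id != "biolink:qualified_predicate" and has_prefix:
            # ASSUMPTION: all other qualifiers here take unprefixed enum values
            problems.append(f"{type_id}={value!r}: enum value should be unprefixed")
    return problems


edge_qualifiers = [
    {"qualifier_type_id": "biolink:qualified_predicate", "qualifier_value": "biolink:causes"},
    {"qualifier_type_id": "biolink:object_aspect_qualifier", "qualifier_value": "activity_or_abundance"},
]
print(convention_problems(edge_qualifiers))  # [] : follows the convention, yet the validator flags it
```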

IN-PROGRESS: Translator Slack discussion with Sierra and Richard B

@edeutsch

I was thinking that "activity_or_abundance" wasn't a valid value. Is that a valid value?

@edeutsch

Where does one find a list of allowed values? Not finding it here:
https://biolink.github.io/biolink-model/docs/aspect_qualifier.html

@colleenXu
Collaborator

@edeutsch there's an enum here that includes activity_or_abundance. Also see the Translator #data-modeling Slack thread

@colleenXu
Collaborator

colleenXu commented Mar 16, 2023

Investigation Part 2: "unknown edge attribute-type-id" and "missing primary knowledge source"

Both of these appear only on edges from Text-Mining KP.

To investigate, I ran a query earlier today in which BTE would use only Text-Mining KP edges.

  • It looks like primary_knowledge_source is now present, so the examples in the opening post were perhaps generated before Text-Mining KP's update last week (which fixed this).
  • In today's query, the validator raised only the "unknown edge attribute-type-id" error.
{
  "errors": {
    "error.knowledge_graph.edge.attribute.type_id.unknown": {
      "biolink:has_evidence_count": null,
      "biolink:supporting_document": null,
      "biolink:supporting_study_result": null,
      "biolink:tmkp_confidence_score": null
    }
  },
  "information": {},
  "warnings": {}
}

STATUS: I've informed their team through Translator Slack. It'll be their responsibility to handle this.
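For a quick local check of the "missing primary knowledge source" error, a sketch like this lists edges that lack that attribute. It assumes TRAPI 1.3-style edges, where provenance is an attribute with attribute_type_id biolink:primary_knowledge_source; later TRAPI versions model provenance differently. It formats edge labels like the validator report (subject--predicate->object). The helper and the toy edge are hypothetical.

```python
# Hypothetical local check mirroring error.knowledge_graph.edge.provenance.missing_primary
# for TRAPI 1.3-style edges (provenance stored as an attribute).
from typing import List


def edges_missing_primary(knowledge_graph: dict) -> List[str]:
    """Return labels of edges whose attributes lack biolink:primary_knowledge_source."""
    missing = []
    for edge in knowledge_graph.get("edges", {}).values():
        type_ids = {a.get("attribute_type_id") for a in edge.get("attributes") or []}
        if "biolink:primary_knowledge_source" not in type_ids:
            missing.append(f"{edge['subject']}--{edge['predicate']}->{edge['object']}")
    return missing


kg = {  # toy knowledge graph; the attribute value is a placeholder
    "edges": {
        "e0": {
            "subject": "PUBCHEM.COMPOUND:3033",
            "predicate": "biolink:treats",
            "object": "MONDO:0018102",
            "attributes": [{"attribute_type_id": "biolink:supporting_document", "value": "placeholder"}],
        }
    }
}
print(edges_missing_primary(kg))  # ['PUBCHEM.COMPOUND:3033--biolink:treats->MONDO:0018102']
```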

@edeutsch

Thanks. FYI https://arax.ncats.io/beta is now running the TRAPI validator version 3.4.9

@colleenXu
Collaborator

colleenXu commented Mar 16, 2023

Investigation Part 3: "edge attribute_type_id not curie"

This still needs discussion within our team.

My notes on this specific error
  • Minor-ish: original_subject_name / original_object_name are keys in semmeddb's response-mapping. They don't have equivalents in the biolink-model (but are similar to original subject and original object, which are in the biolink-model and are used in semmeddb's response-mapping)
  • Why have those edge-attributes? They make debugging easier, and they may be useful if we force canonical direction only in the future
My commentary on the larger issues
  • The validator is encountering (and will keep encountering) errors because a lot of edge-attributes aren't in the biolink-model, or their type_id or value isn't a CURIE…so…what do we do?
    • We could adjust the x-bte annotation (queries/response-mapping) so we only provide what we can currently put in Translator/TRAPI/biolink-model's exact format
    • But that's super limiting…
    • This is related to my notes from last week's group meeting on how to process non-Translator-standard stuff into Translator standard (technical aspects? And lots-of-stuff-to-comb-through…)
  • Potential future issue: qualifiers/other info may be "invalid" when edges are "reversed" compared to the biolink-model's canonical direction. We have an issue on this
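To triage which attribute_type_ids would trip the "not_curie" error, a rough CURIE-shape test is enough. Here is a sketch in Python. The regex is an assumption (prefix, colon, non-empty local part) and is looser than a real CURIE/prefix check. Treat it as a triage aid, not as reasoner-validator's rule.

```python
import re

# ASSUMPTION: rough CURIE shape (prefix starting with a letter or "_", a colon,
# then a non-empty local part). Looser than a real CURIE check; triage only.
CURIE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*:\S+$")


def non_curie_type_ids(attributes):
    """Return the attribute_type_ids that do not look like CURIEs."""
    return [a["attribute_type_id"] for a in attributes if not CURIE_RE.match(a["attribute_type_id"])]


attrs = [
    {"attribute_type_id": "biolink:original_subject"},
    {"attribute_type_id": "original_subject_name"},
    {"attribute_type_id": "original_object_name"},
]
print(non_curie_type_ids(attrs))  # ['original_subject_name', 'original_object_name']
```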

@colleenXu
Collaborator

Note that with the ARAX-beta update that just happened, Andrew's examples now give different validation reports with more issues. I haven't analyzed these yet.

@colleenXu colleenXu self-assigned this Mar 22, 2023
@colleenXu
Collaborator

colleenXu commented May 6, 2023

A more recent example:

ARAX link with validation codes dictionary

Validation json
{
  "errors": {
    "error.knowledge_graph.edge.attribute.type_id.not_curie": {
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1139": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_117": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1185": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1291": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1296": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1421": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1541": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1582": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1592": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1623": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_194": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_279": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_285": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_292": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_454": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_490": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_537": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_551": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_61": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_64": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_70": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_71": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_875": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_883": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_907": [
        {
          "attribute_type_id": "pathway_categories"
        }
      ]
    }
  },
  "information": {},
  "warnings": {
    "warning.knowledge_graph.edge.attribute.type_id.not_association_slot": {
      "biolink:Attribute": [
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0042770"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0007264"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0031570"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0006977"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0044774"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0007265"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0044773"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0072331"
        },
        {
          "edge_id": "NCBIGene:1017--biolink:actively_involved_in->GO:0007093"
        }
      ]
    },
    "warning.knowledge_graph.edge.predicate.non_canonical": {
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:cellcyclepathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:efppathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:fbw7pathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:g1pathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:mcmpathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:p27pathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:p53pathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->BIOCARTA:rbpathway": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04068": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04110": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04114": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04115": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04914": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa04934": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05160": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05162": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05165": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05166": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05169": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05203": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->KEGG.PATHWAY:hsa05215": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-1266738": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-157579": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-162582": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-174143": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-212436": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-2559586": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-453274": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-453276": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-5693538": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-5693607": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-6791312": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-6804116": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-6804756": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-6806003": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-68949": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69052": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69231": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69236": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69273": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69275": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69278": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69580": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69615": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69620": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-69656": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-73857": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-73886": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-73894": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-8848021": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-8849470": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-9616222": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-9659787": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-9675126": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->REACT:R-HSA-983231": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP1530": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP2261": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP2374": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP2431": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP2446": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP2586": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP4172": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP45": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP4658": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP466": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->WIKIPATHWAYS:WP707": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1139": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_117": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1185": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1291": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1296": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1421": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1541": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1582": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1592": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1623": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_194": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_279": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_285": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_292": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_454": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_490": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_537": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_551": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_61": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_64": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_70": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_71": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_875": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_883": [
        {
          "predicate": "biolink:participates_in"
        }
      ],
      "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_907": [
        {
          "predicate": "biolink:participates_in"
        }
      ]
    },
    "warning.knowledge_graph.node.unmapped_prefix": {
      "BIOCARTA:cellcyclepathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:efppathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:fbw7pathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:g1pathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:mcmpathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:p27pathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:p53pathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "BIOCARTA:rbpathway": [
        {
          "categories": "['biolink:Pathway']"
        }
      ],
      "NCBIGene:1017": [
        {
          "categories": "['biolink:BiologicalEntity']"
        }
      ]
    },
    "warning.query_graph.node.ids.unmapped_to_categories": {
      "n0": [
        {
          "categories": "['biolink:BiologicalEntity']",
          "unmapped_ids": "['NCBIGene:1017']"
        }
      ]
    }
  }
}

Analysis of errors

There's 1 type of error: an edge attribute's type_id isn't a CURIE, but biolink-model/TRAPI/Translator expects it to be...

  • This comes from the pathway_categories key in the x-bte annotation response-mapping for bioplanet pathway-gene.
  • We decided in a May 3rd group meeting to remove this response-mapping, which will resolve this error.
  • However, this is part of larger issues, which I pointed out in "Investigation Part 3" above
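To confirm that a single response-mapping key accounts for every not_curie entry, one can count the flagged attribute_type_ids. This sketch uses the first two entries of the report above as input. `type_id_counts` is a hypothetical helper.

```python
from collections import Counter


def type_id_counts(not_curie_section: dict) -> Counter:
    """Count attribute_type_id values across all edges flagged under ...type_id.not_curie."""
    counts = Counter()
    for flagged in not_curie_section.values():
        for entry in flagged or []:
            counts[entry["attribute_type_id"]] += 1
    return counts


section = {
    "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_1139": [
        {"attribute_type_id": "pathway_categories"}
    ],
    "NCBIGene:1017--biolink:participates_in->ncats.bioplanet:bioplanet_117": [
        {"attribute_type_id": "pathway_categories"}
    ],
}
print(type_id_counts(section))  # Counter({'pathway_categories': 2})
```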

Analysis of warnings

There are multiple warnings; however, these are either unrelated to BTE or ongoing issues...

  • warning.knowledge_graph.edge.attribute.type_id.not_association_slot
    • This validation code is alerting that some edge-attributes have the type_id biolink:Attribute and the validator doesn't think this is right
    • Those Edges aren't from Service Provider/BTE; we get them from Automat Hetio (TRAPI KP).
  • warning.knowledge_graph.edge.predicate.non_canonical
    • This has to do with Respect canonical predicate direction #476
    • However, this arises because we build our KG Edges to follow the QEdge direction, so node- and edge-bindings align correctly. Forcing canonical direction would mean creating KG Edges that don't follow the QEdge direction (so node-/edge-bindings wouldn't follow the KG Edge direction??), which seems confusing and like introducing errors on purpose. So this is an ongoing topic for us to discuss and bring up with the rest of Translator.
  • warning.knowledge_graph.node.unmapped_prefix
    • almost all of these are "biolink-model doesn't know BIOCARTA is a Pathway id-prefix". We'd have to ask them to add it.
    • also includes NCBIGene:1017 as a BiologicalEntity: so "biolink-model doesn't know NCBIGene could be a BiologicalEntity id-prefix". However, this does seem odd: pretty much all NCBIGene IDs should be Genes...
      • this is coming from SRI Node Norm. Asked them about it (Translator Slack link), and they said this is unintended and they'll look into it
  • warning.query_graph.node.ids.unmapped_to_categories
    • This is odd. I think it corresponds to this validation code and is similar to the warning above ("biolink-model doesn't know NCBIGene could be a BiologicalEntity id-prefix").
    • It looks to me like reasoner-validator would raise this warning (or another warning/error) if the category were missing from a QNode that has an ID. However, I thought it was valid TRAPI to have a QNode with an ID and no category...
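The unmapped-prefix warnings above boil down to a lookup: is this ID's prefix among the id_prefixes biolink-model declares for the category? A sketch under stated assumptions: the prefix sets below are hypothetical, deliberately tiny stand-ins (the real lists are the id_prefixes on each biolink-model class), and BiologicalEntity is assumed to declare none:

```python
# Hypothetical, deliberately tiny id-prefix lists for illustration; the real
# lists are the id_prefixes declared on each class in biolink-model.
CATEGORY_ID_PREFIXES = {
    "biolink:Pathway": {"GO", "REACT"},
    "biolink:Gene": {"NCBIGene", "HGNC"},
    "biolink:BiologicalEntity": set(),  # assumed: no prefixes declared
}


def id_prefix(curie: str) -> str:
    """Return the part of a CURIE before the first colon."""
    return curie.split(":", 1)[0]


def unmapped_prefix(node_id: str, category: str) -> bool:
    """True if node_id's prefix isn't listed for category (or category lists none)."""
    return id_prefix(node_id) not in CATEGORY_ID_PREFIXES.get(category, set())


for node_id, category in [
    ("BIOCARTA:p53pathway", "biolink:Pathway"),
    ("NCBIGene:1017", "biolink:BiologicalEntity"),
    ("NCBIGene:1017", "biolink:Gene"),
]:
    print(node_id, category, unmapped_prefix(node_id, category))
# BIOCARTA:p53pathway biolink:Pathway True
# NCBIGene:1017 biolink:BiologicalEntity True
# NCBIGene:1017 biolink:Gene False
```

This is why the fixes sit outside BTE: either biolink-model adds the prefix (BIOCARTA for Pathway), or NodeNorm returns the more specific category (Gene instead of BiologicalEntity for NCBIGene IDs).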

colleenXu referenced this issue in NCATS-Tangerine/translator-api-registry May 6, 2023
it's more of a node attribute, and reasoner-validator doesn't consider it a valid edge-attribute type_id
@colleenXu
Collaborator

Regarding Investigation Part 3 / attribute_type_id and value issues...

  • This involves going through each operation and its response-mapping keys. Would compiling a list help, or should we just go through them all?
  • Would it be easier to pair with Sierra (biolink-model), go through each response-mapping key, and either add it to biolink-model or find a CURIE for it?

@andrewsu
Member Author

andrewsu commented May 6, 2023

Just a note for future reference. Given how much other stuff we have on our plate, I'm only concerned with TRAPI validation errors at the moment. Validation warnings can be put on the back burner unless there's something specific we collectively agree on at a prioritization meeting that we want to address...

@colleenXu colleenXu changed the title investigate TRAPI 1.3 validation error investigate TRAPI validation errors showing in ARAX UI Jun 28, 2023
@colleenXu
Collaborator

From today's group meeting:

Deadline (link to Translator Google spreadsheet) in mid-July…

Fully semantically valid biolink 3.5.0 + TRAPI 1.4.0 from all actors in PROD

Our understanding is that this basically equals a "green checkmark in ARAX UI" for creative-mode queries going through the UI/ARS. Unclear if sending queries to BTE through ARAX UI will also trigger the validation.

This "green checkmark in ARAX UI" is now a goal. We agreed to start paying more attention to that part of the ARAX UI, bringing up issues we see, and addressing them

@edeutsch

Unclear if sending queries to BTE through ARAX UI will also trigger the validation

I think the answer is probably that it will not. Because of how the system has evolved, validation happens when fetching pre-existing PKs from the ARS, but not for queries initiated through our UI. I'm not completely certain of this answer, though.

@colleenXu
Collaborator

One known issue is edge attributes whose type_ids aren't CURIEs / biolink-model association-slot terms (discussed in the past here as "Part 3", and here).

I'd like to run new queries through the UI/ARS and see if these still show up as validation errors.

And if we still need to address it...

  • for now:
    • comment out response-mapping for edge info that doesn't have a clear, valid attribute_type_id (curies / biolink-model association-slot terms). This will remove a LOT of edge-info...>.<
    • alert text-mining / multiomics teams if their edge attributes aren't passing validation
  • for later: a more careful process, with biolink-model / data-modeling devs, to review edge-info. May involve post-processing (BTE, JQ?), adding association-slots to biolink-model...
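A sketch of the "for now" step as a post-processing filter rather than editing the annotation by hand: split an edge's attributes into kept and dropped by an allow-list of attribute_type_ids. The allow-list and the helper name split_attributes are hypothetical; in practice the list would be curated with the biolink-model / data-modeling devs:

```python
# Hypothetical allow-list of attribute_type_ids for this sketch; in practice
# it would be curated with the biolink-model / data-modeling devs.
ALLOWED_ATTRIBUTE_TYPE_IDS = {
    "biolink:original_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def split_attributes(attributes: list, allowed: set) -> tuple:
    """Partition edge attributes into (kept, dropped) by attribute_type_id."""
    kept, dropped = [], []
    for attr in attributes:
        target = kept if attr.get("attribute_type_id") in allowed else dropped
        target.append(attr)
    return kept, dropped


# Hypothetical attributes; values are placeholders.
attributes = [
    {"attribute_type_id": "biolink:original_knowledge_source", "value": "infores:dgidb"},
    {"attribute_type_id": "dgidb_interaction_claim_source", "value": "placeholder"},
]
kept, dropped = split_attributes(attributes, ALLOWED_ATTRIBUTE_TYPE_IDS)
print([a["attribute_type_id"] for a in dropped])  # ['dgidb_interaction_claim_source']
```

Returning the dropped attributes, rather than discarding them silently, gives a ready-made list to bring to the text-mining / multiomics teams or to biolink-model when deciding what to add as association slots.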

@colleenXu
Collaborator

Analyzing the latest ARAX-UI+reasoner-validator validation of BTE responses

query: creative-mode MVP1, what treats acanthosis nigricans
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:ChemicalEntity"]
                },
                "n1": {
                    "ids":["MONDO:0007035"],
                    "categories":["biolink:DiseaseOrPhenotypicFeature"],
                    "name": "acanth"
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:treats"],
                    "knowledge_type": "inferred"
                }
            }
        }
    }
}

Using ARAX-UI-beta: parent PK and BTE's response PK

Versioning: Reasoner Validator version '3.7.4' validating against TRAPI schema version 'v1.4.2' and Biolink Model version '3.5.0'. (from ARAX-UI-beta's validation message)

Analyzing Errors

There were no critical ("red") errors.

There were some lower-level/ordinary ("orange") errors:

Edge has a value that is not a CURIE for attribute_type_id (Knowledge Graph Edge Attribute)

To-do: described in my previous comment. We already know about this issue.

full text
	$ infores:semmeddb -> infores:biothings-semmeddb -> infores:biothings-explorer
		# original_object_name:
		- edge_id: 
			UMLS:C0021463--biolink:treats->MONDO:0008487
	$ global
		# semmeddb_predication_count:
		- edge_id: 
			PUBCHEM.COMPOUND:5281807--biolink:treats->HP:0000855
	$ infores:dgidb -> infores:biothings-dgidb -> infores:biothings-explorer
		# dgidb_interaction_claim_source:
		- edge_id: 
			PUBCHEM.COMPOUND:14720269--biolink:affects->NCBIGene:2261
	$ infores:monarchinitiative -> infores:biolink-api -> infores:biothings-explorer
		# monarch_source_database:
		- edge_id: 
			MONDO:0043003--biolink:has_phenotype->MONDO:0007035
	$ infores:hpo-annotations -> infores:mydisease-info -> infores:biothings-explorer
		# omim_refs:
		- edge_id: 
			MONDO:0008696--biolink:has_phenotype->HP:0000855
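The report above nests a source trail (`$`), an attribute or code (`#`), and the affected edge_ids. When deciding which team to contact, it helps to split these apart. A small parser sketch; the format is inferred only from the output pasted in this thread (not from reasoner-validator documentation), and the names EdgeKey, parse_edge_id and parse_source_trail are hypothetical:

```python
from typing import List, NamedTuple


class EdgeKey(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_edge_id(edge_id: str) -> EdgeKey:
    """Split 'SUBJECT--PREDICATE->OBJECT' as printed in the validator report."""
    subject, sep1, rest = edge_id.partition("--")
    predicate, sep2, obj = rest.partition("->")
    if not (sep1 and sep2 and subject and predicate and obj):
        raise ValueError(f"unexpected edge_id format: {edge_id!r}")
    return EdgeKey(subject, predicate, obj)


def parse_source_trail(trail: str) -> List[str]:
    """'global' has no per-source trail; otherwise return infores ids, primary first."""
    if trail.strip() == "global":
        return []
    return [part.strip() for part in trail.split("->")]


key = parse_edge_id("UMLS:C0021463--biolink:treats->MONDO:0008487")
print(key.predicate)  # biolink:treats
trail = parse_source_trail(
    "infores:semmeddb -> infores:biothings-semmeddb -> infores:biothings-explorer"
)
print(trail[0])  # infores:semmeddb
```

The first infores in a trail is the primary source, which is usually the right team to notify; `global` entries carry no trail and need to be traced by edge_id instead.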

Qualifier_type_id for edge has unresolved qualifier_value (Knowledge Graph Edge Qualifier)

	$ global
		# activity_or_abundance:
		- edge_id | qualifier_type_id: 
			PUBCHEM.COMPOUND:86705695--biolink:affects->NCBIGene:2261 | biolink:object_aspect_qualifier

activity_or_abundance is a valid value for an aspect qualifier.

To-do: Follow up with Sierra (biolink-model) and Richard B (reasoner-validator). There appear to be errors in biolink-model's chemical affects gene association.

This has been previously discussed in this issue.
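One plausible mechanism for "unresolved qualifier_value", consistent with the biolink-model commit referenced later in this thread (the aspect qualifiers were bound to a "Part" enum instead of an "Aspect" enum): the value is checked against the wrong enum's permissible values. The enum subsets below are hypothetical and abbreviated, and this is not reasoner-validator's actual code:

```python
# Illustrative subsets only (hypothetical); the real permissible values are
# defined by the enums in biolink-model.
ASPECT_ENUM_SUBSET = {"activity_or_abundance", "activity", "abundance", "expression"}
PART_ENUM_SUBSET = {"promoter", "exon", "intron"}


def qualifier_value_resolves(value: str, permissible_values: set) -> bool:
    """Treat a qualifier_value as resolved if it is a permissible value of the slot's range enum."""
    return value in permissible_values


value = "activity_or_abundance"
print(qualifier_value_resolves(value, PART_ENUM_SUBSET))    # False
print(qualifier_value_resolves(value, ASPECT_ENUM_SUBSET))  # True
```

Under this reading the fix belongs in biolink-model (pointing the slot's range at the Aspect enum), not in BTE's output.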

Edge has unknown attribute_type_id (Knowledge Graph Edge Attribute)

	$ infores:text-mining-provider-targeted -> infores:biothings-explorer
		# biolink:semmed_agreement_count:
		- edge_id: 
			PUBCHEM.COMPOUND:5743--biolink:treats->MONDO:0008487
	$ global
		# biolink:semmed_agreement_count:
		- edge_id: 
			PUBCHEM.COMPOUND:3397--biolink:treats->MONDO:0008487

To-do: Follow up with Edgar (Text-Mining). This is coming from their KP, and the issue seems to be that biolink-model 3.5.0 doesn't have the association-slot/edge-attribute "semmed_agreement_count"...

Analyzing Warnings

These all come from biolink-model or NodeNorm behavior, and I don't think code changes are needed for BTE.

It's a less-urgent to-do to let those tools' devs know.


Node identifiers found unmapped to target categories for node (Query Graph Node)

	$ global
		# n1:
		- unmapped_ids | categories: 
			['MONDO:0007035'] | ['biolink:DiseaseOrPhenotypicFeature']

I don't think this is something to worry about. This is a mismatch between what a user may assign a QNode category to be vs. what the categories of the QNode IDs are in NodeNorm.

The origin is: biolink-model doesn't list MONDO as an accepted prefix for DiseaseOrPhenotypicFeature (in fact no prefixes are given).

Node identifier found unmapped to target categories for node (Knowledge Graph Node)

	$ global
		# CHEMBL.COMPOUND:CHEMBL3545300:
		- categories: 
			['biolink:ChemicalEntity']

I don't think this is something to worry about.

The origin is: biolink-model doesn't list CHEMBL.COMPOUND as an accepted prefix for ChemicalEntity.

Node has a abstract or mixin category (Knowledge Graph Node)

	$ global
		# biolink:BiologicalEntity:
		- node_id: 
			UMLS:C1099354

I don't think this is something to worry about. This ID is RNA, Small Interfering.

The origin is:

Edge has an attribute_type_id that is not an association slot (Knowledge Graph Edge Attribute)

	$ infores:hpo-annotations -> infores:mydisease-info -> infores:biothings-explorer
		# biolink:evidence_type:
		- edge_id: 
			MONDO:0008696--biolink:has_phenotype->HP:0000855
	$ global
		# biolink:support_graphs:
		- edge_id: 
			CHEMBL.COMPOUND:CHEMBL4298120--biolink:treats->MONDO:0007035
	$ infores:biothings-explorer
		# biolink:support_graphs:
		- edge_id: 
			MESH:C034130--biolink:treats->MONDO:0007035

Both evidence_type and support_graphs exist in biolink-model, but aren't association-slots.

We can remove the evidence_type edge-attribute (see previous comment)...

but the biolink:support_graphs is what we've been told to use as attribute_type_id for support-graphs for edges...
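The "not an association slot" warning amounts to walking every knowledge-graph edge and checking each attribute_type_id against the set of biolink association slots. A sketch with a hypothetical, tiny stand-in slot set (membership is illustrative only, not a statement about biolink-model); edges are keyed by the edge_id strings shown above and trimmed to their attributes:

```python
from typing import List, Tuple

# Hypothetical stand-in for biolink-model's association slots; membership here
# is illustrative only.
ASSOCIATION_SLOTS = {"biolink:original_knowledge_source", "biolink:publications"}


def non_association_slot_attributes(knowledge_graph: dict, slots: set) -> List[Tuple[str, str]]:
    """Return (edge_id, attribute_type_id) pairs whose type id isn't in slots."""
    flagged = []
    for edge_id, edge in knowledge_graph.get("edges", {}).items():
        for attr in edge.get("attributes") or []:
            type_id = attr.get("attribute_type_id")
            if type_id not in slots:
                flagged.append((edge_id, type_id))
    return flagged


# Hypothetical knowledge graph, edges trimmed to attributes; values are placeholders.
kg = {
    "edges": {
        "MONDO:0008696--biolink:has_phenotype->HP:0000855": {
            "attributes": [
                {"attribute_type_id": "biolink:evidence_type", "value": "placeholder"},
                {"attribute_type_id": "biolink:original_knowledge_source",
                 "value": "infores:hpo-annotations"},
            ]
        },
        "MESH:C034130--biolink:treats->MONDO:0007035": {
            "attributes": [
                {"attribute_type_id": "biolink:support_graphs", "value": ["placeholder"]},
            ]
        },
    }
}
for edge_id, type_id in non_association_slot_attributes(kg, ASSOCIATION_SLOTS):
    print(type_id, "on", edge_id)
# biolink:evidence_type on MONDO:0008696--biolink:has_phenotype->HP:0000855
# biolink:support_graphs on MESH:C034130--biolink:treats->MONDO:0007035
```

A pure membership check like this flags biolink:support_graphs even though it is the agreed-upon type_id, so resolving it needs a biolink-model or validator change rather than a BTE change.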

@colleenXu
Collaborator

colleenXu commented Jul 14, 2023

[DONE: 2023-07-20 late night]

As discussed in the earlier comment, I'm commenting out / adjusting the fields BTE queries for and the response-mapping. This will address a pervasive TRAPI validation error: Edge has a value that is not a CURIE for attribute_type_id

Covered in these commits:

colleenXu added a commit to colleenXu/biolink-model that referenced this issue Jul 21, 2023
chemical-affects-gene-association aspect qualifiers were set to the "Part" enum, when they should probably be set to the "Aspect" Enum. 

This was identified in biothings/biothings_explorer#587 (comment) (under Analyzing Errors, Qualifier_type_id for edge has unresolved qualifier_value (Knowledge Graph Edge Qualifier))
@colleenXu
Collaborator

colleenXu commented Jul 21, 2023

Seeing what validation issues still exist after adjusting x-bte annotation

Ran the same creative-mode MVP1 query from my post last week. I think our responses are basically good!

Using ARAX-UI-dev: parent PK and BTE's response PK

Versioning: same as before (Reasoner Validator version '3.7.4' validating against TRAPI schema version 'v1.4.2' and Biolink Model version '3.5.0'), from ARAX-UI's validation message.

Analyzing Errors

There were still no critical ("red") errors.

The lower-level/ordinary ("orange") errors have all been addressed or communicated to the responsible teams:
  • No longer happening: Edge has a value that is not a CURIE for attribute_type_id (Knowledge Graph Edge Attribute), which shows that the x-bte annotation adjustments addressed this.
  • biolink-model / reasoner-validator issue: Qualifier_type_id for edge has unresolved qualifier_value. I've done my to-do and followed up with Sierra (biolink-model) and Richard B (reasoner-validator), with the biolink PR here
  • text-mining issue: Edge has unknown attribute_type_id (Knowledge Graph Edge Attribute). I've done my to-do and followed up with Edgar + Bill B (Text-Mining) in Translator Slack. Below is the error message that's still occurring. EDIT: here's the biolink-model PR that adds this attribute_type_id...
    • Pasting my previous note: the issue seems to be that biolink-model 3.5.0 doesn't have the association-slot/edge-attribute "semmed_agreement_count"...
trapi validation error
* Knowledge Graph Edge Attribute Type Id:
=> Edge has unknown attribute_type_id
	$ infores:text-mining-provider-targeted -> infores:biothings-explorer
		# biolink:semmed_agreement_count:
		- edge_id: 
			PUBCHEM.COMPOUND:5743--biolink:treats->MONDO:0008487
	$ global
		# biolink:semmed_agreement_count:
		- edge_id: 
			PUBCHEM.COMPOUND:3397--biolink:treats->MONDO:0008487

Analyzing Warnings

These have all been addressed or communicated to the responsible teams:
  • no longer happening: Node identifiers found unmapped to target categories for node (Query Graph Node). The change is likely due to the NodeNorm (prod instance) update earlier this week. BTE should now be retrieving the "preferred/most-specific" category for the ID as the top entry of the categories list from NodeNorm responses (which is Disease, not DiseaseOrPhenotypicFeature).
    • I was mistaken in my earlier post - this wasn't from the user-given category, it's from the NodeNorm-given category...
  • Node identifier found unmapped to target categories for node (Knowledge Graph Node) for CHEMBL.COMPOUND:CHEMBL3545300: I've followed up with Sierra and Richard B in Translator Slack.
    • Previous note: The origin is: biolink-model doesn't list CHEMBL.COMPOUND as an accepted prefix for ChemicalEntity
  • no longer happening: Node has a abstract or mixin category (Knowledge Graph Node) for UMLS:C1099354 RNA, Small Interfering. Now it's not showing up. Just in case, I checked the response from BTE before it goes through the ARS (this link should be valid for as long as BTE caches previous responses...), and it's also missing there. But even if it shows up again, I'm pretty sure this warning will stop happening for two reasons:
  • Edge has an attribute_type_id that is not an association slot (Knowledge Graph Edge Attribute): I've made a follow-up issue in biolink-model and tagged Sierra and Richard B. EDIT: here's the biolink-model PR.
    • this is still happening for biolink:support_graphs. This isn't happening anymore for biolink:evidence_type after the x-bte annotation adjustments.
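The NodeNorm-related fix above relies on taking the first entry of NodeNorm's category list as the most specific one. A sketch under assumptions: the response shape (each input CURIE maps to an object with a "type" list, or to null if unrecognized) is assumed rather than copied from NodeNorm documentation, the response below is hypothetical and trimmed, and preferred_category is a hypothetical helper name:

```python
from typing import Optional


def preferred_category(nodenorm_response: dict, curie: str) -> Optional[str]:
    """Return the first 'type' entry for curie, assumed to be the most specific category."""
    entry = nodenorm_response.get(curie) or {}
    types = entry.get("type") or []
    return types[0] if types else None


# Hypothetical, trimmed NodeNorm-style response (shape assumed, not copied).
response = {
    "MONDO:0007035": {
        "id": {"identifier": "MONDO:0007035"},
        "type": [
            "biolink:Disease",
            "biolink:DiseaseOrPhenotypicFeature",
            "biolink:BiologicalEntity",
        ],
    },
    "FAKE:0001": None,  # assumed: unrecognized ids come back as null
}
print(preferred_category(response, "MONDO:0007035"))  # biolink:Disease
print(preferred_category(response, "FAKE:0001"))  # None
```

Using the top entry means the KG node gets Disease rather than the broader DiseaseOrPhenotypicFeature, which (per the notes above) declares no id_prefixes and so triggered the unmapped-categories warning.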

@tokebe
Member

tokebe commented Aug 1, 2023

Marking as external because a lot of the remaining validation issues seem to come from the validator itself rather than from actual problems with BTE's responses.

@tokebe tokebe added the external Requires fixes to an external service label Aug 1, 2023
@colleenXu
Collaborator

colleenXu commented Aug 8, 2023

Update on what's going on now:

  • ARAX's TRAPI validation service is pinned to use reasoner-validator 3.8.0 and biolink-model 3.5.0. It shows all the same errors/warnings as before.
    • During today's Translator Architecture meeting, we discussed updating what ARAX does so it uses biolink-model 3.5.3
    • once it makes this change, we should show up as a green-check (only warnings, see next point)
  • When biolink-model 3.5.3 is used for validation (see my updated notebook for running TRAPI validation locally), we only have 1 warning left! It's the ChemicalEntity / CHEMBL.COMPOUND one.
  • It may be whack-a-mole to deal with this kind of warning, since it'll come up whenever biolink-model doesn't fully list each possible ID-prefix for a node category...

@colleenXu
Collaborator

Moving to on-hold because by now, it's very much a "wait for Translator to update biolink-model / reasoner-validator"...

@edeutsch
Copy link

edeutsch commented Aug 9, 2023

The latest validator version 3.8.0 configured to validate against Biolink model 3.5.3 is now deployed at https://arax.ci.transltr.io/

@andrewsu
Member Author

andrewsu commented Aug 9, 2023

With my n=1 spot check at https://arax.ci.transltr.io/?r=a13266eb-fd85-4bd6-95b9-6510832b35e8, BTE is returning a green check. @colleenXu unless you had additional testing you wanted to do, feel free to close this issue and check BTE off on NCATSTranslator/Feedback#379. Thanks all!


@colleenXu
Collaborator

Okay, going to close this issue. Yay!

@colleenXu colleenXu added this to the 2023-08-18 Code Freeze milestone Aug 10, 2023
@colleenXu
Collaborator

Note that I see validation issues when BTE uses the interacts_with predicate (either in query_graph or in knowledge_graph) because this predicate is a mixin.

The use of this predicate was previously brought up in biolink/biolink-model#1171
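Both this interacts_with case and the earlier BiologicalEntity warning are the same kind of check: a term flagged as abstract or mixin in biolink-model appears as a predicate or category in the message. A sketch under assumptions: the flagged set is hypothetical (just the two terms from this thread, whereas a real check would read biolink-model's abstract/mixin markers), and the message is a hypothetical, trimmed TRAPI-style example:

```python
from typing import List, Tuple

# Hypothetical flag set for illustration, taken from the two cases in this
# thread; a real check would read the abstract/mixin markers from biolink-model.
ABSTRACT_OR_MIXIN = {"biolink:BiologicalEntity", "biolink:interacts_with"}


def find_abstract_or_mixin(message: dict, flagged: set) -> List[Tuple[str, str]]:
    """Return (location, term) pairs where a flagged category/predicate is used."""
    hits = []
    qg = message.get("query_graph") or {}
    for qedge_id, qedge in (qg.get("edges") or {}).items():
        for pred in qedge.get("predicates") or []:
            if pred in flagged:
                hits.append((f"query_graph.edges.{qedge_id}", pred))
    kg = message.get("knowledge_graph") or {}
    for edge_id, edge in (kg.get("edges") or {}).items():
        if edge.get("predicate") in flagged:
            hits.append((f"knowledge_graph.edges.{edge_id}", edge["predicate"]))
    for node_id, node in (kg.get("nodes") or {}).items():
        for cat in node.get("categories") or []:
            if cat in flagged:
                hits.append((f"knowledge_graph.nodes.{node_id}", cat))
    return hits


# Hypothetical, trimmed TRAPI-style message.
message = {
    "query_graph": {
        "nodes": {"n0": {}, "n1": {"ids": ["NCBIGene:1017"]}},
        "edges": {"e0": {"subject": "n0", "object": "n1",
                         "predicates": ["biolink:interacts_with"]}},
    },
    "knowledge_graph": {
        "nodes": {"UMLS:C1099354": {"categories": ["biolink:BiologicalEntity"]}},
        "edges": {"kg_e1": {"subject": "UMLS:C1099354",
                            "predicate": "biolink:treats",
                            "object": "MONDO:0007035"}},
    },
}
for location, term in find_abstract_or_mixin(message, ABSTRACT_OR_MIXIN):
    print(location, term)
# query_graph.edges.e0 biolink:interacts_with
# knowledge_graph.nodes.UMLS:C1099354 biolink:BiologicalEntity
```

Because the query graph is checked too, a user-supplied interacts_with predicate triggers the warning even when BTE's own KG edges use concrete predicates.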

Labels
external Requires fixes to an external service trapi 1.4

4 participants