phase 1: aux-graph/result.analyses refactor for basic querying #603

colleenXu · 2023-03-29T07:04:18Z

READ FIRST:

this issue will be changed as we discuss it, get clarification from the TRAPI team, or get changes from the TRAPI team

Overview

TRAPI 1.4 includes major refactoring of the Response structure. This issue covers the desired behavior for basic querying (non-creative-mode).

timing: ASAP
priority: high

General requirements

query_graph cannot be modified, and result.node_bindings must only include QNodes from query_graph. From this line in the migration guide
TRAPI 1.4 removes result.edge_bindings and result.score
TRAPI 1.4 adds auxiliary_graphs (basically groups of edges) and result.analyses
- Auxiliary graphs are graphs that provide support or evidence for results or edges (creative-mode / maybe ID-expansion). From this line in the migration guide
- If aux-graphs are referenced on the edges, they don't need to be referenced in the results as well. From this line in the migration guide
- result.analyses.edge_bindings must only include QEdges from query_graph. From this line in the migration guide
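
A minimal sketch of how the binding requirements above could be checked, assuming the TRAPI message has already been parsed into a Python dict. The helper name `find_binding_violations` is hypothetical; it is not part of BTE or of any TRAPI validation tooling.

```python
def find_binding_violations(message):
    """Hypothetical helper: list binding keys that are missing from query_graph.

    Returns tuples of (result index, binding type, offending key).
    """
    query_graph = message["query_graph"]
    qnode_ids = set(query_graph["nodes"])
    qedge_ids = set(query_graph["edges"])

    violations = []
    for index, result in enumerate(message.get("results", [])):
        # node_bindings keys must be QNode IDs from the unmodified query_graph
        for key in result.get("node_bindings", {}):
            if key not in qnode_ids:
                violations.append((index, "node_bindings", key))
        # each analysis' edge_bindings keys must be QEdge IDs from query_graph
        for analysis in result.get("analyses", []):
            for key in analysis.get("edge_bindings", {}):
                if key not in qedge_ids:
                    violations.append((index, "edge_bindings", key))
    return violations
```

An empty list means every binding key refers to a QNode or QEdge that exists in query_graph.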

TRAPI spec references

migration guide
bullets 7 and 13 of the changelog:

Encoding extra supporting graph information in TRAPI This is a complex breaking change that moves Result.edge_bindings into Result.analyses which link to AuxiliaryGraph objects https://github.com/NCATSTranslator/ReasonerAPI/pull/389/files

Change Analysis.reasoner_id to Analysis.resource_id https://github.com/NCATSTranslator/ReasonerAPI/pull/416/files

Relevant scenarios

There are 2 scenarios for a result, depending on whether its Result Nodes match exactly with the original QNodes (aka no QNode ID/node-expansion involved) or not.

I use this query in my examples below.

It is a two-hop Predict query, similar to a creative-mode treats template. It takes about 2 min to run on my local instance.

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0005377"],
                    "categories":["biolink:DiseaseOrPhenotypicFeature"],
                    "name": "noonan"
                },
                "n1": {
                    "categories":["biolink:Gene"],
                    "is_set": true
                },
                "n2": {
                    "categories":["biolink:ChemicalEntity"]
                }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:caused_by"]
                },
                "eB": {
                    "subject": "n1",
                    "object": "n2",
                    "predicates": ["biolink:regulated_by", "biolink:affected_by"]
                }
            }
        }
    }
}

My "technical implementation" musings

Before results-assembly, ID/node-expansion behavior changes?

don't mutate the "query_graph" in the output TRAPI response (fine to do whatever we need in underlying query execution)
"ID" in all the text below refers to entity-IDs/Curies unless otherwise specified
During query-execution, when an original QNode's IDs are expanded (descendant IDs are retrieved)...keep track of which original ID(s) each descendant ID comes from
- in the example query above, there was only 1 original ID, which expands to a set of 48 IDs (itself and 47 descendants)
- it may be possible for one descendant ID to correspond to multiple original IDs (the original IDs are ancestors/descendants of each other)....can we handle this?
at some point, create subclass_of edges for each descendant ID that's used in KP Edges...
- subject: descendant ID
- object: original ID
- predicate: biolink:subclass_of
- sources: [ { "resource_id": "infores:biothings-explorer", "resource_role": "primary_knowledge_source" } ]
  - ideally, the primary element would be the relevant ontology infores....and there'd be a separate element to say this edge comes from BTE....but let's not worry about that right now
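
A rough sketch of the bookkeeping described above, assuming the descendant lookup has already produced a mapping from each original ID to its descendants. The function names and the `EX:` placeholder IDs are hypothetical. Emitting one subclass_of edge per (descendant, original) pair is just one possible answer to the "multiple original IDs" question, not a settled design.

```python
from collections import defaultdict


def track_expansion(descendants_by_original):
    """Map every expanded ID back to the original ID(s) it came from."""
    origins = defaultdict(set)
    for original, descendants in descendants_by_original.items():
        origins[original].add(original)  # the original ID is part of its own expanded set
        for descendant in descendants:
            origins[descendant].add(original)
    return origins


def make_subclass_edges(origins, ids_used_in_kp_edges):
    """Create subclass_of edges only for descendant IDs that KP Edges actually use."""
    edges = []
    for descendant in sorted(ids_used_in_kp_edges):
        for original in sorted(origins.get(descendant, ())):
            if descendant == original:
                continue  # an original ID needs no subclass_of edge to itself
            edges.append({
                "subject": descendant,
                "object": original,
                "predicate": "biolink:subclass_of",
                "sources": [{
                    "resource_id": "infores:biothings-explorer",
                    "resource_role": "primary_knowledge_source",
                }],
            })
    return edges


# Placeholder descendant IDs, for illustration only
origins = track_expansion({"MONDO:0005377": ["EX:child1", "EX:child2"]})
subclass_edges = make_subclass_edges(origins, {"MONDO:0005377", "EX:child1"})
```

Here only `EX:child1` gets a subclass_of edge, because `EX:child2` never appears in a KP Edge and the original ID is skipped.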

Other musings:

basically replace result.score -> result.analyses.score
BTE will not use result.analyses.support_graphs because we don't add "supporting nodes and edges" for our scoring
BTE will reference auxiliary-graphs in "collapsed" Edges that it creates
probably all edges in knowledge_graph.edges should be referenced in either a result.analyses object's edge_bindings or an auxiliary_graphs object's edges...

no ID/node-expansion was involved

In this situation, for the result.analyses array, make an object where:

edge_bindings property holds the old result.edge_bindings content
score property holds the old result.score content
resource_id property is infores:biothings-explorer

a current result

notice the node_bindings: the n0 ID is the same as the bound Node ID (so no ID-expansion was done)

            {
                "node_bindings": {
                    "n0": [
                        {
                            "id": "MONDO:0005377"
                        }
                    ],
                    "n1": [
                        {
                            "id": "NCBIGene:3315"
                        }
                    ],
                    "n2": [
                        {
                            "id": "PUBCHEM.COMPOUND:23667548"
                        }
                    ]
                },
                "edge_bindings": {
                    "eA": [
                        {
                            "id": "54d9ed32bec4d12369592709e20c997f"
                        }
                    ],
                    "eB": [
                        {
                            "id": "51a4d02f0097f1ddb6af1d96631e1177"
                        },
                        {
                            "id": "50c2279ba69bc6eb9474133c71e89a6b"
                        }
                    ]
                },
                "score": 2.2933946031476955
            }

new result

            {
                "node_bindings": {
                    "n0": [
                        {
                            "id": "MONDO:0005377"
                        }
                    ],
                    "n1": [
                        {
                            "id": "NCBIGene:3315"
                        }
                    ],
                    "n2": [
                        {
                            "id": "PUBCHEM.COMPOUND:23667548"
                        }
                    ]
                },
                "analyses": [
                    {
                        "resource_id": "infores:biothings-explorer",
                        "edge_bindings": {
                            "eA": [
                                {
                                    "id": "54d9ed32bec4d12369592709e20c997f"
                                }
                            ],
                            "eB": [
                                {
                                    "id": "51a4d02f0097f1ddb6af1d96631e1177"
                                },
                                {
                                    "id": "50c2279ba69bc6eb9474133c71e89a6b"
                                }
                            ]
                        },
                        "score": 2.2933946031476955
                    }
                ]
            }
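
The conversion for this scenario can be sketched as a small function, assuming results are plain Python dicts. The name `to_trapi_1_4_result` is hypothetical and this is not BTE's actual implementation.

```python
def to_trapi_1_4_result(old_result, resource_id="infores:biothings-explorer"):
    """Move edge_bindings and score from the result into a single analyses object."""
    return {
        # node_bindings are unchanged because no ID/node-expansion was involved
        "node_bindings": old_result["node_bindings"],
        "analyses": [
            {
                "resource_id": resource_id,
                "edge_bindings": old_result["edge_bindings"],
                "score": old_result["score"],
            }
        ],
    }
```

The returned result has no top-level edge_bindings or score, matching the "new result" shape shown above.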

ID/node-expansion was involved

READ THIS FIRST:

This is still being discussed by the TRAPI team / Translator
the info below is based on TRAPI 1.4.0-beta3 and the discussions Jackson and I had on this topic

Desired behavior is described with these slides: https://docs.google.com/presentation/d/1OzwQ6yBKOmluvmcOZU7FFf8n7YNrKr21wRz-FHha79s/edit#slide=id.g22b562e9c67_0_163

My "technical implementation" musings

We want each grouping of edges (KP Edges + subclass_of edges for the descendant IDs) to correspond to 1 QEdge (one that involves the expanded QNode) and to represent only 1 "MetaPath". So the KP Edges should probably have the same predicate, and their end that doesn't correspond to an expanded QNode should only have 1 ID/entity bound to it.

each group of edges should become an auxiliary-graph (object in the auxiliary_graphs section). Ideally, the auxiliary_graphs section is a unique set of edge groups
- note the format of the auxiliary-graphs in the migration guide
- there's a key (autogenerated hash of the IDs in alphanumeric-order?)
- there's a value: an object {"edges": [ ... ] }. The edge IDs for the group go into that list
each group of edges should also be used to create 1 new edge. This edge should correspond exactly with the QEdge. It basically traverses (collapses?) the "MetaPath" represented by this group
- should match the direction of the QEdge
  - one end is the original ID, corresponding to the original QNode
  - the other end corresponds to the QNode on the other end of the QEdge
- the predicate....I'm not sure. If the KP Edges have the same predicate, we can just use that.
  - the lowest-common-ancestor of the predicates of the KP edge(s) of the set?
  - the QEdge predicate? This is the most-general we'd want to go...
- it has 1 edge attribute:
  - one where attribute_type_id is biolink:support_graphs and the value is an array of strings. Each string is a key for the auxiliary-graph corresponding to this group of edges
- sources: [ { "resource_id": "infores:biothings-explorer", "resource_role": "primary_knowledge_source" } ]. Should be okay because this edge only exists on the BTE level...
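
A sketch of the grouping steps above, under several assumptions: the key is a SHA-256 hash of the sorted edge IDs (the hash function and the plain string sort are my guesses at "autogenerated hash of the IDs in alphanumeric-order"), the predicate has already been chosen by the caller, and the edge IDs are placeholders. The function names are hypothetical.

```python
import hashlib


def aux_graph_key(edge_ids):
    """Derive a deterministic key from a group of edge IDs (assumed scheme)."""
    joined = ",".join(sorted(edge_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def collapse_group(edge_ids, subject, obj, predicate, auxiliary_graphs):
    """Register the group as an auxiliary graph and return the new collapsed edge."""
    key = aux_graph_key(edge_ids)
    # setdefault keeps auxiliary_graphs a unique set of edge groups
    auxiliary_graphs.setdefault(key, {"edges": sorted(edge_ids)})
    return {
        "subject": subject,      # follows the direction of the QEdge
        "object": obj,
        "predicate": predicate,  # which predicate to use is still an open question
        "attributes": [
            {"attribute_type_id": "biolink:support_graphs", "value": [key]}
        ],
        "sources": [{
            "resource_id": "infores:biothings-explorer",
            "resource_role": "primary_knowledge_source",
        }],
    }


auxiliary_graphs = {}
new_edge = collapse_group(
    ["kp_edge_1", "subclass_edge_1"],  # placeholder edge IDs
    "MONDO:0005377", "NCBIGene:3315", "biolink:caused_by",
    auxiliary_graphs,
)
```

Because the key depends only on the sorted edge IDs, the same group of edges always maps to the same auxiliary-graph entry, regardless of the order in which the edges were collected.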

Then we use these new edges to create the "result":

the node_bindings use the node IDs from the new edges (so the descendant IDs are NOT used here)
1 result.analyses object is made, similar to the first scenario:
- its edge_bindings use the new edges
- score and resource_id are the same as the first scenario


colleenXu · 2023-04-11T22:31:03Z

Other references:

colleenXu · 2023-04-12T04:16:13Z

Some added notes:

we can generate subclass_of edges in the canonical direction (descendant ID -(subclass_of)-> original ID)
for edges with supporting graphs, the attribute_type_id should be biolink:support_graphs and value should be an array of strings like [ "aux_graph_key_hash" ]

colleenXu · 2023-04-12T08:42:11Z

Updated a note and the slides for provenance refactor (for "new" edges)

colleenXu · 2023-04-19T17:14:03Z

Notes from today's group meeting:

AS made a poll on implementation schedule for consortium: Poll
Jackson's implementation time estimates:
- aux graphs: week
- query_id: ~2-4 days

Context: NCATS will present Translator as a whole in Sept. Not clear how important this "subclassing" issue is for that critical path

We don't have either implemented yet, so…do we have a plan on what to do?

wait on the poll above

we'd prefer NOT to implement query_id, then immediately refactor it for aux-graph work

so…maybe we only do aux-graphs

JC and CX may discuss the query_id method in more detail later

colleenXu · 2023-07-19T23:46:56Z

While it would always be good to have more testing, after several rounds of fixing bugs with nodes / "multiple aux-graphs" / "multiple results" / "multiple edges related to subclassing"... the basic behavior seems fixed.

We also haven't heard of issues related to our implementation of this (TRAPI validation / UI team / Translator Testing efforts).

So...closing this and we can reopen / make a new issue if problems and bugs arise.

colleenXu added the trapi 1.4 label Mar 29, 2023

colleenXu changed the title ~~complex changes to TRAPI result format~~ phase 1: aux-graph/result.analyses refactor for non-creative-mode querying Apr 5, 2023

colleenXu changed the title ~~phase 1: aux-graph/result.analyses refactor for non-creative-mode querying~~ phase 1: aux-graph/result.analyses refactor for basic querying Apr 11, 2023

colleenXu mentioned this issue Apr 11, 2023

overview and management of TRAPI 1.4 features #613

Closed

15 tasks

This was referenced Apr 12, 2023

phase 1: processing TRAPI-1.4 KP sub-query responses (aux-graph/result.analyses refactor) #614

Closed

phase 2: creative-mode and aux-graph/result-analyses refactor #615

Closed

tokebe mentioned this issue May 11, 2023

TRAPI 1.4 Results format biothings/bte_trapi_query_graph_handler#150

Merged

colleenXu mentioned this issue May 31, 2023

many identical aux-graphs in a non-creative-mode response #648

Closed

colleenXu closed this as completed Jul 19, 2023

colleenXu mentioned this issue Aug 10, 2023

Scoring overhaul #634

Closed

colleenXu mentioned this issue Mar 26, 2024

TRAPI 1.5: set_interpretation/MCQ #800

Open
