phase 1: aux-graph/result.analyses refactor for basic querying #603

Closed
colleenXu opened this issue Mar 29, 2023 · 5 comments

@colleenXu
Collaborator

colleenXu commented Mar 29, 2023

READ FIRST:

  • this issue will be changed as we discuss it, get clarification from the TRAPI team, or get changes from the TRAPI team

Overview

TRAPI 1.4 includes major refactoring of the Response structure. This issue covers the desired behavior for basic querying (non-creative-mode).

  • timing: ASAP
  • priority: high
General requirements
  • query_graph cannot be modified, and result.node_bindings must only include QNodes from query_graph (see the migration guide).
  • TRAPI 1.4 removes result.edge_bindings and result.score.
  • TRAPI 1.4 adds auxiliary_graphs (basically groups of edges) and result.analyses.
    • Auxiliary graphs are graphs that provide support or evidence for results or edges (creative-mode / maybe ID-expansion); see the migration guide.
    • If aux-graphs are referenced on the edges, they don't need to be referenced in the results as well (see the migration guide).
    • result.analyses.edge_bindings must only include QEdges from query_graph (see the migration guide).
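
The binding rules above can be checked mechanically. This is only a minimal sketch, assuming the response has already been parsed into a Python dict shaped like a TRAPI 1.4 message; `find_binding_violations` is a hypothetical helper name, not something from BTE or a TRAPI library.

```python
def find_binding_violations(message):
    """Return a list of human-readable problems with the result bindings."""
    qgraph = message["query_graph"]
    qnode_ids = set(qgraph.get("nodes", {}))
    qedge_ids = set(qgraph.get("edges", {}))
    problems = []

    for index, result in enumerate(message.get("results", [])):
        # node_bindings keys must be QNode IDs from the (unmodified) query_graph
        for key in result.get("node_bindings", {}):
            if key not in qnode_ids:
                problems.append(f"result {index}: node_bindings key {key!r} is not a QNode")

        # these two properties no longer live directly on the result
        for removed in ("edge_bindings", "score"):
            if removed in result:
                problems.append(f"result {index}: {removed} should be inside analyses")

        # edge_bindings keys inside each analysis must be QEdge IDs
        for a_index, analysis in enumerate(result.get("analyses", [])):
            for key in analysis.get("edge_bindings", {}):
                if key not in qedge_ids:
                    problems.append(
                        f"result {index}, analysis {a_index}: "
                        f"edge_bindings key {key!r} is not a QEdge"
                    )
    return problems
```

An empty list means none of these particular rules were violated; it does not replace full TRAPI validation.
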
TRAPI spec references

Relevant scenarios

There are 2 scenarios for a result, depending on whether its Result Nodes match exactly with the original QNodes (aka no QNode ID/node-expansion involved) or not.

I use this query in my examples below

A two-hop Predict query, similar to a creative-mode "treats" template. It takes about 2 min to run on my local instance.

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MONDO:0005377"],
                    "categories":["biolink:DiseaseOrPhenotypicFeature"],
                    "name": "noonan"
                },
                "n1": {
                    "categories":["biolink:Gene"],
                    "is_set": true
                },
                "n2": {
                    "categories":["biolink:ChemicalEntity"]
                }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:caused_by"]
                },
                "eB": {
                    "subject": "n1",
                    "object": "n2",
                    "predicates": ["biolink:regulated_by", "biolink:affected_by"]
                }
            }
        }
    }
}
My "technical implementation" musings

Before results-assembly, ID/node-expansion behavior changes?

  • don't mutate the "query_graph" in the output TRAPI response (fine to do whatever we need in underlying query execution)
  • "ID" in all the text below refers to entity-IDs/Curies unless otherwise specified
  • During query-execution, when an original QNode's IDs are expanded (descendant IDs are retrieved)...keep track of which original ID(s) each descendant ID comes from
    • in the example query above, there was only 1 original ID, which expands to a set of 48 IDs (itself and 47 descendants)
    • it may be possible for one descendant ID to correspond to multiple original IDs (the original IDs are ancestors/descendants of each other)....can we handle this?
  • at some point, create subclass_of edges for each descendant ID that's used in KP Edges...
    • subject: descendant ID
    • object: original ID
    • predicate: biolink:subclass_of
    • sources: [ { "resource_id": "infores:biothings-explorer", "resource_role": "primary_knowledge_source" } ]
      • ideally, the primary element would be the relevant ontology infores....and there'd be a separate element to say this edge comes from BTE....but let's not worry about that right now
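
The bookkeeping described above could look roughly like this. It's a sketch under assumptions: `descendants_by_original` and `used_ids` are hypothetical inputs (whatever the expansion step and the KP-edge retrieval actually produce), and the `EX:` IDs in the usage lines are placeholders, not real descendants of MONDO:0005377.

```python
from collections import defaultdict

BTE_SOURCE = {
    "resource_id": "infores:biothings-explorer",
    "resource_role": "primary_knowledge_source",
}


def invert_expansion(descendants_by_original):
    """Map each descendant ID to the set of original IDs it was expanded from.

    One descendant can map to several originals when the original IDs are
    ancestors/descendants of each other.
    """
    originals_by_descendant = defaultdict(set)
    for original_id, descendants in descendants_by_original.items():
        for descendant_id in descendants:
            if descendant_id != original_id:
                originals_by_descendant[descendant_id].add(original_id)
    return dict(originals_by_descendant)


def build_subclass_edges(descendants_by_original, used_ids):
    """Create subclass_of edges only for descendant IDs used in KP Edges."""
    edges = []
    inverted = invert_expansion(descendants_by_original)
    for descendant_id in sorted(inverted):
        if descendant_id not in used_ids:
            continue
        for original_id in sorted(inverted[descendant_id]):
            edges.append({
                "subject": descendant_id,      # descendant ID
                "object": original_id,         # original ID
                "predicate": "biolink:subclass_of",
                "sources": [dict(BTE_SOURCE)],
            })
    return edges


# Placeholder IDs for illustration only
expansion = {"MONDO:0005377": ["MONDO:0005377", "EX:child1", "EX:child2"]}
subclass_edges = build_subclass_edges(expansion, used_ids={"EX:child1"})
```

Keeping the inverted map around means the "one descendant, multiple originals" case just yields one subclass_of edge per original.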

Other musings:

  • basically replace result.score -> result.analyses.score
  • BTE will not use result.analyses.support_graphs because we don't add "supporting nodes and edges" for our scoring
  • BTE will reference auxiliary-graphs in "collapsed" Edges that it creates
  • probably all edges in knowledge_graph.edges should be referenced in either a result.analyses object's edge_bindings or an auxiliary_graphs object's edges...
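
The last point can be turned into a quick sanity check. A sketch, assuming `auxiliary_graphs` sits next to `knowledge_graph` and `results` in the message dict; `find_unreferenced_edges` is a hypothetical name.

```python
def find_unreferenced_edges(message):
    """Return knowledge_graph edge IDs not referenced by any analysis or aux-graph."""
    kg_edge_ids = set(message.get("knowledge_graph", {}).get("edges", {}))

    referenced = set()
    # edge IDs bound in result.analyses[].edge_bindings
    for result in message.get("results", []):
        for analysis in result.get("analyses", []):
            for bindings in analysis.get("edge_bindings", {}).values():
                referenced.update(binding["id"] for binding in bindings)

    # edge IDs listed in auxiliary_graphs[key].edges
    for aux_graph in message.get("auxiliary_graphs", {}).values():
        referenced.update(aux_graph.get("edges", []))

    return sorted(kg_edge_ids - referenced)
```

Anything this returns is an edge that ended up in the knowledge graph without being reachable from a result or a support graph.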

no ID/node-expansion was involved

In this situation, for the result.analyses array, make an object where:

  • edge_bindings property holds the old result.edge_bindings content
  • score property holds the old result.score content
  • resource_id property is infores:biothings-explorer
a current result

notice the node_bindings: the n0 ID is the same as the bound Node ID (so no ID-expansion was done)

            {
                "node_bindings": {
                    "n0": [
                        {
                            "id": "MONDO:0005377"
                        }
                    ],
                    "n1": [
                        {
                            "id": "NCBIGene:3315"
                        }
                    ],
                    "n2": [
                        {
                            "id": "PUBCHEM.COMPOUND:23667548"
                        }
                    ]
                },
                "edge_bindings": {
                    "eA": [
                        {
                            "id": "54d9ed32bec4d12369592709e20c997f"
                        }
                    ],
                    "eB": [
                        {
                            "id": "51a4d02f0097f1ddb6af1d96631e1177"
                        },
                        {
                            "id": "50c2279ba69bc6eb9474133c71e89a6b"
                        }
                    ]
                },
                "score": 2.2933946031476955
            }
new result
            {
                "node_bindings": {
                    "n0": [
                        {
                            "id": "MONDO:0005377"
                        }
                    ],
                    "n1": [
                        {
                            "id": "NCBIGene:3315"
                        }
                    ],
                    "n2": [
                        {
                            "id": "PUBCHEM.COMPOUND:23667548"
                        }
                    ]
                },
                "analyses": [
                    {
                        "resource_id": "infores:biothings-explorer",
                        "edge_bindings": {
                            "eA": [
                                {
                                    "id": "54d9ed32bec4d12369592709e20c997f"
                                }
                            ],
                            "eB": [
                                {
                                    "id": "51a4d02f0097f1ddb6af1d96631e1177"
                                },
                                {
                                    "id": "50c2279ba69bc6eb9474133c71e89a6b"
                                }
                            ]
                        },
                        "score": 2.2933946031476955
                    }
                ]
            }
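
The old-to-new conversion for this scenario is small enough to sketch. This assumes results are plain dicts like the ones shown above; `migrate_result` is a hypothetical helper name.

```python
import copy


def migrate_result(old_result, resource_id="infores:biothings-explorer"):
    """Move edge_bindings and score from the result into a single analysis."""
    analysis = {
        "resource_id": resource_id,
        "edge_bindings": copy.deepcopy(old_result.get("edge_bindings", {})),
    }
    if "score" in old_result:
        analysis["score"] = old_result["score"]

    return {
        "node_bindings": copy.deepcopy(old_result["node_bindings"]),
        "analyses": [analysis],
    }
```

The deep copies keep the input untouched, so the same old-format result can still be inspected after conversion.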

ID/node-expansion was involved

READ THIS FIRST:

  • This is still being discussed by the TRAPI team / Translator
  • the info below is based on TRAPI 1.4.0-beta3 and the discussions Jackson and I had on this topic

Desired behavior is described with these slides: https://docs.google.com/presentation/d/1OzwQ6yBKOmluvmcOZU7FFf8n7YNrKr21wRz-FHha79s/edit#slide=id.g22b562e9c67_0_163

My "technical implementation" musings

We want to group edges (KP Edges + subclass_of edges for the descendant IDs) so that each group corresponds to 1 QEdge (one that involves the expanded QNode) and represents only 1 "MetaPath". So the KP Edges in a group should probably have the same predicate, and their end that doesn't correspond to the expanded QNode should have only 1 ID/entity bound to it.

  • each group of edges should become an auxiliary-graph (object in the auxiliary_graphs section). Ideally, the auxiliary_graphs section is a unique set of edge groups
    • note the format of the auxiliary-graphs in the migration guide
    • there's a key (autogenerated hash of the IDs in alphanumeric-order?)
    • there's a value: an object {"edges": [ ... ] }. The edge IDs for the group go into that list
  • each group of edges should also be used to create 1 new edge. This edge should correspond exactly with the QEdge. It basically traverses (collapses?) the "MetaPath" represented by this group
    • should match the direction of the QEdge
      • one end is the original ID, corresponding to the original QNode
      • the other end corresponds to the QNode on the other end of the QEdge
    • the predicate....I'm not sure. If the KP Edges have the same predicate, we can just use that.
      • the lowest-common-ancestor of the predicates of the KP edge(s) of the set?
      • the QEdge predicate? This is the most-general we'd want to go...
    • it has 1 edge attribute:
      • one where attribute_type_id is biolink:support_graphs and the value is an array of strings. Each string is a key for the auxiliary-graph corresponding to this group of edges
    • sources: [ { "resource_id": "infores:biothings-explorer", "resource_role": "primary_knowledge_source" } ]. Should be okay because this edge only exists on the BTE level...
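
To make the grouping idea concrete, here is a rough sketch of turning one edge group into an auxiliary-graph plus one new ("collapsed") edge. Assumptions: SHA-256 over the sorted edge IDs is just one possible choice for the autogenerated key, the predicate is passed in by the caller (the open question above is not settled here), and both function names are hypothetical.

```python
import hashlib
import json


def aux_graph_key(edge_ids):
    """Order-independent key: hash of the unique edge IDs in sorted order."""
    canonical = json.dumps(sorted(set(edge_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def collapse_edge_group(edge_ids, subject, obj, predicate, auxiliary_graphs):
    """Register the group as an aux-graph and return the new edge that cites it."""
    key = aux_graph_key(edge_ids)
    # setdefault keeps auxiliary_graphs a unique set of edge groups
    auxiliary_graphs.setdefault(key, {"edges": sorted(set(edge_ids))})

    return {
        "subject": subject,      # should follow the QEdge direction
        "object": obj,
        "predicate": predicate,  # caller decides: shared KP predicate, LCA, or QEdge predicate
        "attributes": [
            {"attribute_type_id": "biolink:support_graphs", "value": [key]}
        ],
        "sources": [
            {
                "resource_id": "infores:biothings-explorer",
                "resource_role": "primary_knowledge_source",
            }
        ],
    }
```

Because the key depends only on the set of edge IDs, registering the same group twice reuses the existing auxiliary-graph entry instead of adding a duplicate.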

Then we use these new edges to create the "result":

  • the node_bindings use the node IDs at the ends of the new edges (so the descendant IDs are NOT used here)
  • 1 result.analyses object is made, similar to the first scenario:
    • its edge_bindings use the new edges
    • score and resource_id are the same as the first scenario
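
Result assembly for this scenario might be sketched as below. Big assumptions: each descendant ID maps to exactly one original ID (the multiple-originals case is not handled), the new edges already exist and are identified by the hypothetical `new_edge_ids_by_qedge` argument, and `build_expanded_result` is a made-up name.

```python
def build_expanded_result(bound_ids_by_qnode, new_edge_ids_by_qedge, original_for, score):
    """Build one result that binds original IDs and the new (collapsed) edges."""
    node_bindings = {}
    for qnode_id, bound_ids in bound_ids_by_qnode.items():
        # swap each descendant ID for its original ID, dropping duplicates
        # while keeping first-seen order
        ids = list(dict.fromkeys(original_for.get(i, i) for i in bound_ids))
        node_bindings[qnode_id] = [{"id": i} for i in ids]

    analysis = {
        "resource_id": "infores:biothings-explorer",
        "edge_bindings": {
            qedge_id: [{"id": edge_id} for edge_id in edge_ids]
            for qedge_id, edge_ids in new_edge_ids_by_qedge.items()
        },
        "score": score,
    }
    return {"node_bindings": node_bindings, "analyses": [analysis]}
```

IDs that have no entry in `original_for` (nodes that were never expanded) pass through unchanged.
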
@colleenXu changed the title from "complex changes to TRAPI result format" to "phase 1: aux-graph/result.analyses refactor for non-creative-mode querying" on Apr 5, 2023
@colleenXu changed the title from "phase 1: aux-graph/result.analyses refactor for non-creative-mode querying" to "phase 1: aux-graph/result.analyses refactor for basic querying" on Apr 11, 2023

@colleenXu
Collaborator Author

Other references:

@colleenXu
Collaborator Author

Some added notes:

  • we can generate subclass_of edges in the canonical direction (descendant ID -(subclass_of)-> original ID)
  • for edges with supporting graphs, the attribute_type_id should be biolink:support_graphs and value should be an array of strings like [ "aux_graph_key_hash" ]

@colleenXu
Collaborator Author

colleenXu commented Apr 12, 2023

Updated a note and the slides for provenance refactor (for "new" edges)

@colleenXu
Collaborator Author

Notes from today's group meeting:

AS made a poll on implementation schedule for consortium: Poll
Jackson's implementation-time estimates:

  • aux graphs: a week
  • query_id: ~2-4 days

Context: NCATS will present Translator as a whole in Sept. Not clear how important this "subclassing" issue is for that critical path

We don't have either implemented yet, so…do we have a plan on what to do?

  • wait on the poll above
  • we'd prefer NOT to implement query_id, then immediately refactor it for aux-graph work
  • so…maybe we only do aux-graphs

JC and CX may discuss the query_id method in more detail later

@colleenXu
Collaborator Author

While it would always be good to have more testing, after several rounds of fixing bugs with nodes / "multiple aux-graphs" / "multiple results" / "multiple edges related to subclassing"... the basic behavior seems fixed.

We also haven't heard of issues related to our implementation of this (TRAPI validation / UI team / Translator Testing efforts).

So...closing this and we can reopen / make a new issue if problems and bugs arise.
