Investigate a query that causes BTE to hang #446

Closed
tokebe opened this issue May 12, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@tokebe
Member

tokebe commented May 12, 2022

The following query causes BTE to hang:

{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "subject": "n2",
          "object": "n3",
          "predicates": ["biolink:related_to"]
        },
        "e1": {
          "subject": "n1",
          "object": "n0",
          "predicates": ["biolink:related_to"]
        },
        "e2": {
          "subject": "n1",
          "object": "n2",
          "predicates": ["biolink:related_to"]
        }
      },
      "nodes": {
        "n0": { "is_set": false, "categories": ["biolink:Drug"] },
        "n1": { "is_set": true },
        "n2": { "is_set": true },
        "n3": {
          "ids": ["MONDO:0008753"],
          "is_set": false,
          "categories": ["biolink:Disease"],
          "name": "MONDO:0008753"
        }
      }
    }
  }
}

This is a very general query that produces a very large number of results. Notably, we'd expect the cap on entities going into an open-ended hop to stop this query early, but for some reason it doesn't appear to.

@tokebe tokebe added the bug Something isn't working label May 12, 2022
@colleenXu
Collaborator

colleenXu commented May 12, 2022

Noting that it looks like QNode doesn't have any required properties, so theoretically we could be given a completely empty object, or QNodes that only have the is_set property, like the ones given here.

@colleenXu
Collaborator

colleenXu commented May 13, 2022

Related troubleshooting:

This example query (similar structure, but not the same entities) runs in 7 sec when POSTed to MyDisease only: http://localhost:3000/v1/smartapi/671b45c0301c8624abbd26ae78449ca2/query

It retrieves > 1000 IDs in the first hop, so it cancels execution / returns 501 before it starts the second hop.

example 2-hop Predict
{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:7157", "HGNC:6018"],
                    "categories": ["biolink:Gene"],
                    "is_set": false
                },
                "n1": {
                    "categories": ["biolink:Disease"],
                    "is_set": true
                },
                "n2": {
                    "categories": ["biolink:PhenotypicFeature"],
                    "is_set": true
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1"
                },
                "e1": {
                    "subject": "n1",
                    "object": "n2"
                }
            }
        }
    }
}
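The behavior described above (execution is cancelled when more than 1000 IDs would go into the next hop) can be sketched roughly as below. This is only an illustration, not BTE's actual code: `MAX_IDS_PER_HOP`, `TooManyEntitiesError`, and `check_hop_input` are hypothetical names, and the cap value is taken from the "> 1000 IDs" figure mentioned in this thread.

```python
# Hypothetical cap; the thread only says execution stops when > 1000 IDs
# would be fed into the next hop.
MAX_IDS_PER_HOP = 1000


class TooManyEntitiesError(Exception):
    """Raised when too many IDs would be fed into the next hop (hypothetical)."""


def check_hop_input(ids, cap=MAX_IDS_PER_HOP):
    """Deduplicate the IDs from the previous hop and enforce the cap.

    Returns the sorted unique IDs if they fit under the cap; otherwise raises
    so that the caller can cancel execution before starting the next hop.
    """
    unique_ids = set(ids)
    if len(unique_ids) > cap:
        raise TooManyEntitiesError(
            f"{len(unique_ids)} IDs exceed the cap of {cap}"
        )
    return sorted(unique_ids)
```

Note that a check like this only runs between hops, so it cannot help when a single hop is itself slow or returns a huge number of records.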

@colleenXu
Collaborator

colleenXu commented May 13, 2022

My Analysis

It looks like BTE is correctly handling the nodes that don't have categories/ids (it treats them as NamedThing), which surprises me. I didn't know we handled that.
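A minimal sketch of that fallback, assuming a QNode with neither categories nor ids is simply treated as `biolink:NamedThing`. The function name `effective_categories` is hypothetical and this is not taken from BTE's code; what happens for nodes that have ids but no categories is not discussed here, so the sketch leaves that case undecided.

```python
DEFAULT_CATEGORY = "biolink:NamedThing"


def effective_categories(qnode):
    """Return the categories a QNode would be queried with (assumed behavior)."""
    if qnode.get("categories"):
        return list(qnode["categories"])
    if qnode.get("ids"):
        # Not covered by this thread; left undecided in this sketch.
        return []
    # No categories and no ids: fall back to the most general category.
    return [DEFAULT_CATEGORY]
```

Under this assumption, both `{}` and `{"is_set": True}` come out as NamedThing nodes, which is what makes the hops in this query so open-ended.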

I think the problem is that BTE is getting stuck / running long on the 2nd hop (e2, NamedThing -> NamedThing). The first hop (e0, Disease -> NamedThing) returns < 1000 IDs, which is why the 2nd hop is executed.

This is what #363 and the restrict-explosion code in #375 were about: how to stop a hop mid-execution if it is taking too long or getting too many records back.
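To make the idea concrete, here is a rough sketch of stopping a hop mid-execution. It is not the code from #363 or #375; `collect_records`, `HopAborted`, and both limits are hypothetical. It assumes records arrive in batches (for example, one batch per sub-query response) and aborts as soon as either a record cap or a time budget is exceeded.

```python
import time


class HopAborted(Exception):
    """Raised when a hop is stopped mid-execution (hypothetical)."""


def collect_records(batches, max_records=1000, time_budget_s=60.0,
                    clock=time.monotonic):
    """Consume record batches lazily, aborting on too many records or too long.

    `batches` can be any iterable (including a generator), so batches after
    the abort point are never requested.
    """
    deadline = clock() + time_budget_s
    records = []
    for batch in batches:
        records.extend(batch)
        if len(records) > max_records:
            raise HopAborted(f"more than {max_records} records returned")
        if clock() > deadline:
            raise HopAborted(f"hop exceeded {time_budget_s} s")
    return records
```

Because the limits are checked after every batch rather than only between hops, a NamedThing -> NamedThing hop would be cut off early instead of running to completion.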


Queries

Here are some queries I ran to help me figure this out. They run relatively quickly.

Query 1

POST to MyDisease-only: http://localhost:3000/v1/smartapi/671b45c0301c8624abbd26ae78449ca2/query. This runs in <40 sec.

This is the original query, except the Drug node was changed to SmallMolecule so all QEdges can be executed by MyDisease's smartapiEdges.

BTE correctly stops execution after the second hop, since there are > 1000 IDs going into the last hop. This shows that the cap code (#324) is and will be called after the second hop.

Query 1
{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "subject": "n2",
          "object": "n3",
          "predicates": ["biolink:related_to"]
        },
        "e1": {
          "subject": "n1",
          "object": "n0",
          "predicates": ["biolink:related_to"]
        },
        "e2": {
          "subject": "n1",
          "object": "n2",
          "predicates": ["biolink:related_to"]
        }
      },
      "nodes": {
        "n0": { "is_set": false, "categories": ["biolink:SmallMolecule"] },
        "n1": { "is_set": true },
        "n2": { "is_set": true },
        "n3": {
          "ids": ["MONDO:0008753"],
          "is_set": false,
          "categories": ["biolink:Disease"],
          "name": "MONDO:0008753"
        }
      }
    }
  }
}
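The hop order this thread describes (e0 first, then e2, with e1 last) is what you get by walking outward from the only node that has ids. The sketch below reproduces that order with a breadth-first walk. It is an assumption for illustration only, not BTE's actual edge-ordering logic, and `hop_order` is a hypothetical name.

```python
from collections import deque


def hop_order(query_graph):
    """Order edges by walking outward from the nodes that have ids."""
    nodes = query_graph["nodes"]
    remaining = dict(query_graph["edges"])
    # Start from every node that is pinned to specific IDs.
    start = [node_id for node_id, spec in nodes.items() if spec.get("ids")]
    visited = set(start)
    queue = deque(start)
    order = []
    while queue:
        node = queue.popleft()
        # sorted() returns a new list, so deleting from `remaining` is safe.
        for edge_id in sorted(remaining):
            edge = remaining[edge_id]
            ends = {edge["subject"], edge["object"]}
            if node not in ends:
                continue
            order.append(edge_id)
            del remaining[edge_id]
            for other in ends - {node}:
                if other not in visited:
                    visited.add(other)
                    queue.append(other)
    return order
```

For the query graph above, the walk starts at n3 and visits e0 (n2-n3), then e2 (n1-n2), then e1 (n1-n0), so the SmallMolecule hop is the last one to run.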

Query 2

POST to the v1/query endpoint. This took < 1 min 45 sec (one run took 12 min for some reason o_0; if that happens, it's probably best to stop execution and try again).

This is the first hop of the original query. It returns 391 nodes for the next hop. I can tell by re-running the query with is_set: false on the n2 node, which gives 391 results. This means there are self-edges on the starting disease node.

Query 2
{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "subject": "n2",
          "object": "n3",
          "predicates": ["biolink:related_to"]
        }
      },
      "nodes": {
        "n2": { "is_set": true },
        "n3": {
          "ids": ["MONDO:0008753"],
          "is_set": false,
          "categories": ["biolink:Disease"],
          "name": "MONDO:0008753"
        }
      }
    }
  }
}
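Cutting a multi-hop query down to a single hop like this can be done mechanically. Here is a small sketch, assuming the query body has the same shape as the ones in this issue; `single_hop` is a hypothetical helper, not part of BTE.

```python
import copy


def single_hop(query, edge_id):
    """Return a new query containing only `edge_id` and the two nodes it joins."""
    graph = query["message"]["query_graph"]
    edge = graph["edges"][edge_id]
    keep = {edge["subject"], edge["object"]}
    return {
        "message": {
            "query_graph": {
                "edges": {edge_id: copy.deepcopy(edge)},
                "nodes": {
                    node_id: copy.deepcopy(spec)
                    for node_id, spec in graph["nodes"].items()
                    if node_id in keep
                },
            }
        }
    }
```

Calling `single_hop(original_query, "e0")` on the original query keeps only e0 with nodes n2 and n3, which matches Query 2. The deep copies mean the result can be edited (for example, flipping is_set on n2) without touching the original.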

@tokebe
Member Author

tokebe commented Aug 24, 2022

Closing, as the cause is relatively well understood, and the implementation/integration of code that would address this issue is now the main topic of discussion (which can be had in the relevant PRs).

@tokebe tokebe closed this as completed Aug 24, 2022