prematurely hitting edge threshold? #480

Closed
andrewsu opened this issue Aug 9, 2022 · 8 comments

@andrewsu
Member

andrewsu commented Aug 9, 2022

In the latest update to NCATSTranslator/testing#223, BTE reports 3 results, whereas past runs have returned ~500 results. It looks like BTE is prematurely hitting our edge count threshold. This is an inferred / creative mode query, but I'm not sure this is specific to inferred / creative mode...

Some relevant bits of the log:

{
  "logs": [
    ...
    {
      "timestamp": "2022-08-09T15:38:31.692Z",
      "level": "INFO",
      "message": "Query proceeding in Inferred Mode.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.028Z",
      "level": "INFO",
      "message": "Got 4 inferred query templates.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.617Z",
      "level": "WARNING",
      "message": "[Template-0]: The following APIs were unavailable at the time of execution: Genetics KP, Automat IntAct (trapi v-1.3.0), RTX KG2, Automat Cord19 (trapi v-1.3.0), Automat Chemical normalization (trapi v-1.3.0), Automat Biolink (trapi v-1.3.0), Automat CTD (trapi v-1.3.0), Automat HMDB (trapi v-1.3.0), Automat Covid Phenotypes (trapi v-1.3.0), Automat Uberongraph (trapi v-1.3.0), ICEES KG (trapi v-1.3.0), Automat MyChem (trapi v-1.3.0), Automat molepro-fda (trapi v-1.3.0), Automat Ontological Hierarchy (trapi v-1.3.0), Automat Hetio (trapi v-1.3.0), Automat DrugCentral (trapi v-1.3.0), Automat GTEx (trapi v-1.3.0), Automat Covidkop KG (trapi v-1.3.0), Automat Covidkopkg (ITRB PROD) (trapi v-1.2.0), Automat Textmining KP (trapi v-1.3.0), COHD TRAPI 1.3 - DEVELOPMENT, Automat Foodb (trapi v-1.3.0), Automat Robokop KG (trapi v-1.3.0), Automat Viral Proteome (trapi v-1.3.0), Automat Pharos (trapi v-1.3.0), Automat GWAS Catalog (trapi v-1.3.0), Automat HGNC (trapi v-1.3.0), Automat Gtopdb (trapi v-1.3.0), Automat Panther (trapi v-1.3.0), Automat Human GOA (trapi v-1.3.0), Service Provider TRAPI, SPOKE KP for TRAPI 1.3, MolePro, Knowledge Collaboratory API, Automat MyChem (ITRB PROD) (trapi v-1.2.0)",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.737Z",
      "level": "DEBUG",
      "message": "[Template-0]: BTE identified 3 qNodes from your query graph",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.737Z",
      "level": "DEBUG",
      "message": "[Template-0]: BTE identified 2 qEdges from your query graph",
      "code": null
    },
    ...
    {
      "timestamp": "2022-08-09T15:38:35.388Z",
      "level": "DEBUG",
      "message": "[Template-0]: Edge manager collected (1745) records!",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.112Z",
      "level": "DEBUG",
      "message": "[Template-0]: Successfully scored 3 results, couldn't score 0 results.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.115Z",
      "level": "INFO",
      "message": "[Template-0]: Execution Summary: (5) nodes / (4) edges / (3) results; (0/0) queries (2 cached qEdges) returned results from (0) unique APIs ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.115Z",
      "level": "INFO",
      "message": "[Template-0]: APIs: ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Template 1 exceeds absolute maximum of 1500 (3560). These results will not be included. Skipping remaining 2 templates.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Execution Summary: (5) nodes / (4) edges / (3) results; (0/0) queries (2 cached qEdges) returned results from (0) unique APIs ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "APIs: ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Scoring Summary: (3) scored / (0) unscored",
      "code": null
    }
  ]
}
@tokebe
Member

tokebe commented Aug 9, 2022

This appears to be inferred-mode-only behavior. Template-0 returns 3 results; Template-1 then returns 3560 results, which exceeds the "absolute maximum" (the creative-query limit of 1000 plus a 500-result buffer). That triggers BTE to discard that template's results and stop execution.

Two things here:

  1. The logs for Template-1 are also tossed out, which may be unhelpful in tracking what exactly happened with it. Would it be worth keeping?
  2. The "absolute maximum" behavior is still relatively naive. I think it's worth discussing if there's a more complex behavior. I'll summarize current behavior in my next comment.

@tokebe
Member

tokebe commented Aug 9, 2022

Current Behavior of inferred-mode cutoffs:

For the below explanation, I'm using "accepted results" to refer to results which have been added to the final response, whether by concatenation or result-merging.

  1. If at any point a template's results, plus the currently accepted results, exceed 1500, BTE discards that template's results and stops execution, returning only the currently accepted results.
  2. If, after successfully accepting results from a template, the new total of accepted results exceeds 1000, BTE keeps all currently accepted results and stops execution, returning the currently accepted results.
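
The two cutoff rules above can be sketched roughly as follows. This is an illustrative Python sketch, not BTE's actual code: the function name `run_templates`, the constant names, and the use of plain list concatenation in place of result merging are all assumptions made for the example.

```python
from typing import List, Tuple

CREATIVE_LIMIT = 1000  # rule 2: stop once accepted results exceed this
ABSOLUTE_MAX = 1500    # rule 1: the limit plus a 500-result buffer


def run_templates(template_results: List[list]) -> Tuple[list, str]:
    """Apply the two cutoff rules to per-template result lists, in order.

    Hypothetical helper: each inner list stands for one template's results.
    """
    accepted: list = []
    for i, results in enumerate(template_results):
        # Rule 1: this template would push the total past the absolute
        # maximum, so its results are discarded and execution stops.
        if len(accepted) + len(results) > ABSOLUTE_MAX:
            return accepted, f"template {i} discarded, execution stopped"
        # Assumption: concatenation stands in for concatenation-or-merging.
        accepted.extend(results)
        # Rule 2: enough results have been accepted, so stop here.
        if len(accepted) > CREATIVE_LIMIT:
            return accepted, f"limit exceeded after template {i}, execution stopped"
    return accepted, "all templates ran"


# The scenario from this issue: 3 results, then 3560, with 2 templates left.
accepted, reason = run_templates([[0] * 3, [0] * 3560, [0] * 10, [0] * 10])
print(len(accepted), reason)
```

With these inputs, rule 1 fires on the second template, so only the 3 results from the first template are returned and the remaining templates never run.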

Given the scenario in this issue, it may be worth discussing if we want to define something more complex to account for more nuanced situations, such as currently-accepted results being a very low number. The numbers 1000 and 1500 were chosen to not overburden the client. We could try splitting results arbitrarily and then trimming the returned KG, or simply changing those numbers, or some other strategy?

tokebe self-assigned this Aug 10, 2022
@colleenXu
Collaborator

(copied from #449 (comment) but maybe the discussion is in this separate issue?)

From the 8/10 lab meeting on Translator stuff (added feature):

(Jackson's plate of work) "should be technically feasible to "top off" results and trim resulting KG"

  • Once the cap is reached (adding the incoming template's results to the current result set would give >1500), add the incoming template's results anyway and allow result merging
  • Then cut off the bottom of the results to reach the cap (1500) and trim the KG to match
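
A minimal sketch of the "top off" idea in the two bullets above, written in Python for illustration only. The result shape (`node_ids`, `edge_ids`, `score`), the merge key (the set of bound node IDs), the merge rule (union the edges, keep the higher score), and the function name `top_off` are all assumptions; the actual implementation may differ on every one of these.

```python
from typing import Dict, List, Tuple


def top_off(accepted: List[dict], incoming: List[dict], kg: Dict[str, dict],
            cap: int = 1500) -> Tuple[List[dict], Dict[str, dict]]:
    """Merge incoming results, keep the top `cap` by score, trim the KG."""
    merged: Dict[tuple, dict] = {}
    for result in accepted + incoming:
        # Assumed merge key: results binding the same nodes are merged.
        key = tuple(sorted(result["node_ids"]))
        if key in merged:
            merged[key]["edge_ids"] |= set(result["edge_ids"])
            merged[key]["score"] = max(merged[key]["score"], result["score"])
        else:
            # Copy so the caller's result dicts are not mutated.
            merged[key] = {
                "node_ids": list(result["node_ids"]),
                "edge_ids": set(result["edge_ids"]),
                "score": result["score"],
            }
    # Cut off the bottom of the ranked results to reach the cap.
    ranked = sorted(merged.values(), key=lambda r: r["score"], reverse=True)
    kept = ranked[:cap]
    # Trim the knowledge graph to what the kept results still reference.
    used_nodes = {n for r in kept for n in r["node_ids"]}
    used_edges = {e for r in kept for e in r["edge_ids"]}
    trimmed = {
        "nodes": {k: v for k, v in kg["nodes"].items() if k in used_nodes},
        "edges": {k: v for k, v in kg["edges"].items() if k in used_edges},
    }
    return kept, trimmed
```

Trimming the KG after the cut matters because otherwise the response would still carry nodes and edges that no returned result refers to, which is part of the client burden the cap is meant to limit.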

Also edit the TRAPI logs to make the cap clearer: "the cap of 1500 is reached because the template just run has 3000 results"
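
One way the clearer log entry could look, as a hedged Python sketch. The entry keys (`timestamp`, `level`, `message`, `code`) mirror the log excerpt earlier in this issue; the function name `cap_log_entry`, its parameters, and the exact message wording are hypothetical.

```python
from datetime import datetime, timezone


def cap_log_entry(template_index: int, template_count: int,
                  accepted_count: int, cap: int = 1500) -> dict:
    """Build a log entry that states why the cap was reached."""
    message = (
        f"[Template-{template_index}]: cap of {cap} reached because this "
        f"template returned {template_count} results "
        f"({accepted_count} already accepted)."
    )
    # UTC timestamp with millisecond precision and a trailing "Z",
    # matching the format of the existing log entries.
    timestamp = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return {"timestamp": timestamp, "level": "INFO", "message": message, "code": None}


print(cap_log_entry(1, 3560, 3)["message"])
```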

@colleenXu
Collaborator

Also I plan to look into this some more

@colleenXu
Collaborator

This is actually due to the Automat registration change, which was addressed by 2802269; I think @tokebe deployed that fix earlier today (internal lab Slack link)?


What I did:

This is the ARAX link for the Huntington's creative-mode query 7 days ago: https://arax.ncats.io/?r=16560b7c-fb31-4e1e-ab31-32a8f3e67ded

vs This is the link for the same query 3 days ago: https://arax.ncats.io/?r=00642b42-6199-4877-ab92-06bea730fdae

When reviewing the TRAPI logs, BTE is providing only the results from template-0 both times. But in the more recent run, the Automat KP MetaEdges are not found/used, leading to fewer results (3 vs 502).

For both runs, BTE gets > 3500 results from the next template (template-1), which it doesn't incorporate into the results set (which Jackson described above).

@colleenXu
Collaborator

I still think the feature described above from the lab meeting is useful, and we'd want it soonish (since code freeze-ish is August 29?)

@tokebe
Member

tokebe commented Aug 12, 2022

"top off" behavior implemented here

@tokebe
Member

tokebe commented Aug 24, 2022

Marking as done (top-off code implemented in biothings/bte_trapi_query_graph_handler#119 and deployed)

tokebe closed this as completed Aug 24, 2022