prematurely hitting edge threshold? #480

andrewsu · 2022-08-09T16:12:23Z

In the latest update to NCATSTranslator/testing#223, BTE reports 3 results, whereas past runs have returned ~500 results. It looks like BTE is prematurely hitting our edge count threshold. This is an inferred / creative mode query, but I'm not sure this is specific to inferred / creative mode...

Some relevant bits of the log:

{
  "logs": [
    ...
    {
      "timestamp": "2022-08-09T15:38:31.692Z",
      "level": "INFO",
      "message": "Query proceeding in Inferred Mode.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.028Z",
      "level": "INFO",
      "message": "Got 4 inferred query templates.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.617Z",
      "level": "WARNING",
      "message": "[Template-0]: The following APIs were unavailable at the time of execution: Genetics KP, Automat IntAct (trapi v-1.3.0), RTX KG2, Automat Cord19 (trapi v-1.3.0), Automat Chemical normalization (trapi v-1.3.0), Automat Biolink (trapi v-1.3.0), Automat CTD (trapi v-1.3.0), Automat HMDB (trapi v-1.3.0), Automat Covid Phenotypes (trapi v-1.3.0), Automat Uberongraph (trapi v-1.3.0), ICEES KG (trapi v-1.3.0), Automat MyChem (trapi v-1.3.0), Automat molepro-fda (trapi v-1.3.0), Automat Ontological Hierarchy (trapi v-1.3.0), Automat Hetio (trapi v-1.3.0), Automat DrugCentral (trapi v-1.3.0), Automat GTEx (trapi v-1.3.0), Automat Covidkop KG (trapi v-1.3.0), Automat Covidkopkg (ITRB PROD) (trapi v-1.2.0), Automat Textmining KP (trapi v-1.3.0), COHD TRAPI 1.3 - DEVELOPMENT, Automat Foodb (trapi v-1.3.0), Automat Robokop KG (trapi v-1.3.0), Automat Viral Proteome (trapi v-1.3.0), Automat Pharos (trapi v-1.3.0), Automat GWAS Catalog (trapi v-1.3.0), Automat HGNC (trapi v-1.3.0), Automat Gtopdb (trapi v-1.3.0), Automat Panther (trapi v-1.3.0), Automat Human GOA (trapi v-1.3.0), Service Provider TRAPI, SPOKE KP for TRAPI 1.3, MolePro, Knowledge Collaboratory API, Automat MyChem (ITRB PROD) (trapi v-1.2.0)",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.737Z",
      "level": "DEBUG",
      "message": "[Template-0]: BTE identified 3 qNodes from your query graph",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:32.737Z",
      "level": "DEBUG",
      "message": "[Template-0]: BTE identified 2 qEdges from your query graph",
      "code": null
    },
    ...
    {
      "timestamp": "2022-08-09T15:38:35.388Z",
      "level": "DEBUG",
      "message": "[Template-0]: Edge manager collected (1745) records!",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.112Z",
      "level": "DEBUG",
      "message": "[Template-0]: Successfully scored 3 results, couldn't score 0 results.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.115Z",
      "level": "INFO",
      "message": "[Template-0]: Execution Summary: (5) nodes / (4) edges / (3) results; (0/0) queries (2 cached qEdges) returned results from (0) unique APIs ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:38:36.115Z",
      "level": "INFO",
      "message": "[Template-0]: APIs: ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Template 1 exceeds absolute maximum of 1500 (3560). These results will not be included. Skipping remaining 2 templates.",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Execution Summary: (5) nodes / (4) edges / (3) results; (0/0) queries (2 cached qEdges) returned results from (0) unique APIs ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "APIs: ",
      "code": null
    },
    {
      "timestamp": "2022-08-09T15:45:57.043Z",
      "level": "INFO",
      "message": "Scoring Summary: (3) scored / (0) unscored",
      "code": null
    }
  ]
}

tokebe · 2022-08-09T16:19:49Z

This would appear to be inferred mode-only behavior. Template-0 returns 3 results, and then Template-1 returns 3560 results, which exceeds the "absolute maximum" (Creative-query limit of 1000 + 500 buffer), which triggers behavior to toss out that template's results and cease operations.

Two things here:

1. The logs for Template-1 are also tossed out, which may be unhelpful in tracking what exactly happened with it. Would it be worth keeping them?
2. The "absolute maximum" behavior is still relatively naive. I think it's worth discussing whether there's a more complex behavior we'd prefer. I'll summarize current behavior in my next comment.

tokebe · 2022-08-09T16:28:49Z

Current Behavior of inferred-mode cutoffs:

For the below explanation, I'm using "accepted results" to refer to results which have been added to the final response, whether by concatenation or result-merging.

- If at any point a template's results, plus the currently accepted results, exceed 1500, BTE discards that template's results and stops execution, returning only the currently accepted results.
- If, after successfully accepting results from a template, the new total of accepted results exceeds 1000, BTE keeps all currently accepted results and stops execution, returning the currently accepted results.
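The two rules above can be sketched roughly as follows. This is a minimal illustration in Python, not BTE's actual code: the function name `run_templates`, the constant names, and the use of plain lists are hypothetical, and result-merging is simplified to concatenation.

```python
from typing import List, Sequence, Tuple

# Numbers taken from this thread; the names are hypothetical.
CREATIVE_LIMIT = 1000                 # creative-query limit
ABSOLUTE_MAX = CREATIVE_LIMIT + 500   # limit plus 500 buffer


def run_templates(per_template_results: Sequence[List[str]]) -> Tuple[List[str], List[str]]:
    """Apply the two cutoff rules to each template's results, in order.

    Assumption: accepting results is modeled as plain concatenation;
    real result-merging could make the accepted total smaller.
    """
    accepted: List[str] = []
    notes: List[str] = []
    for index, results in enumerate(per_template_results):
        prospective = len(accepted) + len(results)
        if prospective > ABSOLUTE_MAX:
            # Rule 1: discard this template's results and stop.
            notes.append(
                f"Template-{index} discarded: {prospective} exceeds {ABSOLUTE_MAX}"
            )
            break
        accepted.extend(results)
        if len(accepted) > CREATIVE_LIMIT:
            # Rule 2: keep everything accepted so far and stop.
            notes.append(
                f"Stopped after Template-{index}: {len(accepted)} exceeds {CREATIVE_LIMIT}"
            )
            break
    return accepted, notes


# The scenario in this issue: 3 results, then 3560, then two more templates.
templates = [["r"] * 3, ["r"] * 3560, ["r"] * 10, ["r"] * 10]
accepted, notes = run_templates(templates)
print(len(accepted), notes)
```

With these inputs only the 3 results from the first template survive, which is the situation reported at the top of the issue.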

Given the scenario in this issue, it may be worth discussing if we want to define something more complex to account for more nuanced situations, such as currently-accepted results being a very low number. The numbers 1000 and 1500 were chosen to not overburden the client. We could try splitting results arbitrarily and then trimming the returned KG, or simply changing those numbers, or some other strategy?

colleenXu · 2022-08-10T19:17:46Z

(copied from #449 (comment) but maybe the discussion is in this separate issue?)

From the 8/10 lab meeting on Translator stuff (added feature):

(Jackson's plate of work) "should be technically feasible to "top off" results and trim resulting KG"

- Once the cap is reached (that is, adding the incoming template's results to the current set of results would give >1500), add the incoming template's results anyway and allow result merging.
- Then cut off the bottom of the results to reach the cap (1500) and trim the KG to match.

Also edit the TRAPI logs to make the cap clearer: "the cap of 1500 is reached because the template just run has 3000 results"
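A rough sketch of the proposed "top off" step, assuming each result carries a score plus the KG node and edge IDs it uses. The `Result` class, the `top_off` function, and the dict-based KG are hypothetical stand-ins rather than BTE's data structures, and result merging is left out (incoming results are simply appended).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

CAP = 1500  # cap discussed in this thread


@dataclass(frozen=True)
class Result:
    """Hypothetical result shape: a score and the KG IDs it references."""
    score: float
    nodes: FrozenSet[str]
    edges: FrozenSet[str]


def top_off(
    accepted: List[Result],
    incoming: List[Result],
    kg_nodes: Dict[str, dict],
    kg_edges: Dict[str, dict],
    cap: int = CAP,
) -> Tuple[List[Result], Dict[str, dict], Dict[str, dict], Optional[str]]:
    """Add incoming results, keep the top `cap` by score, trim the KG to match."""
    combined = accepted + incoming  # assumption: merging is skipped here
    if len(combined) <= cap:
        return combined, kg_nodes, kg_edges, None

    # Cut off the bottom of the results to reach the cap.
    kept = sorted(combined, key=lambda r: r.score, reverse=True)[:cap]

    # Trim the KG to the nodes and edges still referenced by a kept result.
    used_nodes = set().union(*(r.nodes for r in kept))
    used_edges = set().union(*(r.edges for r in kept))
    trimmed_nodes = {k: v for k, v in kg_nodes.items() if k in used_nodes}
    trimmed_edges = {k: v for k, v in kg_edges.items() if k in used_edges}

    log = (
        f"The cap of {cap} is reached because the template just run "
        f"has {len(incoming)} results"
    )
    return kept, trimmed_nodes, trimmed_edges, log
```

Sorting by score before cutting is one possible reading of "cut off the bottom of the results"; the thread does not specify the ordering used.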

colleenXu · 2022-08-10T20:46:53Z

Also I plan to look into this some more

colleenXu · 2022-08-12T04:25:20Z

This is actually due to the Automat registration change, which was addressed by 2802269 and I think @tokebe deployed earlier today (internal lab Slack link)?

What I did:

This is the ARAX link for the Huntington's creative-mode query 7 days ago: https://arax.ncats.io/?r=16560b7c-fb31-4e1e-ab31-32a8f3e67ded

vs This is the link for the same query 3 days ago: https://arax.ncats.io/?r=00642b42-6199-4877-ab92-06bea730fdae

Reviewing the TRAPI logs, BTE provides only the results from template-0 both times. But in the more recent run, the Automat KP MetaEdges are not found/used, leading to fewer results (3 vs 502).

For both runs, BTE gets > 3500 results from the next template (template-1), which it doesn't incorporate into the result set (the behavior Jackson described above).

colleenXu · 2022-08-12T04:26:38Z

I think the feature described above from the lab meeting is still useful, and we'd want it soonish (since code freeze-ish is August 29?)

tokebe · 2022-08-12T18:08:30Z

"top off" behavior implemented here

tokebe · 2022-08-24T15:23:43Z

Marking as done (top-off code implemented in biothings/bte_trapi_query_graph_handler#119 and deployed)

tokebe self-assigned this Aug 10, 2022

tokebe mentioned this issue Aug 12, 2022

Keep and trim results after reaching creative-limit biothings/bte_trapi_query_graph_handler#119

Merged

tokebe closed this as completed Aug 24, 2022
