Combinatorial explosion in the number of answers returned to a query #33

Closed
karafecho opened this issue Oct 25, 2022 · 4 comments
Labels
O&O issue ordering & organizing issue


karafecho commented Oct 25, 2022

This issue is to formally report a known Translator issue, namely, a tendency for answer sets to explode combinatorially with certain types of queries.

For instance, during the October 2022 QotM, Translator team members found that querying for direct connections between ATP1A3 and chemical entities or diseases yields a reasonable number of results; however, once intermediary genes and pathways are added to the query, the answer sets explode and become unmanageable.

Example from comment posted by @colleenXu here:

"Not sure how to get from ATP1A3 -> related genes -> ChemicalEntity, Procedure, Treatment in a way that doesn't explode / become unmanageable

Pathways / BiologicalProcessOrActivity...caused explosions since they were linked to pathways that had lots of genes"
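
The explosion described above follows from simple path counting: each extra hop multiplies the number of answers by that hop's fan-out. Below is a minimal Python sketch on a toy graph. Every node name other than ATP1A3 and all edge counts are made up for illustration; they are not Translator data.

```python
from collections import defaultdict


def count_paths(edges, start, hops):
    """Count walks of exactly `hops` directed edges that begin at `start`."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    # frontier maps each reachable node to the number of walks ending there
    frontier = {start: 1}
    for _ in range(hops):
        next_frontier = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbor in adjacency[node]:
                next_frontier[neighbor] += n_walks
        frontier = next_frontier
    return sum(frontier.values())


# Hypothetical one-hop query: ATP1A3 -> chemical
direct_edges = [("ATP1A3", f"chem{i}") for i in range(2)]

# Hypothetical three-hop query: ATP1A3 -> pathway -> gene -> chemical,
# where a single pathway is linked to 100 genes with 3 chemicals each
pathway_edges = (
    [("ATP1A3", "pathwayP")]
    + [("pathwayP", f"gene{g}") for g in range(100)]
    + [(f"gene{g}", f"chem{c}") for g in range(100) for c in range(3)]
)

print(count_paths(direct_edges, "ATP1A3", 1))   # one-hop answers
print(count_paths(pathway_edges, "ATP1A3", 3))  # three-hop answers
```

In this toy setup one pathway with many member genes is enough to turn a handful of answers into hundreds, which matches the behaviour reported in the quote.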

@karafecho karafecho added the O&O issue ordering & organizing issue label Oct 25, 2022

sierra-moxon commented Jan 27, 2023

From TAQA:

  • this is a huge issue; we probably need to break it down
  • the UI team is working on fixes here that would let the user eliminate a node and reduce the hairball.

from Sharat: the issue can be broken down into:

  • filter controls for the user - UI group
  • merging records - already happening at different levels (ARS, agents, etc.).
  • grouping records - travel up the ontology and return the results grouped at the higher level. - is this a UI issue? Andy: will be working on it for sure (we need other input)
  • scoring records - we still need some way of bringing the best results to the top. - O&O
  • user workflows - can we help the user refine their query (e.g. if two ARAs return the same result, etc.)?
    • have a formal way to communicate this (all ARAs do it the same way; information content < x, for example)
    • ask the TRAPI folks for a way to return the "cap"
    • ask the UI to be able to return the "cap" to the user
    • take pagination to the architecture group for discussion
  • big "hub" nodes are taken into account in the ARA - this is a tunable parameter.
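
To make the "information content < x" and "cap" ideas above concrete, here is a rough Python sketch. Everything in it is an assumption for discussion: the degree-based definition of information content, the function names, the result layout, and the `cap_reached` flag are hypothetical and are not part of TRAPI or any ARA.

```python
import math


def information_content(degree, total_edges):
    """Assumed IC: -log2 of the share of all edges that touch the node."""
    return -math.log2(degree / total_edges)


def filter_and_cap(results, degrees, total_edges, min_ic, cap):
    """Drop results containing low-IC (hub) nodes, then keep the top `cap`."""
    kept = [
        r for r in results
        if all(information_content(degrees[n], total_edges) >= min_ic
               for n in r["nodes"])
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return {
        "results": kept[:cap],
        # lets a UI tell the user that more answers exist than were returned
        "cap_reached": len(kept) > cap,
        "total_before_cap": len(kept),
    }


# Hypothetical degrees: "hub" touches half of all 1024 edges
degrees = {"hub": 512, "a": 2, "b": 4}
results = [
    {"nodes": ["a"], "score": 0.9},
    {"nodes": ["hub"], "score": 0.99},
    {"nodes": ["b"], "score": 0.5},
    {"nodes": ["a", "b"], "score": 0.7},
]
response = filter_and_cap(results, degrees, total_edges=1024, min_ic=5, cap=2)
print(response["cap_reached"], [r["nodes"] for r in response["results"]])
```

Returning the flag alongside the truncated list is one way the "cap" could be communicated to the user; pagination would be an alternative.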

big picture -> deep dive is important

from Chris B: perhaps another option is to tell the user: this is a known query with many results, sorry. Or can we filter/sort our way out of this one? Is this doable? Suggest to the user that they tighten the query up: here are the common predicates associated with the answers you're getting back. Can we try to help the user write a better query?

from Sharat: agree; this is the best we can do, here are ways to tighten it up.
work on user workflows for two big queries.
Andy is interested in more brainstorming on this; UI needs direction.
In the end, there is one place where the quality of the results is the measurement (either UI/O&O or someplace we could get it).

Andrew: Big "hub" nodes are taken into account in the Normalized Google Distance (which is used in scoring by BTE and ARAX) - this is a tunable parameter.
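
For reference, a small Python sketch of the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). How BTE and ARAX compute or tune it is not described in this thread, so the function name and all counts below are hypothetical.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from occurrence counts.

    fx, fy: number of documents mentioning each term
    fxy:    number of documents mentioning both terms
    n:      total number of documents in the corpus
    """
    if fxy == 0:
        return math.inf  # terms never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: same co-occurrence, but the second partner is a hub
specific = normalized_google_distance(fx=100, fy=200, fxy=50, n=10**6)
hub = normalized_google_distance(fx=100, fy=100_000, fxy=50, n=10**6)
print(specific < hub)
```

With the same number of co-occurrences, the very frequent "hub" term ends up farther away than the specific one, which is how the measure discounts hub nodes.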


sharatisrani commented Feb 14, 2023

For the case of grouping records, O&O has a tracking issue at NCATSTranslator/Ordering-Organizing#15, with a few additional comments.
For the case of scoring records, O&O has a tracking issue at NCATSTranslator/Ordering-Organizing#6

@sharatisrani

This is a major issue, but how likely is it to bite us for the September release?

@karafecho

I think this is being addressed as described here and recorded here.
