-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create creative mode templates for "What chemicals [qualified] affect (increases/decreases) a given protein/gene?" #532
Comments
Note:
Possibly relevant, from Slack:
|
#534 Is now addressed in add-qualifiers PRs, so integration testing is now possible. |
Noting:
|
@colleenXu Presently, the qualifier-matching only supports one qualifier-set per templateGroup. Would you prefer I change the implementation to support multiple qualifier-sets in a templateGroup for OR matching within a single group (presently, this would be covered by making multiple templateGroups)? Examples: Presently:
Proposed:
|
I think using 1 qualifier-set for matching to templateGroup is fine because that's what we see in this issue (and near-future). In the third bullet point of my post above, I'm saying that for OUR individual templates, I'll need to use multiple qualifier-sets to query for "activity" vs "activity_or_abundance" because we don't do qualifier-hierarchy expansion yet. |
Knowledge resources related to this set of creative-mode questions (My analysis)This is a work-in-progress, so edits will be done over time Resources ready and BTE already uses
Want update and BTE already usesSemmeddb
BindingDB
Want new pending APIMyChem
|
Results for the direct-increases.json template with Gene InputsReasonable number of results
Many results (exploding)
No results
|
I found that 2 templateGroup items worked: one for Chem-increases-Gene and one for Chem-decreases-Gene (input ID can be on the Chem QNode or the Gene QNode). I wrote the templateGroups and the templates @andrewsu + I discussed on Monday and did a bit of testing (see next post). These are in this branch biothings/bte_trapi_query_graph_handler#132 template ideas we decided to write up
|
How I testedCheck out fix_issue_532 branch for query-handler and dev branches for all other modules (including bte-trapi-workspace). For the "user"/"from UI/ARS" query...the main skeleton of the query is this. What varies is (1) which QNode also has an ids field with 1 user-submitted ID and (2) whether the object_direction_qualifier value is increased or decreased (I put X there). skeleton
So here's an example of ChemX (sumatriptan) increases genes
DiscussedEDIT: discussed below + during Wednesday 1/11 meeting. Decisions added to each point After testing biothings/bte_trapi_query_graph_handler#132 (it's branched off dev, and I tested it with all my other branches as dev)....overall it looks close to done. However, I have the following questions / issues:
query with no direction qualifier so all (EDIT: 9) templates will be loaded
Starting query for Chem decreases GeneY (GAPDH)
|
Testing[reverted; this records the testing done 1/10] Increases
Decreases
|
|
|
@colleenXu I've fixed the results merging count and logging. As a result, I can confirm that no results are being dropped. Running sumatriptan, I get 433 results added across the templates, and 244 results in the final response. Logging now shows 189 results that were combined. 433 - 189 = 244, everything checks out. I was actually running in circles for a while because I thought I had to account for the number of results merged into, but this is simply a count of results from the 244 that were combined from multiple templates. I'm not sure what's different between our locals that cause a difference in numbers, (perhaps I haven't updated specs recently?), but all results are accounted for, and no code is dropping results. I've also fixed issue 4 -- a piece of code in call-apis was still using pre-qedge-refactor accessors, breaking the counts. Additionally, I've pushed changes so records are truncated to the cutoff, and changed the creative limit to 500. All set to proceed with testing again 👍 |
Questions I have after my second round of testing:
TRAPI query for `decreased` sumatriptan + screenshot of logs
TRAPI query for `increased` estrogen + screenshot of logsFor
(Points 4-5 from the post have been addressed by the recent changes (yay), and point 2 is still pending / doesn't seem to be an issue) |
Second round of testing[EDITED 1/23-1/26 after I tested more chemicals and genes this time, including all chemicals listed in the Translator posts Increasedstarting with chem
starting with gene
Skipping testing GAPDH (NCBIGene:2597). Previously it ran 3 templates in ~ 1 min. Then the last hop of 4th template gets stuck at its end (ID resolution/record intersecting). I stopped after running > 14 min Decreasedstarting with chem
starting with gene
Skipping testing GAPDH (NCBIGene:2597). Previously it ran 3 templates in ~ 1 min. Then the last hop of 4th template gets stuck at its end (ID resolution/record intersecting). I stopped after running 15 min |
@colleenXu Regarding Point 1, I think your confusion comes from assuming that the merge log is supposed to be per-template. It isn't. Records are merged per-template, but the only merge log is a summary at the end of all merged results across all templates. Please pull and run again, and you'll see the log had been updated to show both the number of results merged, the number they were merged into, and the actual result count decrease. If you add up the results for each template, and then subtract the actual result count decrease, the math checks out (I spent a considerable amount of time verifying this last round...). We could log per-template as well, but I don't particularly see the need to do so? Running GAPDH on my local takes 19.9 minutes, which I agree is a little too much. Some of my optimizations may help with this, but I do think there's a case to be made for further decreasing the max records allowed. |
After a meeting with @colleenXu the problem was confirmed. I've investigated the issue and found the reason: Multiple results from the same template can end up being merged if that template is a multi-hop and the results connect the same subject and object via different intermediate nodes. IIRC, this was an intended behavior to keep results relatively well-organized. Such results would not be merged in non-creative execution (which another question worth asking somewhere else). As a side effect of this, merging can show more results merged than what one might expect: instead of the maximum number merged in a step being equal to the smallest of either the current result set or the current template, it is actually the sum of those two. @colleenXu I'm working on a fix to change the logging behavior to explicitly point this out when it occurs, and will push that change to this branch (and main) when it's done. |
@colleenXu I've pushed multiple creative mode logging fixes and improvements and the math appears to check out now. Please run a couple tests and let me know if the logging seems better. |
Feedback:
Otherwise, new logs look good! |
I didn’t miss your update. Regardless of Fix for the API end summary incoming. |
Pushed the fix; yet another fun case of a change somehow not making it into a commit while silently remaining on my local, making me think I'm losing my mind lol. |
Sorry for the late reply. I reran a bunch of queries and I think things look good! I like the new logs. Perhaps we're ready to make a request for ITRB CI? |
Note that templates were changed, replacing |
Deployed to prod 🚀 |
Noting here just in case: Old template ideas that weren't implemented (intended effect is "downregulates"):
|
Translator consortium target for implementation in Feb 2023. Exact TRAPI query templates will be created soon per Architecture meeting 2022-12-06...
EDIT Implementation dates:
The text was updated successfully, but these errors were encountered: