Scoring overhaul #634
Another example worth looking into is https://ui.test.transltr.io/results?l=Postaxial%20Acrofacial%20Dysostosis&t=0&q=72fedb01-7ee2-4c7d-8a78-2cab471d4df8, described in NCATSTranslator/Feedback#349. This is a visual summary of the result: From https://arax.ci.transltr.io/api/arax/v1.4/response/3709473e-e00a-40a7-bb4b-37d75ec57c29, this is the second-highest scoring answer with a normalized score of 99.72677595628416. Since "failure to thrive" and "growth delay" are relatively generic phenotypes, I would have expected the paths leading to those nodes to be relatively low scoring.
Basic principles laid out in 2023-06-21 team meeting: Also need to think about benchmarking. @tokebe please check out https://github.com/TranslatorSRI/Benchmarks as a possible framework for running benchmarks... (NOTE: we may also need to think about excluding certain resources during the benchmark runs to avoid trivial one-hop retrievals of the right answer...)
Regarding benchmarking, I gave the Benchmarks tool a try and pretty easily came up with this result table (based on this template and this data file):
Looks like it ran 22 TRAPI queries to generate those results. Overall, it seems like a reasonable system on which to base our scoring optimization efforts...
Another factor that we can use in scoring is opposing evidence, as described in this comment:
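The linked comment isn't reproduced above, but as a rough, purely illustrative sketch (not the actual proposal from that comment), one way opposing evidence could be folded into scoring is as a damping factor applied to an edge's base score:

```python
# Illustrative sketch only: how opposing evidence might damp a score.
# The function name and the simple supporting/(supporting+opposing)
# ratio are assumptions for illustration, not BTE's implementation.

def evidence_factor(supporting: int, opposing: int) -> float:
    """Scale in (0, 1]: all-supporting evidence leaves the score
    unchanged, while opposing evidence pulls it down proportionally."""
    total = supporting + opposing
    if total == 0:
        return 1.0  # no evidence counts available; leave score as-is
    return supporting / total

base_score = 0.8
print(base_score * evidence_factor(9, 1))  # mostly supporting: small penalty
print(base_score * evidence_factor(5, 5))  # evenly split: score halved
```

A multiplicative factor like this keeps the adjusted score within the original range, which matters if downstream normalization assumes bounded inputs.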
@ericz1803 Assigning this one to you -- general idea is:
There will be some complications around creative mode, as its score combination takes place after primary scoring. You can generally consider a result being merged as an instance of principle 3 (rather than the present implementation of taking the max score of the merged results). Please let me know if you have any questions on implementation details/issue expectations/etc.
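As a concrete illustration of the distinction above (hypothetical functions, not BTE's actual code; the 0-100 cap on the additive variant is an assumption to keep normalized scores in range):

```python
# Sketch contrasting the current max-score merge with a
# principle-3-style additive combination for merged results.

def merge_score_max(scores):
    """Current live behavior described above: keep the highest score."""
    return max(scores)

def merge_score_additive(scores, cap=100.0):
    """Principle-3-style combination: each merged result adds support,
    so the combined score exceeds any individual one (up to the cap)."""
    return min(cap, sum(scores))

scores = [60.0, 45.0, 20.0]
print(merge_score_max(scores))       # 60.0
print(merge_score_additive(scores))  # 100.0 (125 capped at 100)
```

The practical difference: under max-merge, extra supporting results never raise a merged result's rank, while under an additive scheme several mid-scoring merged results can outrank a single high-scoring one.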
Just noting that #515 seems relevant, I noticed it while diving through old issues. Not sure if it's something we want to do for this overhaul.
#515 will likely have some independent challenges associated with it related to identifier mapping, so let's keep it separate from this issue.
@ericz1803 Just checking, in the case of results merging in creative mode, have any changes been made to this behavior? In current live code, the highest score is taken. |
(brought up in discussion with Jackson and also discussed here (lab Slack link)) @ericz1803 Note that the scoring-related logs may need updating. Perhaps we could remove them, or change them so they record something useful about how results were scored. Examples:
@tokebe I didn't make any changes to the way results are merged in creative mode. @colleenXu I'll take a look at the logs. All results should now be getting scored, so I'll try to give the logs some more useful info.
We may want to apply some sort of transformation similar to situation 2 to give a higher score for those cases, though that might take some undue effort and could be something we address in a future iteration.
@tokebe you're right. I implemented score addition for the creative queries, and it seemed to have a pretty significant positive effect on the results, so I added it to the PR.
From discussion with Jackson right now:
Leaving this here in case anyone needs to run benchmarks on BTE in the future: Also, all of the scoring optimization was based on the MostPrescribed_2019 and DrugCentral_creative datasets. I did try the other datasets (CPIC, DrugMechDB, and IRDiRC), but BTE either returned no results or missed the target results for all of their queries.
Great point, @ericz1803. The change in the location of the score is actually something that was recently made Consortium-wide. So if you've already updated your local instance to use the new location, please create a PR on the Benchmarks repo with that change.
It's related to TRAPI 1.4:
I think for our tool, in all cases we have only 1 analysis object? But other tools (ARAs / KPs) may have multiple analysis objects and scores... so I dunno if that'll cause issues for them trying to benchmark with our PR / changes...
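For context on the TRAPI 1.4 change being discussed: the per-result score moved into the `analyses` array, so each result can carry several analyses, each with its own `score`. Below is a minimal sketch of reading scores from that new location; collapsing multiple analyses with `max` is an illustrative assumption, not necessarily what the Benchmarks tool does.

```python
# Sketch of reading scores from a TRAPI 1.4 result, where "score"
# lives on each entry of the "analyses" array rather than on the
# result itself. The max-over-analyses collapse is an assumption.

def result_score(result):
    scores = [a["score"] for a in result.get("analyses", [])
              if a.get("score") is not None]
    return max(scores) if scores else None

# A BTE-style result typically carries a single analysis:
bte_result = {"analyses": [{"resource_id": "infores:biothings-explorer",
                            "score": 99.7}]}

# Other ARAs may attach several analyses with different scores:
multi_result = {"analyses": [{"score": 0.6}, {"score": 0.9}]}

print(result_score(bte_result))   # 99.7
print(result_score(multi_result)) # 0.9
```

With only one analysis per result, any collapse rule (max, mean, sum) gives the same number, which is why the single-analysis case above sidesteps the ambiguity that multi-analysis tools would hit.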
Scoring could use some changes:
This issue requires further investigation and discussion. At some point I'll work on slides to give a few case examples. It should not be put under active development yet.