Implement PFOCR-based results annotation in BTE #24

Open
AlexanderPico opened this issue Dec 14, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request Group 4
Milestone

Comments

@AlexanderPico
Member

Pending the benchmarking study on PFOCR as a resource for enrichment analysis, we will optimize a procedure for calculating a score as part of a BTE query, implementing an API (or Python library) to perform the calculation.

Background

Andrew Su: "We want to use PFOCR as a bioentity set against which we do enrichment analysis. So, for example, someone does a BTE query that returns 1000 paths -- we could rank those paths based on how enriched the entities in that path are in all the PFOCR sets. As Kevin noted, that would be similar to the Normalized Google Distance functionality (demoed in this notebook). To implement that functionality, we'd need an API that would compute an enrichment score based on multiple inputs. This would be similar in operation to the mrcoc API (which provides the NGD results) which can take two inputs (e.g., https://biothings.ncats.io/mrcoc/query?q=combo:C0008203-C0969679). But since a PFOCR enrichment tool would have to take an arbitrary number of inputs, we'd have to dynamically compute enrichment (rather than precomputing and indexing all pairwise scores as we did in mrcoc)."

Alex: "We are in the middle of a benchmarking study to assess PFOCR content as a resource for enrichment analysis. We are comparing with GO and WikiPathways, and we’ve defined a few benchmarking tests. This study should inform any enrichment analysis use cases. We should have enough results by our mid-Jan meeting to make a clear, detailed plan. An example detail to consider: PFOCR is larger than the Biological Process branch of GO, so standard enrichment algos take a while to run; too long for dynamic queries. So, part of our study is to identify subsets and utilize clustering to make it more efficient."
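To make the "arbitrary number of inputs" requirement concrete, here is a minimal sketch of dynamic enrichment scoring. It assumes a hypergeometric over-representation test, which this thread does not specify; the benchmarking study may favor a different statistic. The function names, figure IDs, entity IDs, and universe size are all hypothetical, not part of BTE or PFOCR.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, query_size, universe_size):
    """Upper-tail probability P(X >= overlap) for a hypergeometric draw.

    Assumption: the query entities are sampled without replacement from a
    universe of `universe_size` entities, `set_size` of which are in the set.
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query_ids, figure_sets, universe_size):
    """Score every figure set against a query of any size.

    Returns (figure_id, overlap, p_value) tuples sorted by ascending p-value.
    Sets with no overlap are skipped, since they cannot be enriched.
    """
    query = set(query_ids)
    rows = []
    for figure_id, members in figure_sets.items():
        overlap = len(query & members)
        if overlap == 0:
            continue
        p = hypergeom_pvalue(overlap, len(members), len(query), universe_size)
        rows.append((figure_id, overlap, p))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical toy data: three figure sets over a universe of 10 entities.
example_sets = {
    "fig_a": {"g1", "g2", "g3"},
    "fig_b": {"g3", "g4"},
    "fig_c": {"g9"},
}
example_scores = enrich(["g1", "g2", "g3"], example_sets, universe_size=10)
```

Because every set is scored on each request, the cost grows with the number of PFOCR sets, which is the runtime concern raised above. No multiple-testing correction is applied in this sketch.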

@AlexanderPico AlexanderPico added this to the Segment 2 milestone Dec 14, 2020
@AlexanderPico AlexanderPico added enhancement New feature or request Group 4 labels Dec 14, 2020
@AlexanderPico
Member Author

The benchmarking study is still ongoing. This task is dependent on the results of that study. Will continue in 2022.

@AlexanderPico AlexanderPico changed the title from "Implement PFOCR-based path ranking in BTE" to "Implement PFOCR-based results annotation in BTE" Mar 22, 2022
@AlexanderPico
Member Author

Based on our experience so far, including positive feedback from the Translator meeting 7 Feb 2022, let's modify this aim to annotate results rather than influence ranking.

In the same way that scores are calculated and added to results, and then passed on to ARAX and the final UI, we could perform queries on results that include diseases, chemicals, and genes, and annotate each result with a list of the top n pathway figures. This information would then be available downstream for novel UI/UX ideas, including linkouts, visualization, sorting, and ranking.
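As a rough illustration of the annotation idea, the sketch below attaches the top n figures to each result without changing result order. It ranks figures by simple overlap count, which is an assumption made for brevity and not a method agreed in this thread. The result layout, the `pfocr_figures` field, and all IDs are hypothetical and do not reflect the actual BTE result format.

```python
def annotate_results(results, figure_sets, top_n=3):
    """Return copies of `results`, each with a `pfocr_figures` list added.

    Figures are ranked by the number of entities shared with the result
    (ties broken by figure ID); figures with no overlap are omitted.
    """
    annotated = []
    for result in results:
        entities = set(result["entities"])
        hits = [
            (len(entities & members), figure_id)
            for figure_id, members in figure_sets.items()
            if entities & members
        ]
        # Highest overlap first, then figure ID for a deterministic order.
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        figures = [
            {"figure_id": figure_id, "overlap": overlap}
            for overlap, figure_id in hits[:top_n]
        ]
        # Build a new dict so the caller's result objects are left untouched.
        annotated.append({**result, "pfocr_figures": figures})
    return annotated


# Hypothetical toy data mixing gene (g*), chemical (c*), and disease (d*) IDs.
example_figures = {
    "fig_a": {"g1", "g2", "d1"},
    "fig_b": {"g2", "c1", "g5"},
    "fig_c": {"g1"},
}
example_results = [
    {"id": "r1", "entities": ["g1", "g2", "c1"]},
    {"id": "r2", "entities": ["g7"]},
]
example_annotated = annotate_results(example_results, example_figures, top_n=2)
```

Keeping the annotation separate from any score leaves downstream consumers free to use it for linkouts, visualization, or their own sorting.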
