Create new biomedical id resolver package based on SRI node normalizer #249

Closed
andrewsu opened this issue Aug 11, 2021 · 6 comments · Fixed by biothings/biomedical_id_resolver.js#75

@andrewsu
Member

For many historical reasons, BTE currently uses BioThings APIs for node normalization and ID synonymization (performed by biomedical_id_resolver.js). Given the focused effort by the Translator SRI team to build their node normalizer service (https://nodenormalization-sri.renci.org/), let's implement a new module based on that service. When complete, we will evaluate whether BTE should switch completely to this new module or implement a hybrid solution.

I expect that this change will address several open issues related to human-readable names, and will improve ID resolution for UMLS and NCIT, which is currently blocking #213 (e.g., https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=MONDO%3A0005359)
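As a minimal sketch of how a lookup URL like the one above is formed: the base URL and the `curie` query parameter come from that link, while the function name and its parameters are illustrative assumptions, not part of biomedical_id_resolver.js.

```python
from urllib.parse import urlencode

# Base URL taken from the example link; everything else here is illustrative.
BASE_URL = "https://nodenormalization-sri.renci.org/1.1"


def normalized_nodes_url(curie, base_url=BASE_URL):
    """Build a get_normalized_nodes URL for a single CURIE.

    urlencode percent-encodes the colon, so "MONDO:0005359"
    becomes "MONDO%3A0005359".
    """
    query = urlencode({"curie": curie})
    return f"{base_url}/get_normalized_nodes?{query}"


print(normalized_nodes_url("MONDO:0005359"))
# https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=MONDO%3A0005359
```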

@colleenXu
Collaborator

@andrewsu @ericz1803 we might want to wrap https://nodenormalization-sri-dev.renci.org/1.1/docs#/ instead, since it uses biolink v2.1 (the link in Andrew's comment uses the non-dev URL, which is on biolink 1.8).

The Translator Slack post about these two services says that the biolink v2.1 service will move to the non-dev URL once everyone has migrated to biolink v2.1.

@colleenXu
Collaborator

Note that migrating to a new SRI-based ID resolver will remove our current ability to add node attributes like drug category or type of gene.

We could add this back later as a separate feature ("node annotation").

@andrewsu
Member Author

Going to reopen this issue while we do additional testing and pending final inclusion in BTE. (I also adjusted our automation settings, which I think were a bit too aggressive in closing issues.)

andrewsu reopened this Aug 27, 2021
@colleenXu
Collaborator

Note that SRI-based ID resolution does not support Transcript (e.g., ENSEMBL ENST IDs) or Procedure. This doesn't affect the major queries we run.

Can see the semantic types supported here: https://nodenormalization-sri-dev.renci.org/1.1/get_semantic_types

Can look at semantic types + curie prefixes supported here: https://nodenormalization-sri-dev.renci.org/1.1/get_curie_prefixes

@colleenXu
Collaborator

@ericz1803 @andrewsu There is a new prod instance now, with some differences from the dev instance we've been testing with (gene-protein conflation, for example).

I suggest trying this out with a different branch.

However, I think some code may not work well anymore. I'm not sure how the code gets the main "semantic type" from the list of semantic types SRI gives it. If it is taking the top element of the list, I think that approach won't work anymore...

For example, this link: https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=UniProtKB%3AP05177&conflate=true has "biolink:PhysicalEssence" as the top element in the list, which is not the specific semantic type. What we really want is "Gene".
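One possible way to stop depending on list order is to skip categories that are known to be too generic and take the first remaining one. This is only a sketch of that heuristic: the `GENERIC_TYPES` set, the function name, and the sample list are assumptions made up for illustration. They are not the service's actual response and not what the resolver code currently does.

```python
# Hypothetical set of categories treated as too generic to be a primary type.
GENERIC_TYPES = {
    "biolink:PhysicalEssence",
    "biolink:NamedThing",
    "biolink:Entity",
}


def primary_semantic_type(types, generic=GENERIC_TYPES):
    """Return the first non-generic type, with the "biolink:" prefix removed.

    Falls back to the first element when every type is generic, and
    returns None for an empty list.
    """
    if not types:
        return None
    chosen = next((t for t in types if t not in generic), types[0])
    return chosen.split(":", 1)[-1]


# Made-up list for illustration only, ordered with a generic type first.
sample = ["biolink:PhysicalEssence", "biolink:Gene", "biolink:NamedThing"]
print(primary_semantic_type(sample))  # Gene
```

A skip list like this needs maintenance whenever the type hierarchy changes, so it would be a stopgap rather than a replacement for the service reporting the primary type itself.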

@colleenXu
Collaborator

colleenXu commented Sep 11, 2021

It is now deployed on BTE prod and I'm fairly happy with its behavior, described below and in more detail in Slack convos...

A. We had to deal with differences between three semantic types: the semantic type of the operation that retrieved the output IDs, the semantic type of the output IDs as returned by the SRI service (taking the first item in an ID's list of semantic types as the primary semantic type), and the semantic type of the query node.

I believe the solution was to use all three in the querying process, and to show the semantic type returned by the SRI service in the TRAPI knowledge_graph.nodes.

B. The SRI service only needs the CURIE (not an input semantic type). The code was therefore changed to query the ID resolver only once per ID for a given step...
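To illustrate point B, here is a minimal sketch of collapsing (semantic type, ID) pairs into a unique list of CURIEs, so that each ID is sent to the resolver only once per step. The function name and the input shape are assumptions for illustration, not the actual resolver code.

```python
def curies_to_resolve(typed_ids):
    """Collapse (semantic_type, curie) pairs into unique CURIEs.

    dict.fromkeys keeps the first occurrence of each CURIE and
    preserves first-seen order.
    """
    return list(dict.fromkeys(curie for _semantic_type, curie in typed_ids))


# The same ID requested under two semantic types is resolved only once.
pairs = [
    ("Gene", "UniProtKB:P05177"),
    ("Protein", "UniProtKB:P05177"),
    ("Disease", "MONDO:0005359"),
]
print(curies_to_resolve(pairs))
# ['UniProtKB:P05177', 'MONDO:0005359']
```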


I suggest closing this issue, and opening a new one when the prod SRI service for ID resolution is updated to make it easier to figure out the primary semantic type for an ID...
