Create new biomedical id resolver package based on SRI node normalizer #249
@andrewsu @ericz1803 We might want to wrap https://nodenormalization-sri-dev.renci.org/1.1/docs#/ instead, since it uses biolink v2.1 (the link in Andrew's comment points to the non-dev URL, which uses biolink 1.8...). According to the Translator Slack post about these two services, the biolink v2.1 service will move to the non-dev URL once everyone has migrated to biolink v2.1.
Note that migrating to a new SRI-based ID resolver will remove our current ability to add node attributes such as drug category or gene type. We could add this back later as a separate feature ("node annotation").
Going to reopen this issue while we do additional testing and pending final inclusion in BTE. (I also adjusted our automation settings, which I think were a bit too aggressive in closing issues.)
Note that SRI-based ID resolution does not cover Transcript (e.g., ENSEMBL ENST IDs) or Procedure. This doesn't affect the major queries we run. You can see the supported semantic types at https://nodenormalization-sri-dev.renci.org/1.1/get_semantic_types and the supported semantic types plus CURIE prefixes at https://nodenormalization-sri-dev.renci.org/1.1/get_curie_prefixes
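Because the service does not cover every semantic type, the caller has to decide what to do with IDs it cannot resolve. A minimal sketch, assuming unsupported IDs are simply passed through unchanged: the helper name `partition_by_support` and the hard-coded `SUPPORTED_TYPES` snapshot are hypothetical, and a real implementation would load the set from the get_semantic_types endpoint instead.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical snapshot of supported types; a real implementation would load
# this set from /1.1/get_semantic_types instead of hard-coding it.
SUPPORTED_TYPES: Set[str] = {"biolink:Gene", "biolink:Disease"}


def partition_by_support(
    typed_curies: Iterable[Tuple[str, str]],
    supported_types: Set[str] = SUPPORTED_TYPES,
) -> Tuple[List[str], List[str]]:
    """Split (curie, semantic_type) pairs into CURIEs to send to the SRI
    service and CURIEs to pass through unchanged (e.g. Transcript IDs)."""
    resolvable: List[str] = []
    passthrough: List[str] = []
    for curie, semantic_type in typed_curies:
        target = resolvable if semantic_type in supported_types else passthrough
        target.append(curie)
    return resolvable, passthrough


# "ENSEMBL:ENST_PLACEHOLDER" is a made-up placeholder, not a real transcript ID.
resolvable, passthrough = partition_by_support(
    [("NCBIGene:1017", "biolink:Gene"), ("ENSEMBL:ENST_PLACEHOLDER", "biolink:Transcript")]
)
```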
@ericz1803 @andrewsu There is now a new prod instance, with some differences from the dev instance we've been testing with (gene-protein conflation, for example). I suggest trying it out on a separate branch. However, I think some of our code may not work well anymore. I'm not sure how the code picks the main "semantic type" from the list of semantic types SRI returns. If it takes the first element of the list, I think that approach won't work anymore... For example, for https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=UniProtKB%3AP05177&conflate=true the first element in the list is "biolink:PhysicalEssence", which is not the specific semantic type. What we really want is "Gene"...
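One way to stop depending on list order is to pick the first type from a caller-supplied preference list and fall back to the first element only when nothing matches. A minimal sketch: `pick_primary_type` and the preference order are hypothetical, and the type list in the example is illustrative rather than a captured service response.

```python
from typing import Optional, Sequence

# Hypothetical preference order: the specific types the query engine cares about.
PREFERRED_TYPES = ("biolink:Gene", "biolink:Protein", "biolink:Disease", "biolink:SmallMolecule")


def pick_primary_type(
    types: Sequence[str], preferred: Sequence[str] = PREFERRED_TYPES
) -> Optional[str]:
    """Return the first preferred type present in `types`; otherwise fall back
    to the first element (the old behavior), or None for an empty list."""
    present = set(types)
    for candidate in preferred:
        if candidate in present:
            return candidate
    return types[0] if types else None


# Illustrative list in which a generic ancestor type comes first.
example_types = ["biolink:PhysicalEssence", "biolink:Gene", "biolink:NamedThing"]
print(pick_primary_type(example_types))  # biolink:Gene
```

A preference list keeps the choice explicit in BTE's own code, so a later change in the service's ordering does not silently change which type is reported.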
It is now deployed on BTE prod, and I'm fairly happy with its behavior, described below and in more detail in Slack conversations...
A. We had to deal with differences between three semantic types: the semantic type of the operation that retrieved the output IDs, the semantic type of those output IDs as returned by the SRI service (taking the first item in an ID's list of semantic types as its primary semantic type), and the query node's semantic type... I believe the solution was to use all three during querying, and to show the semantic type returned by the SRI service in the TRAPI knowledge_graph.nodes...
B. The SRI service needs only the CURIE (not an input semantic type), so the code was changed to query the ID resolver only once per ID for a given step...
I suggest closing this issue and opening a new one when the prod SRI service for ID resolution is updated to make it easier to determine an ID's primary semantic type...
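Point B amounts to deduplicating CURIEs before calling the resolver. A minimal sketch of that pattern, assuming a batch `resolver` callable that takes a list of CURIEs and returns a dict keyed by CURIE; `resolve_step` is a hypothetical name, not a function from biomedical_id_resolver.js.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical resolver signature: one batched call, results keyed by CURIE.
Resolver = Callable[[List[str]], Dict[str, Optional[dict]]]


def resolve_step(curies: Iterable[str], resolver: Resolver) -> Dict[str, Optional[dict]]:
    """Resolve every CURIE produced by one query step with a single batched
    call, so each distinct ID is sent to the resolver at most once."""
    unique = list(dict.fromkeys(curies))  # drop duplicates, keep first-seen order
    if not unique:
        return {}
    resolved = resolver(unique)
    # CURIEs missing from the resolver's answer map to None.
    return {curie: resolved.get(curie) for curie in unique}


# Usage with a stub resolver that records each batch it receives.
calls: List[List[str]] = []


def stub_resolver(batch: List[str]) -> Dict[str, Optional[dict]]:
    calls.append(batch)
    return {c: {"id": c} for c in batch}


result = resolve_step(["NCBIGene:1017", "MONDO:0005359", "NCBIGene:1017"], stub_resolver)
```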
For many historical reasons, BTE currently uses BioThings APIs for node normalization and ID synonymization (performed by biomedical_id_resolver.js). Given the Translator SRI team's focused effort to build their node normalizer service (https://nodenormalization-sri.renci.org/), let's implement a new module based on that service. When it is complete, we will evaluate whether BTE should switch to this new module entirely or implement a hybrid solution.
I expect this change to address several open issues related to human-readable names, and to improve the UMLS and NCIT ID resolution that is blocking #213 (e.g., https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=MONDO%3A0005359)
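A new module would start with a thin client for the get_normalized_nodes endpoint. A minimal sketch using only the Python standard library; the function names are hypothetical, sending several CURIEs as repeated `curie` parameters is an assumption about the endpoint, and the assumed response shape (`id`, `equivalent_identifiers`, `type` per CURIE, `null` for unknown CURIEs) should be checked against the service's /docs page.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Sequence
from urllib.parse import urlencode

# Base URL from this issue; the -dev host can be swapped in via `base_url`.
BASE_URL = "https://nodenormalization-sri.renci.org/1.1"


def build_normalize_url(
    curies: Sequence[str], conflate: Optional[bool] = None, base_url: str = BASE_URL
) -> str:
    """Build a GET URL for get_normalized_nodes (assumes repeated `curie`
    query parameters are accepted for batches)."""
    params = [("curie", c) for c in curies]
    if conflate is not None:
        params.append(("conflate", "true" if conflate else "false"))
    return f"{base_url}/get_normalized_nodes?{urlencode(params)}"


def fetch_normalized(curies: Sequence[str], conflate: Optional[bool] = None) -> Dict:
    """Perform the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_normalize_url(curies, conflate)) as resp:
        return json.load(resp)


def equivalent_ids(response: Dict, curie: str) -> List[str]:
    """List equivalent identifiers for `curie`, or just the input CURIE when
    the service returned null for it (assumed response shape)."""
    node = response.get(curie)
    if not node:
        return [curie]
    return [eq["identifier"] for eq in node.get("equivalent_identifiers", [])]
```

Keeping URL construction separate from the network call lets the request format be tested offline, e.g. `build_normalize_url(["MONDO:0005359"])` reproduces the example link in this issue.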