-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
include ICEES as a BTE resource #213
include ICEES as a BTE resource #213
Comments
Sure thing, Andrew. ICEES wiki page |
From @karafecho (thank you Kara!) via NCATSTranslator/minihackathons#68 (comment)
|
Here is an example query that takes advantage of the ICEES DILI Instance:
In theory, adding the API names above to
But Chunlei made these changes on dev (https://dev.api.bte.ncats.io/v1/query/) and the query above still returns zero results. However, POSTing the same query above to the ICEES DILI Instance directly at (https://icees.renci.org:16341/query?reasoner=true&verbose=false) gives 62 results:
In theory, I think, BTE should just proxy TRAPI queries to APIs in the metakg that accept TRAPI, but that doesn't seem to be working here. |
Also relevant from @newgene: |
Looping in @xu-hao... |
Note to @ericz1803: Besides getting the "output IDs" from the ICEES response... ideally we would also correctly parse/ingest the ICEES edge attributes so BTE keeps them in its edge attributes. This is related to the #209 (and situation A of the provenance ticket #208 (comment)) |
Summary: The current problem:
Test these current changes: |
@karafecho Could you provide example TRAPI queries for ICEES COVID19? Also, I believe the query below works as an example TRAPI query for ICEES Asthma, but please confirm or let us know of another query that you prefer.
|
@colleenXu : Thanks for working on this. The query above looks good to me, although depending on the question, you might want to look at biolink:DiseaseOrPhenotypicFeature or biolink:ChemicalEntity or biolink:ChemicalExposure instead of (or in addition to) biolink:Disease. The structure of the above query should also work for the most recent ICEES COVID instance, which exposes data on patients confirmed to be COVID+ (cases) or COVID- (matched controls). That instance can be found here: https://covid.icees.renci.org/jan2021/apidocs#/. The input CURIE should work, but you might want to use something a bit more relevant to COVID. Here are a few suggestions: Remdesivir Prednisone InvasiveVentilation Hope this helps. |
@karafecho Thank you for your feedback. I have several questions related to ICEES COVID19 API:
|
@colleenXu : Thanks for your questions. WRT (1), the endpoint for the newest instance (JAN2021 data) is https://covid.icees.renci.org/jan2021/query. I didn't realize that this wasn't registered with SmartAPI. I'll fix that. One other issue that I should alert you to is that many of the feature variables that we intended to expose are either not being exposed or are showing empty cells. We are fixing this with the next release. We probably should have held off on the initial deployment, but we were racing to meet a milestone deadline, so...(you know how it goes)... Hope this helps. |
@karafecho I think the server urls in the file https://covid.icees.renci.org/jan2021/openapi.json are incorrect (this was used in the new SmartAPI registration of ICEES COVID19 API). Rather than seeing 1 url with https://covid.icees.renci.org/jan2021/, I see "/jan2021" and "https://covid.icees.renci.org" as two separate urls. I suggest modifying the file and refreshing/updating the SmartAPI registration. It may also be a good idea to remove the older SmartAPI registration. |
I'm seeing that, too. I think this might be related to this issue. Also, I think our SmartAPI registrations should be updated automatically. |
@karafecho Thank you for your help. We have a few questions about the ICEES DILI and ICEES COVID APIs... ICEES DILI:
ICEES COVID:
|
@colleenXu: Thanks, again, for your comments. Greatly appreciated! WRT ICEES DILI, we recently refactored our API config files. As part of this effort, we replaced all illegal prefixes with legal ones. However, we have not yet deployed the API, so the current instance contains some illegal prefixes. Your second point regarding the weird identifiers relates to the approach that we used to map identifiers, which in some cases, pulled in substrings for variables. For instance, the MESH ID for "Gasex" was picked up because it contains the substring "sex". Note that the MESH ID for "Sex" should be mapped to "SexDILI", too. This is also something that should be fixed with the next deployment (ETA end of month). Your third point regarding self-edges is a function of the KG, which relates all entities to each other via a Chi Square-derived P value. So, yes, it is a valid association. WRT ICEES COVID, if I'm understanding you correctly, then this issue relates to the one I commented on previously. Specifically, due to a variety of reasons mostly out of our control, we were unable to capture the variable that flags patients as confirmed COVID+ or confirmed COVID-. We recognize that this is a major issue, but we moved forward with the deployment to meet a milestone deadline. (FWIW, the cohort is designed as a 1:1 case:control study, with roughly 100,000 patients each in the JAN2021 instance.) We have since corrected this issue, which will be reflected in the next deployment (ETA end of month). In the meantime, you might want to focus on the other variables I suggested. Hope this helps... |
@karafecho Thank you, I think this clarifies things a lot! I'm noting a few other data-related issues below that are non-critical - feel free to look when you have the time, and chime in if you have thoughts/news on them.
|
No problem, Colleen! WRT your first and fourth bullets, we've used Athena to pull in identifier systems that are not currently supported by SRI. The idea is to see if there's enough support to justify an expansion of SRI services. WRT your second bullet, these fixes have been made and will be reflected in the next API deployments. WRT your third bullets, we fixed the typo. Thanks for pointing that out! WRT your fifth bullet, you are correct that we mapped to 'accurate but also commonly used' semantic types. :-) |
@colleenXu : I am testing direct three-hop ARA queries as part of Workflow B, the idea being that this will facilitate (1) debugging and (2) answer evaluation for scientific impact while we wait for ARA deployment of TRAPI 1.2, which will support asynchronous queries and (hopefully) resolve the timeout issues that are preventing us from running the queries via the ARS. I don't think any of the issues above should block a direct query of BTE using the TRAPI message below, so I'm hoping that you can test this with both biolink:correlated_with and biolink:has_real_world_evidence_of_association_with. Would this be feasible? Note that I am also using these tests to compare output across ARAs, in terms of ChemicalEntity(ies) that more than one ARA suggests. Thus far, I have results from both queries for ARAX and for one query from ARAGORN. The results are looking pretty interesting thus far, so I'm pretty excited.
|
note that after all PRs from Eric are merged, ICEES Asthma should also work. I've noticed this by running the sample query above for ICEES Asthma in Eric's workspace (fork). We'll wait until after ICEES COVID-19 updates to know what to do there (ingest one or both APIs? Try out queries / integration into BTE). |
Thanks, Colleen. To clarify, the Workflow B query(ies) that I suggested are expected to run after Eric merges the PRs, correct? The queries should return results from ICEES DILI, not ICEES asthma. |
Note to BTE team:Once the new code is merged (the SRI-ID-resolver + new query handling + results assembly)... Query that should have "icees:dili" in the response (PhenotypicFeatures linked to DILI):
Query that should have "icees:asthma" in the response (Diseases linked to Asthma):
|
@karafecho, here is some feedback on the earlier comment about the Workflow/demo query:
Notes for the BTE team:
For the BTE team, this is what I have tried:
This query is modified to be shorter (no chemicals at the end, Disease rather than DiseaseOrPhenotypicFeature). It runs on my local instance in ~4 min and returns something ~7 MB in size. Adding a ChemicalEntity query to the end of this seems like it will definitely take too long to run / be too large in size...
This is another modified query (also shorter). It takes my local instance ~6 min 24 seconds to run and the response is (!!!) ~38.6 MB in size.
|
@karafecho Just to check, the two ICEES COVID SmartAPI registrations look like they are now going to the same API (server url). Previously, we discussed how they included data from different times/cohorts though, and were separate APIs... |
Thanks, @colleenXu . Greatly appreciate the input! You are correct that the data for each cohort/time period should point to a distinct URL. I suspect the issue relates to the way we've restructured the directories. Will post an issue. |
I suggest closing this issue and opening separate issues for demo queries or ingesting ICEES COVID, when needed. As noted above, there are some issues with the demo query that Kara asked us to run. I've confirmed that a query like this returns edges from ICEES DILI that preserve the edge attributes. BTE is also correctly using the SRI-based ID resolver to find different semantic types for the IDs returned (compared to what ICEES DILI provides). One example of that is MESH:C040391, which the ID resolver finds is malignancy-associated nucleolar antigen, human (a ChemicaEntity) -- not a Disease.
|
ICEES has dropped the requirement for an API key, and I believe the ICEES KP has expanded beyond the limited asthma data in previous iterations. We should look at incorporating it as a knowledge source for BTE.
@karafecho, can you give us an example query and result that we should be able to get?
Might also be related to #153, which would allow users to override default inclusion/exclusion settings for APIs
The text was updated successfully, but these errors were encountered: