Implement FDA approval constraint (Aug. 20) #1599

Closed
amykglen opened this issue Aug 4, 2021 · 15 comments

@amykglen
Member

amykglen commented Aug 4, 2021

creating this issue to track work for https://togithub.com/NCATSTranslator/minihackathons/issues/164

we're going with a "quick and dirty" approach to start: extract this info from KG2, create a tiny database, use that in Expand
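A minimal sketch of what such a tiny database could look like, assuming a one-table SQLite file keyed on node ID. The table name, function names, and `EXAMPLE:` identifiers are made up for illustration and are not taken from the ARAX or KG2 code:

```python
import sqlite3


def build_approved_drugs_db(node_ids, db_path=":memory:"):
    """Create a one-table SQLite database holding approved drug node IDs."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS approved_drugs (node_id TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO approved_drugs (node_id) VALUES (?)",
        [(node_id,) for node_id in node_ids],
    )
    conn.commit()
    return conn


def is_approved(conn, node_id):
    """Return True if the node ID is present in the approved drugs table."""
    row = conn.execute(
        "SELECT 1 FROM approved_drugs WHERE node_id = ?", (node_id,)
    ).fetchone()
    return row is not None


# Placeholder IDs standing in for whatever would be extracted from KG2.
conn = build_approved_drugs_db(["EXAMPLE:drug_a", "EXAMPLE:drug_b"])
```

A lookup table like this keeps the check to a single indexed query per node, which is about all the "quick and dirty" approach needs.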

@edeutsch
Collaborator

This came up in the Translator mini-hackathon today:
https://togithub.com/NCATSTranslator/minihackathons/issues/208

Any updates? (no is fine. Just trying to annotate visibility)

@amykglen
Member Author

nope - planning to start working on it tomorrow.

@amykglen
Member Author

amykglen commented Aug 15, 2021

ok - this is working in its preliminary form in master (though mostly only when setting force_local = True at the moment, since relevant code hasn't yet been rolled out to the KG2 API).

right now both the KG2 KP and ARAX respect this constraint; ARAX uses the KG2 approved drugs list to filter other KPs' answers. so even if some KPs are not respecting the constraint, ARAX still does. (downside of this approach is that our approved drugs list might not be all-encompassing, so it's possible a KP is actually respecting the constraint but we filter their answer down even further. would really be ideal if we had some way of knowing whether or not a KP respects a particular constraint.)
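The filtering step described above can be sketched roughly as follows. This assumes a KP answer has already been reduced to a dict of nodes keyed by ID; the function name and the `EXAMPLE:` identifiers are hypothetical, not the actual ARAX code:

```python
def filter_nodes_to_approved(kp_nodes, approved_ids):
    """Keep only the nodes whose IDs appear in the approved drugs list."""
    return {
        node_id: node
        for node_id, node in kp_nodes.items()
        if node_id in approved_ids
    }


# Hypothetical KP answer; drug_b may really be approved but is missing from
# our list, which is the over-filtering downside mentioned above.
kp_nodes = {
    "EXAMPLE:drug_a": {"name": "drug a"},
    "EXAMPLE:drug_b": {"name": "drug b"},
}
approved_ids = {"EXAMPLE:drug_a"}
filtered = filter_nodes_to_approved(kp_nodes, approved_ids)
```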

I have some uncertainty about how to interpret the FDA-approval constraint, which apparently looks like this (per the A.9_EGFR_advanced workflow query):

                    "constraints": [
                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": "regular approval"
                        }
                    ]

maybe a good thing to discuss briefly at a mini-hackathon.

@edeutsch
Collaborator

This is terrific! I agree to discuss on Wednesday. I added it to the AHM agenda where I think we should start. Maybe continue at the hackathon.

@amykglen
Member Author

I guess we never got a chance to really discuss this this week since @edeutsch wasn't at the AHM, but basically the current Expand code considers the id, value, and operator of the constraint when deciding whether it can fulfill that constraint (and it also keeps separate records of supported constraints for QNodes vs. QEdges).

so right now the only constraint Expand can fulfill is this exact one, when used on a QNode:

                        {
                            "id": "biolink:highest_FDA_approval_status",
                            "name": "highest FDA approval status",
                            "operator": "==",
                            "value": "regular approval"
                        }

so if, for instance, the value was something other than regular approval, Expand would say it doesn't know how to handle that constraint and throw an error.

I'm pretty sure this is how this validation should work, but @edeutsch can correct me if I seem to have anything wrong.

I'm not sure if there are supposed to be other possible values/operators for the biolink:highest_FDA_approval_status constraint - struggling to find any such documentation (all I've found is the example query I linked to in my previous comment).
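As a rough illustration of the validation described above (not the actual Expand code), a check on the (id, operator, value) triple with separate registries for QNodes and QEdges might look like this. The registry names and the exception class are assumptions:

```python
# Hypothetical registries; the real Expand code may organize this differently.
SUPPORTED_QNODE_CONSTRAINTS = [
    ("biolink:highest_FDA_approval_status", "==", "regular approval"),
]
SUPPORTED_QEDGE_CONSTRAINTS = []  # nothing supported on QEdges in this sketch


class UnsupportedConstraintError(ValueError):
    """Raised when a query uses a constraint this sketch cannot fulfill."""


def check_constraints(constraints, supported):
    """Raise unless every constraint's (id, operator, value) is supported."""
    for constraint in constraints:
        key = (constraint["id"], constraint["operator"], constraint["value"])
        if key not in supported:
            raise UnsupportedConstraintError(f"Cannot handle constraint: {key}")


fda_constraint = {
    "id": "biolink:highest_FDA_approval_status",
    "name": "highest FDA approval status",
    "operator": "==",
    "value": "regular approval",
}
# Passes silently on a QNode; the same constraint on a QEdge would raise.
check_constraints([fda_constraint], SUPPORTED_QNODE_CONSTRAINTS)
```

Note that `name` is deliberately left out of the comparison, since only the id, operator, and value determine whether the constraint can be fulfilled.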

@amykglen
Member Author

amykglen commented Aug 20, 2021

example ARAX query using this constraint is here, fyi (this is the A.9_EGFR_advanced workflow query): https://arax.ncats.io/?r=19693

@edeutsch
Collaborator

This is terrific!
I don't know the answers to your questions, @amykglen; we are in uncharted waters here. Surely the first Txl8r resource to support this, so we have first mover benefits but no way to verify anything.
Note that if you pull up your result https://arax.ncats.io/?r=19693 with the freshly deployed upbeat salmon GUI, and click on the query graph and the small molecules node, you can see the constraint rendered in the details popup box.

@edeutsch
Collaborator

I'm curious about the content of our FDA approval database. how many entries do we have and what status values are there? and how many of each status value?

@amykglen
Member Author

amykglen commented Aug 20, 2021

yeah, so apparently these are the possible statuses with counts of KG2 nodes: RTXteam/RTX-KG2#100 (comment)

and to start, our little FDA approval database only contains the first row in that table ("fda approved drug", ~4k nodes), since that seemed to me to be the best mapping to biolink:highest_FDA_approval_status == regular approval? but easy to adapt to include other statuses as well.

@edeutsch
Collaborator

very interesting, thanks. I would definitely welcome a conversation about this because I don't know where to go from here. Seems very reasonable so far, but it seems like we could do more; I just don't know how. Seems like a good AHM topic.

@amykglen
Member Author

amykglen commented Aug 20, 2021

sounds good to me! yeah, I'm not sure who's dictating what the constraint ID/values need to be for these kinds of queries, but to me an arrangement where the constraint ID is something like biolink:drug_status and the possible values are the six Drugbank statuses (fda approved, nutraceutical, experimental, investigational, withdrawn, and illicit) seems sensible.

we could then also support situations where the constraint value is a list containing some combination of those statuses.

(note the Drugbank status definitions are accessible like so: https://dev.drugbank.com/guides/terms/experimental-drug)
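To make the proposal concrete, here is a hedged sketch of how a status constraint could accept either a single status or a list of statuses. Both `biolink:drug_status` and the matching function are hypothetical (nothing here is an agreed Translator standard); only the six status names come from the comment above:

```python
DRUGBANK_STATUSES = {
    "fda approved",
    "nutraceutical",
    "experimental",
    "investigational",
    "withdrawn",
    "illicit",
}


def matches_status(node_statuses, constraint_value):
    """Return True if the node has at least one of the requested statuses.

    constraint_value may be a single status string or a list of statuses.
    """
    if isinstance(constraint_value, str):
        wanted = {constraint_value}
    else:
        wanted = set(constraint_value)
    unknown = wanted - DRUGBANK_STATUSES
    if unknown:
        raise ValueError(f"Unrecognized drug status(es): {sorted(unknown)}")
    return bool(wanted & set(node_statuses))


# Hypothetical constraint using a list value.
constraint = {
    "id": "biolink:drug_status",  # assumed ID, not an agreed-upon standard
    "operator": "==",
    "value": ["fda approved", "investigational"],
}
```

This sketch treats a list value as "any of"; whether a list should mean "any of" or "all of" is exactly the kind of thing that would need to be standardized.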

@edeutsch
Collaborator

the way we do this should be standardized translator-wide. But I'm not certain if that conversation has happened yet. Seems like something that data modeling folks would have tackled? Does anyone recall a result from that call series? I don't usually attend.

@amykglen
Member Author

yeah, definitely. can't find a record of it being discussed on Data Modeling, so maybe the discussion hasn't yet happened. this issue seems related, but not sure what's going on with it:

https://togithub.com/NCATSTranslator/minihackathons/issues/163

@amykglen
Member Author

amykglen commented Aug 20, 2021

one update: realized we should also handle the negated ("not") version of this FDA approval constraint, so just added that.

not sure how much it will really be used, but it helps shed light on the quality of our FDA approved drug info. (looks like we're lacking status info for some drugs that are FDA approved in actuality. I suppose not surprising since our little database contains only ~4,000 nodes, but there are >20,000 approved prescription drug products, according to the FDA)
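A small sketch of why the negated form exposes gaps in the list, assuming negation is passed as a boolean flag. The function name, the `negate` parameter, and the `EXAMPLE:` IDs are illustrative only and not the actual implementation:

```python
def passes_fda_constraint(node_id, approved_ids, negate=False):
    """Apply the approval constraint, or its negation when negate is True."""
    is_listed = node_id in approved_ids
    return (not is_listed) if negate else is_listed


# Placeholder data: unlisted_drug stands for a drug that is approved in
# reality but has no status info in our list.
approved_ids = {"EXAMPLE:listed_drug"}
candidates = ["EXAMPLE:listed_drug", "EXAMPLE:unlisted_drug"]
not_approved = [
    node_id
    for node_id in candidates
    if passes_fda_constraint(node_id, approved_ids, negate=True)
]
```

Because absence from the list is treated as "not approved", any approved drug missing from the database shows up in the negated results.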

@amykglen
Member Author

amykglen commented Sep 2, 2021

suppose I will close this issue; can open a new one down the road if we want to support constraining on other drug statuses.
