Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate annotation of PharmGKB API #556

Open
andrewsu opened this issue Feb 2, 2023 · 12 comments
Open

Evaluate annotation of PharmGKB API #556

andrewsu opened this issue Feb 2, 2023 · 12 comments
Assignees
Labels
data source external Requires fixes to an external service x-bte

Comments

@andrewsu
Copy link
Member

andrewsu commented Feb 2, 2023

The PharmGKB API is described at https://api.pharmgkb.org/ and https://api.pharmgkb.org/swagger/.

In addition, there is an undocumented API endpoint that exposes "association style" data at https://api.pharmgkb.org/v1/data/connection?object1Id=PA166153416. This endpoint exposes the data in the "Variant, Gene and Drug Relationship Data" on their Downloads page that contains ~117k pairwise relationships.

@colleenXu
Copy link
Collaborator

colleenXu commented Mar 3, 2023

The SmartAPI yaml with x-bte annotations and comments has been written, tested, and saved here.

[EDIT: this is on Andrew's plate to discuss the list below with PharmGKB team]

Things to do before registering it / having BTE use PharmGKB's API:

  • contact PharmGKB team: quote "If you are planning on using the API for more than trivial tasks, please let us know. This helps us anticipate the amount of traffic to expect. Even more importantly, it helps us justify the need to support this API to funding agencies."
    • Also ask them if there's a way to query endpoints with multiple IDs (for /data/connection endpoint, object1Id is A or B) per request. And if there are POST endpoints.
  • set up BTE so it can rate-limit its requests to 2 per sec. Quote " Limitation: please limit requests to 2 per second. Too frequent requests will result in a 429 response." Maybe @tokebe can look into this.

@colleenXu
Copy link
Collaborator

colleenXu commented Mar 3, 2023

Other notes (DON'T NEED TO ADDRESS NOW, for future reference):

@tokebe
Copy link
Member

tokebe commented Mar 3, 2023

Rate limiting support is already implemented, under the x-trapi section of the smartapi specification. If this doesn't work, we could theoretically add additional support via the API_LIST or something like that.

As for improper edge merging, we could possibly add additional features to the edge hash? I'd have to look into this further.

@colleenXu
Copy link
Collaborator

@tokebe

This API isn't TRAPI standard so it doesn't make sense to have an x-trapi section.

I wonder if there is an OpenAPI/SmartAPI way of documenting what the rate limits are. I found something about HTTP response headers, but I'm not sure that this is what we're looking for... @newgene might know more. I agree that maybe we could add support through the API_LIST...


On "edge merging": that was more of a "note to myself / other devs for later"...

Some things I am uncertain about right now (that maybe you could help on) are:

  • how BTE processes the API's response (api-transform module?), since it's a non-BioThings non-TRAPI api. It's not clear to me, for some of the x-bte operations I wrote, what will end up in the same record vs separate records.
  • does our "not merging" method work when some records have the specified field and others don't? For this API, sometimes it looks like records from different subqueries (and different operations) are being merged - and these records don't have a shared set of fields.

@colleenXu
Copy link
Collaborator

colleenXu commented Mar 6, 2023

From today's meeting with Andrew:

The main things to do, before setting up BTE to use this API (API_LIST.include): These are on @andrewsu's plate

  • contact PharmGKB team:
    • note their request for contact here
    • Questions:
      • are there POST endpoints and/or ways to query with multiple IDs per request (example: for /data/connection endpoint, object1Id is A or B) per request
      • is there a strict limit of 2 requests / sec? @tokebe can wait for Andrew to get more info (right now, don't need to work on supporting rate-limits in annotation/config file for non-TRAPI apis)
  • contact SRI (NodeNorm / biolink-model):
    • setting the PharmGKB ID prefix (we use PHARMGKB rather than PHARMGKB.PATHWAY right now)
    • supporting PharmGKB IDs (Chemical, Disease, Gene, Pathway at least)

These two comments are for future reference, and no action is required right now on the items.

@colleenXu
Copy link
Collaborator

colleenXu commented Mar 6, 2023

@tokebe also checked, and registering a SmartAPI yaml with x-bte annotation (tagged translator) means it will be accessible to the smartapi-specific endpoints like v1/smartapi/{smartapi_id}/query. But BTE will not use it (general endpoints) and it won't be accessible through team-specific endpoints like the Service Provider endpoints other ARAs use.

That's what we want right now - PharmGKB to be accessible only through the api-specific endpoints so we can show it to the PharmGKB team / SRI team (see comment above). I've registered it under my github right now, and it's here (ID: bde72db681ec0b8f9eeb67bb6b8dd72c). So we can query it through v1/smartapi/bde72db681ec0b8f9eeb67bb6b8dd72c/query

@andrewsu
Copy link
Member Author

andrewsu commented Mar 8, 2023

Adding a quick validation note that the following query for entities related to imatinib PA10804 returns the expected results when posted to https://bte.transltr.io/v1/smartapi/bde72db681ec0b8f9eeb67bb6b8dd72c/query

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
           "ids": ["PHARMGKB:PA10804"],
           "categories": ["biolink:ChemicalEntity"]
        },
        "n1": {
          "categories": [
            "biolink:NamedThing"
          ]
        }
      },
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1"
        }
      }
    }
  }
}

@colleenXu
Copy link
Collaborator

Note: when we do add PharmGKB to the API_LIST for the ARA/Service-Provider-team endpoint to use...we'll want to mark it as a primarySource like this (so it will be marked as the primary knowledge source):

        {
            id: 'bde72db681ec0b8f9eeb67bb6b8dd72c',
            name: 'PharmGKB REST API',
            primarySource: true
        },

@andrewsu
Copy link
Member Author

quick note that the addition of PharmGKB IDs to Node Normalizer is slated for their April 30 release. I also emailed the pharmgkb admins about loosening the API limits vs us creating a biothings API instance.

@colleenXu
Copy link
Collaborator

colleenXu commented Jul 7, 2023

Based on the testing of some x-bte annotations today (Pharmgkb ID prefixes), it doesn't seem that PharmGKB IDs are in Node Normalizer...at least the prod version

Note that the issues for this tool are still open

TranslatorSRI/NodeNormalization#171
TranslatorSRI/Babel#116

@colleenXu
Copy link
Collaborator

We may want to put this issue up for discussion after the code freeze / Sept release prep.

This is my view of the barriers:

  • learning more about batch queries + rate limits for this API (previous comments here, here, here). I don't remember if we did learn more, but it looks like we didn't record any decisions on how to proceed
    • discussion of how to set rate-limits for non-TRAPI APIs here and here
  • investigating edge-merging. Discussed here, here, here.
    • I notice that I didn't add input_name / output_name to many operations' response-mapping, probably because of this issue.
    • maybe this is related to post-processing, and could be helped with JQ-related stuff? support JQ for API response transformation #489
  • NodeNorm adds support for PHARMGKB vocabs. discussed here, here, and here

Another thing to note: the biolink:publications urls -> CURIEs work may need adjustments, so the urls for this API are converted to CURIEs while NOT converting the PFOCR-related urls to CURIEs. See here and here for details.

@colleenXu colleenXu added the external Requires fixes to an external service label Oct 16, 2023
@andrewsu
Copy link
Member Author

no further work on this until TranslatorSRI/Babel#116 is resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data source external Requires fixes to an external service x-bte
Projects
None yet
Development

No branches or pull requests

3 participants