semmeddb: generating x-bte annotations using metatriple file #644

Closed
colleenXu opened this issue May 24, 2023 · 4 comments

colleenXu commented May 24, 2023

@erikyao has code that creates a meta-triple table file (original ask) during the semmeddb API deployment process.

Yao got me the meta-triple file for the data used in semmeddb/semmeddb2 during Jan/Feb 2023.

My task is to refactor the x-bte-annotation generating notebook to use this file.

Benefits:

  • will add operations for the ncbigene namespace FINALLY
  • get code that can be reused for auto-generating x-bte annotations! (but note that one needs to tweak / test query structure for other APIs)

Notes:

  • will be setting an arbitrary count cutoff for creating operations from meta-triples. This cutoff will need adjusting because semmeddb2 will almost certainly have a different number of records
@colleenXu colleenXu self-assigned this May 24, 2023
@colleenXu colleenXu changed the title semmeddb: generating x-bte annotations using metatriple file semmeddb2: generating x-bte annotations using metatriple file May 24, 2023
@colleenXu colleenXu changed the title semmeddb2: generating x-bte annotations using metatriple file semmeddb: generating x-bte annotations using metatriple file Aug 8, 2023

colleenXu commented Aug 8, 2023

Done and deployed (registration refreshed): NCATS-Tangerine/translator-api-registry@b08b9b3

It uses a new metatriple + counts file that Yao generated Monday evening. However, the counts in that file are still predication counts (not the record / triple counts of the current API), and I still use a cutoff of predication count > 100.
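For readers who have not opened the notebook, the cutoff step can be sketched roughly as below. This is only an illustration, not the notebook's code: the column names (`subject_type`, `predicate`, `object_type`, `predication_count`) and the sample rows are assumptions made up for the example, and the real metatriple + counts file may be laid out differently.

```python
import csv
import io

# Hypothetical meta-triple rows. Column names and counts are invented for
# illustration and are NOT taken from the real metatriple + counts file.
SAMPLE_TSV = (
    "subject_type\tpredicate\tobject_type\tpredication_count\n"
    "Gene\tAFFECTS\tDisease\t1520\n"
    "Gene\tINTERACTS_WITH\tGene\t100\n"
    "SmallMolecule\tTREATS\tDisease\t98\n"
)


def filter_metatriples(tsv_text, cutoff=100):
    """Keep meta-triples whose predication count is strictly above the cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if int(row["predication_count"]) > cutoff]


# Only rows with predication_count > 100 survive; a count of exactly 100 is dropped.
kept = filter_metatriples(SAMPLE_TSV)
```

Keeping the cutoff as a parameter makes it easy to retune when the underlying data changes.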

I still kept the original notebook (renamed to AutoGen_SEMMEDDB_Old.ipynb).

The custom processing we still do (outside of Translator-curated exclusions) is written up in the collapsed section of:
#669 (comment)


EDIT: added comment

I think this effort added 739 operations (from 3936 to 4675), which would all be Gene-related (have NCBIGene ID-namespace as either input, output, or both).

It's a little hard to tell because the ordering of operations also changed when using the metatriples file.

colleenXu commented

Going to leave this open, since I'm waiting for a response from @erikyao on generating a metatriple file with:

  • number of API records (basically unique triple count), rather than what seems like predication count
  • number of API records where pmid_count > 3 (helpful given the current constraint)

If this isn't possible, then this issue can be closed as completed.
If this can be done, then maybe this issue can be kept open until that file is made, I adjust the notebook to use it, and generate / deploy x-bte annotations based on this data...
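To make the two requested counts concrete, here is a rough sketch of how they could be derived from API records. It assumes each record exposes a subject type, predicate, object type, and a `pmid_count` field; apart from `pmid_count`, these field names and the sample records are hypothetical and are not taken from Yao's code or the real API.

```python
from collections import Counter

# Hypothetical API records; field names other than pmid_count are assumptions.
records = [
    {"subject_type": "Gene", "predicate": "AFFECTS", "object_type": "Disease", "pmid_count": 5},
    {"subject_type": "Gene", "predicate": "AFFECTS", "object_type": "Disease", "pmid_count": 2},
    {"subject_type": "Gene", "predicate": "AFFECTS", "object_type": "Disease", "pmid_count": 4},
    {"subject_type": "Gene", "predicate": "INTERACTS_WITH", "object_type": "Gene", "pmid_count": 3},
]


def count_metatriples(records, min_pmids=3):
    """Return (records per meta-triple, records per meta-triple with pmid_count > min_pmids)."""
    total = Counter()
    well_supported = Counter()
    for rec in records:
        key = (rec["subject_type"], rec["predicate"], rec["object_type"])
        total[key] += 1
        if rec["pmid_count"] > min_pmids:
            well_supported[key] += 1
    return total, well_supported


total, well_supported = count_metatriples(records)
```

Here `total` corresponds to the first bullet (one count per API record, i.e. per unique triple) and `well_supported` to the second (only records with `pmid_count > 3`).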

colleenXu commented

Related to #645

colleenXu commented

Yao generated the file with the counts, and I saved a copy of this file to the same repo as the notebook. NCATS-Tangerine/translator-api-registry@205e32d

I also adjusted the notebook to use this file. After discussion with Andrew (recorded in the notebook), we set the cutoff to keep only metatriples with more than 20 documents that have pmid_count > 3. See the commit link for details.
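As a rough picture of what "creating operations from metatriples" means under this cutoff, the sketch below turns each qualifying metatriple into a simplified operation entry. It is an assumption-laden illustration rather than the notebook's output: the input field names (`docs_pmid_gt3`, `subject_prefix`, `object_prefix`), the operation naming scheme, and the sample rows are hypothetical, and real x-bte annotations carry more fields than the three shown here.

```python
# Hypothetical meta-triples with counts; all field names and numbers are
# invented for illustration and do not come from the real counts file.
metatriples = [
    {"subject_type": "Gene", "subject_prefix": "NCBIGene", "predicate": "AFFECTS",
     "object_type": "Disease", "object_prefix": "UMLS", "docs_pmid_gt3": 57},
    {"subject_type": "Gene", "subject_prefix": "NCBIGene", "predicate": "CAUSES",
     "object_type": "Disease", "object_prefix": "UMLS", "docs_pmid_gt3": 20},
]


def build_operations(metatriples, cutoff=20):
    """Build one simplified operation per meta-triple with more than `cutoff` documents."""
    operations = {}
    for mt in metatriples:
        if mt["docs_pmid_gt3"] <= cutoff:
            continue  # not enough well-supported documents
        # Hypothetical naming scheme: SubjectType-PREDICATE-ObjectType
        name = f'{mt["subject_type"]}-{mt["predicate"]}-{mt["object_type"]}'
        operations[name] = {
            "inputs": [{"id": mt["subject_prefix"], "semantic": mt["subject_type"]}],
            "outputs": [{"id": mt["object_prefix"], "semantic": mt["object_type"]}],
            "predicate": mt["predicate"],
        }
    return operations


ops = build_operations(metatriples)
```

Note that the comparison is strict, so a metatriple with exactly 20 qualifying documents produces no operation.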

I've deployed the updated SmartAPI yaml (refreshed registration).

So we can close this issue as complete. We can open new issues when we want to update the metatriples+counts file, notebook, and x-bte annotations / SmartAPI yaml...
