Come up with a plan for documenting Babel provenance/versioning #205
The only provenance information that Babel currently has is its set of "intermediate" files (e.g. https://stars.renci.org/var/babel_outputs/2024mar24/intermediate/), each of which contains the mappings we use to construct our cliques. In general, no mapping information outside the intermediate files is used, and each intermediate file comes from a single source. However, a particular mapping is not guaranteed to appear in only one file, and the intermediate files include identifiers that are left out of the final cliques because their prefixes are not allowed by the Biolink model. This suggests both a quick-and-dirty way and a more long-term way of assembling metadata:
The key to understanding Babel provenance at the moment is the concord files: these provide mappings from one (or more) identifier systems to other identifier systems, and it's generally true that every mapping in Babel originates in one or more of these files. So here is a simple data model I propose for gradually introducing provenance information into Babel:
The goal is for download metadata to flow into the concord metadata, and from there into the compendium metadata, so that the final compendium metadata becomes an increasingly accurate report on the data that went into it. This will also let us test the approach and develop provenance data formats on the simpler compendia before working our way up to the larger, more complex ones. It also lets us do this work piecemeal, which could be helpful if our work is interrupted by other priorities.
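As a rough illustration of how the intermediate files could already serve as quick-and-dirty provenance, here is a minimal sketch that indexes each mapping by the concord files containing it and lists identifiers whose prefixes would be dropped. It assumes each concord line is a tab-separated subject/predicate/object triple and that the caller supplies the allowed prefixes; `index_mapping_sources` and `dropped_identifiers` are hypothetical helpers, not part of Babel.

```python
from collections import defaultdict

# Assumption (hypothetical format): every concord line is
# "subject<TAB>predicate<TAB>object", e.g. "PREFIX_A:1<TAB>eq<TAB>PREFIX_B:2".


def index_mapping_sources(concords):
    """Map each (subject, predicate, object) triple to the set of concord
    file names it appears in.

    `concords` maps a file name to an iterable of that file's lines.
    """
    sources = defaultdict(set)
    for file_name, lines in concords.items():
        for line in lines:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            sources[tuple(parts)].add(file_name)
    return dict(sources)


def dropped_identifiers(sources, allowed_prefixes):
    """Return identifiers whose prefix is not in `allowed_prefixes`
    (for example, prefixes the Biolink model does not allow for a type)."""
    dropped = set()
    for subject, _predicate, obj in sources:
        for curie in (subject, obj):
            prefix = curie.split(":", 1)[0]
            if prefix not in allowed_prefixes:
                dropped.add(curie)
    return dropped
```

Because a mapping can appear in more than one file, the index keeps a set of file names per mapping rather than a single source. For example, `index_mapping_sources({"concord_a": ["X:1\teq\tY:2"]})` returns `{("X:1", "eq", "Y:2"): {"concord_a"}}`.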
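One way to picture metadata flowing from downloads to concords to compendia is as nested records, sketched here as Python dataclasses. The class and field names (`DownloadMetadata`, `version`, `retrieved`, and so on) are hypothetical, not an existing Babel format; the sketch only shows that compendium-level metadata can be computed from the concords it was built from, which in turn point to their downloads.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DownloadMetadata:
    """Hypothetical record for one downloaded source file."""
    source: str
    url: str
    version: Optional[str] = None     # explicit version, if the source publishes one
    retrieved: Optional[date] = None  # download date, used as an implicit version

    def describe(self) -> str:
        if self.version:
            return f"{self.source} release {self.version}"
        if self.retrieved:
            return f"latest {self.source} as of {self.retrieved.isoformat()}"
        return f"{self.source} (version unknown)"


@dataclass
class ConcordMetadata:
    """Hypothetical record for one concord file and the downloads it used."""
    name: str
    downloads: List[DownloadMetadata] = field(default_factory=list)


@dataclass
class CompendiumMetadata:
    """Hypothetical record for one compendium and the concords it was built from."""
    name: str
    concords: List[ConcordMetadata] = field(default_factory=list)

    def sources(self) -> List[str]:
        """Describe every download behind this compendium, once each, in order."""
        seen: List[str] = []
        for concord in self.concords:
            for download in concord.downloads:
                description = download.describe()
                if description not in seen:
                    seen.append(description)
        return seen
```

Because compendium metadata is derived rather than written by hand, adding metadata to one more download or concord automatically improves the compendium report, which suits the piecemeal approach described above.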
It may be possible to use Snakemake reporting to make this happen: https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html
Essentially, figure out how to embed either an explicit version number (e.g. "Download UMLS release 2023AB") or an implicit version number (e.g. "Downloaded the latest version of XYZ as of Nov 26, 2023") into the reports. This may also help with Babel documentation (#148).
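A minimal sketch of the kind of version record such a report could include, assuming each download step writes a small JSON file next to the data it fetched. `download_record` and `write_download_record` are hypothetical helpers, and the JSON fields are an assumption rather than a Snakemake or Babel format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def download_record(source, url, version=None, now=None):
    """Describe one download, using an explicit version when the source
    publishes one and the retrieval date otherwise (an implicit version)."""
    now = now or datetime.now(timezone.utc)
    if version:
        description = f"Downloaded {source} release {version}"
    else:
        description = f"Downloaded the latest version of {source} as of {now:%b %d, %Y}"
    return {
        "source": source,
        "url": url,
        "version": version,
        "retrieved": now.date().isoformat(),
        "description": description,
    }


def write_download_record(out_path, record):
    """Write the record as JSON, e.g. alongside the downloaded file."""
    Path(out_path).write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
```

In a Snakefile, a rule could then list this JSON file as an output wrapped in Snakemake's `report()` function so that it appears in the HTML report generated by `snakemake --report`.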