Update GOLD ecosystem classification path terms #173
Merged
Re: #154 (will close the issue once a new version is released and integrated into `nmdc-server`)

Summary

- Add a `src/nmdc_submission_schema/datamodel/gold.py` module with a function and CLI to extract GOLD ecosystem classification path terms from GOLD's JSON file and inject them as enum permissible values into a schema. Currently it makes 5 enums (one for each classification path level) representing all possible terms at each level, and a second set of 5 enums representing a reduced set of terms that NMDC has identified (through manual curation) as applicable to the soil template.
- Update `project.Makefile` to download the GOLD JSON file and perform the enum injection as part of building the final `src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml` file.
- There is logic in `nmdc-server` which dynamically controls the dropdowns for the 5 ecosystem classification path columns to ensure you can only choose values that make valid combinations. It uses its own copy of the GOLD JSON file to drive that logic. Since we want to ensure that the JSON file agrees with the schema, I've bundled it here (in `project/thirdparty`), and `nmdc-server` will pick up that version.
- Update `schemasheets/tsv_in/enums.tsv` and update the slot ranges in `sheets_and_friends/tsv_in/modifications_long.tsv` to use the new enum names.

Comment
With this setup whenever we do a clean build (which should happen at least before each release) we'll get a new copy of the GOLD JSON file and build new enums based on it. That means that there isn't any explicit step to sync with GOLD; it should just happen transparently.
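For reviewers who want a feel for the enum-building step, here is a minimal sketch of the idea, not the actual contents of `gold.py`. It assumes the GOLD JSON is a nested tree of `{"name": ..., "children": [...]}` nodes, which may not match the real file exactly. The function names, the enum names, and the sample tree are all hypothetical.

```python
from collections import defaultdict

# The five GOLD classification path levels, top to bottom.
# The enum names derived from these are illustrative, not the schema's real names.
LEVELS = [
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
]


def collect_terms_by_level(node, depth=0, terms=None):
    """Walk the tree and gather the set of term names seen at each depth.

    Assumes each node is a dict with a "name" and an optional "children" list.
    """
    if terms is None:
        terms = defaultdict(set)
    for child in node.get("children", []):
        terms[depth].add(child["name"])
        collect_terms_by_level(child, depth + 1, terms)
    return terms


def build_enums(tree):
    """Return one LinkML-style enum definition per classification level."""
    terms = collect_terms_by_level(tree)
    enums = {}
    for depth, level in enumerate(LEVELS):
        enum_name = "".join(part.capitalize() for part in level.split("_")) + "Enum"
        enums[enum_name] = {
            "permissible_values": {term: {} for term in sorted(terms.get(depth, ()))}
        }
    return enums


# Tiny made-up tree standing in for the downloaded GOLD JSON file.
sample_tree = {
    "name": "root",
    "children": [
        {
            "name": "Environmental",
            "children": [
                {"name": "Terrestrial", "children": [{"name": "Soil", "children": []}]},
                {"name": "Aquatic", "children": []},
            ],
        },
        {"name": "Engineered", "children": []},
    ],
}

enums = build_enums(sample_tree)
```

Because the enums are rebuilt from whatever JSON is downloaded at build time, a sketch like this has no hard-coded term list to keep in sync. The curated soil subset would need a separate filter, which is not shown here.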
cc: @aclum @mslarae13