Update GOLD ecosystem classification path terms #173

Merged: 5 commits, Jan 25, 2024

pkalita-lbl
Collaborator

Re: #154 (will close the issue once a new version is released and integrated into nmdc-server)

Summary

  • Add a new src/nmdc_submission_schema/datamodel/gold.py module with a function and CLI that extract GOLD ecosystem classification path terms from GOLD's JSON file and inject them into a schema as enum permissible values. It currently generates two sets of 5 enums (one per classification path level): the first set holds all possible terms at each level, and the second holds a reduced set of terms that NMDC has identified (through manual curation) as applicable to the soil template.
  • Update project.Makefile to download the GOLD JSON file and perform the enum injection as part of building the final src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml file.
  • The schema itself has no concept of enforcing valid pathway component combinations (it never did, so nothing has changed in that respect). Front-end code in nmdc-server dynamically controls the dropdowns for the 5 ecosystem classification pathway columns so that you can only choose values that make valid combinations. That code uses its own copy of the GOLD JSON file to drive the logic. Since we want that JSON file to agree with the schema, I've bundled it here (in project/thirdparty) and nmdc-server will pick up that version.
  • Remove the hardcoded enums in schemasheets/tsv_in/enums.tsv and update the slot ranges in sheets_and_friends/tsv_in/modifications_long.tsv to use the new enum names.
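
To make the extraction and injection steps above concrete, here is a minimal sketch. It is not the code in gold.py. It assumes the GOLD file is a nested tree of `{"name": ..., "children": [...]}` nodes and that the schema is a plain LinkML-style dict. The function names, level names, and enum names are hypothetical, and the reduced soil-specific enum set is not covered.

```python
from collections import defaultdict

# The five classification path levels, top to bottom (names assumed for illustration).
LEVELS = [
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
]


def collect_terms_by_level(node, depth=0, found=None):
    """Walk an assumed {"name", "children"} tree and group term names by depth."""
    if found is None:
        found = defaultdict(set)
    for child in node.get("children", []):
        if depth < len(LEVELS):
            found[LEVELS[depth]].add(child["name"])
        collect_terms_by_level(child, depth + 1, found)
    return found


def inject_enums(schema, terms_by_level, suffix="Enum"):
    """Add one enum per level to a LinkML-style schema dict (enum names are hypothetical)."""
    enums = schema.setdefault("enums", {})
    for level, terms in terms_by_level.items():
        name = "".join(part.capitalize() for part in level.split("_")) + suffix
        enums[name] = {"permissible_values": {term: {} for term in sorted(terms)}}
    return schema


# Tiny hand-written stand-in for the GOLD file; not real GOLD data.
sample_tree = {
    "name": "root",
    "children": [
        {
            "name": "Environmental",
            "children": [
                {
                    "name": "Terrestrial",
                    "children": [
                        {
                            "name": "Soil",
                            "children": [
                                {
                                    "name": "Unclassified",
                                    "children": [{"name": "Unclassified"}],
                                }
                            ],
                        }
                    ],
                }
            ],
        },
        {"name": "Host-associated", "children": []},
    ],
}

schema = inject_enums({"name": "demo"}, collect_terms_by_level(sample_tree))
```

Because each enum collects terms per level independently, a schema built this way can only say which terms exist at each level. It cannot say which combinations are valid, which is why the dropdown logic still needs the tree itself.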

Comment

With this setup, whenever we do a clean build (which should happen at least once before each release) we'll get a new copy of the GOLD JSON file and build new enums based on it. That means there is no explicit step to sync with GOLD; it should just happen transparently.

cc: @aclum @mslarae13

@pkalita-lbl pkalita-lbl requested a review from turbomam January 19, 2024 19:20
@pkalita-lbl pkalita-lbl merged commit 6cfd782 into main Jan 25, 2024
2 checks passed
@pkalita-lbl pkalita-lbl deleted the issue-154 branch January 25, 2024 00:41
@mslarae13 mslarae13 added the Prod code release needed Updates made, production not currently using this version. Code push required to see in prod label Jan 26, 2024