GOLD ecosystem pathway enumerations are out of date #154
I agree. See the comment in microbiomedata/nmdc-schema#1108.
We are missing Bulk Soil, which is the specific_ecosystem value that Hugh wants to list the NEON samples under.
@turbomam, do you have time to update the GOLD pathway enumerations? Current values in GOLD can be found here: https://gold.jgi.doe.gov/ecosystem_classification
Where can I find a textual representation of the GOLD pathway elements?
Maybe here? GOLD's 5-Level Ecosystem Classification Paths Excel (last generated: 11 Jan, 2024). Clicking the link downloaded this file: GOLDs5levelEcosystemClassificationPaths.xlsx. This source should be noted in the schema.
Are we adding all values from all five categories into the enums? Here's a list of all five, ranked by the number of paths they appear in. I could report it some other way if you want. Deleting the long list for now; I will post it somewhere else soon.
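A ranking like the one described above can be sketched in a few lines. This is a minimal sketch, assuming each GOLD path has already been read from the spreadsheet into a five-element tuple; the level names in LEVELS, the function rank_values, and the example rows are hypothetical, and duplicate rows are counted once.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical level names, ordered from the top of a GOLD path to the bottom.
LEVELS = (
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
)

Path = Tuple[str, str, str, str, str]


def rank_values(paths: Iterable[Path]) -> Dict[str, List[Tuple[str, int]]]:
    """For each level, list (value, number of paths it appears in), most frequent first."""
    unique_paths = dict.fromkeys(tuple(p) for p in paths)  # order-preserving de-duplication
    counters = {level: Counter() for level in LEVELS}
    for path in unique_paths:
        for level, value in zip(LEVELS, path):
            counters[level][value] += 1
    # most_common() keeps first-seen order among equal counts, so output is deterministic.
    return {level: counters[level].most_common() for level in LEVELS}


# Made-up example rows, not real GOLD data.
EXAMPLE_PATHS = [
    ("Environmental", "Terrestrial", "Soil", "Unclassified", "Bulk Soil"),
    ("Environmental", "Terrestrial", "Soil", "Floodplain", "Unclassified"),
    ("Environmental", "Terrestrial", "Soil", "Wetlands", "Peat"),
    ("Environmental", "Terrestrial", "Soil", "Wetlands", "Peat"),  # duplicate row
]

if __name__ == "__main__":
    for level, ranked in rank_values(EXAMPLE_PATHS).items():
        print(level, ranked)
```

Counting unique paths rather than spreadsheet rows keeps an accidentally repeated row from inflating a value's rank.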
@turbomam, yes please.
How are the GOLD path elements modeled in the nmdc-schema and the submission schema? Here's the definition of SpecificEcosystemEnum in the compiled submission schema, and the other four enums, which are contiguous at this point in time. An example value for EcosystemSubtypeEnum is Floodplain, which is currently modeled in this style:

    Floodplain:
      text: Floodplain
      description: placeholder PV descr

Floodplain doesn't appear anywhere in the nmdc-schema. I think these enumerations originate in https://github.com/microbiomedata/submission-schema/blob/main/schemasheets/tsv_in/enums.tsv, which has been hand-curated up until now.
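Because each permissible value in this style just repeats its own name as text, such enum definitions could be generated from the GOLD values instead of hand-curated. A minimal sketch, assuming the values have already been extracted; enum_yaml is a hypothetical helper, it uses only the standard library, and it omits the placeholder description. Every value is quoted with json.dumps, since a JSON string is also a valid YAML double-quoted scalar for ordinary text.

```python
import json
from typing import Iterable


def enum_yaml(enum_name: str, values: Iterable[str]) -> str:
    """Render one LinkML-style enum whose permissible values are the given strings.

    Each value becomes a permissible value whose `text` repeats its name,
    following the hand-curated style above (without the placeholder description).
    """
    lines = ["enums:", f"  {enum_name}:", "    permissible_values:"]
    for value in sorted(set(values)):  # de-duplicate and sort for stable diffs
        quoted = json.dumps(value)  # double-quoted, so ':' or '#' inside a value stay safe
        lines.append(f"      {quoted}:")
        lines.append(f"        text: {quoted}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(enum_yaml("EcosystemSubtypeEnum", ["Floodplain", "Wetlands", "Floodplain"]), end="")
```

Sorting the values means that regenerating the enum after a GOLD release produces a diff containing only real additions and removals.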
@pkalita-lbl can you please help me think about the GOLD path enum lifecycle?
For all practical purposes, we're just asserting the enum name and the permissible value name in

    # Fetching ecosystem path data from GOLD
    assets/GOLDs5levelEcosystemClassificationPaths.xlsx:
    	curl -o $@ "https://gold.jgi.doe.gov/download?mode=ecosystempaths"

GOLD's source file calls the path elements
We are calling the enums
I started working on this out of nmdc-schema. We can move it later if it does what you want.
Right. There is also custom code in the submission portal that alters the behavior of those five columns so that you only get suggestions for valid paths. The logic is driven in part by this file: https://gold.jgi.doe.gov/download?mode=biosampleEcosystemsJson (we bake a copy into the submission portal code; we don't constantly re-fetch it). So, for example, when you go to fill in the

I see two options going forward:
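The portal behavior described earlier in this thread, offering only values that continue a valid path, amounts to walking a tree of GOLD paths. The structure of the biosampleEcosystemsJson file isn't described here, so this minimal sketch assumes the valid paths are available as tuples of level values; build_tree and next_values are hypothetical names, not the portal's actual code.

```python
from typing import Dict, Iterable, List, Sequence

# A node maps each value at one level to the subtree of values allowed below it.
Tree = Dict[str, "Tree"]


def build_tree(paths: Iterable[Sequence[str]]) -> Tree:
    """Merge ecosystem paths into a nested dict keyed by level value."""
    root: Tree = {}
    for path in paths:
        node = root
        for value in path:
            node = node.setdefault(value, {})
    return root


def next_values(tree: Tree, chosen: Sequence[str]) -> List[str]:
    """Return the values allowed at the next level, given the values chosen so far."""
    node = tree
    for value in chosen:
        if value not in node:
            return []  # the chosen prefix is not part of any valid path
        node = node[value]
    return sorted(node)


if __name__ == "__main__":
    tree = build_tree([
        ("Environmental", "Terrestrial", "Soil", "Unclassified", "Bulk Soil"),
        ("Environmental", "Terrestrial", "Soil", "Wetlands", "Peat"),
    ])
    print(next_values(tree, ["Environmental", "Terrestrial", "Soil"]))
```

A nested dict mirrors the shape of GOLD's ecosystem tree, so the suggestions for any column are just the keys of the node reached by the columns to its left.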
Also, a long time ago I tried generating a LinkML schema that encoded the valid pathways as
Thanks @pkalita-lbl! I have implemented at least half of option 2 from above as
I don't mind if you decide to go with option 1 instead.
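Whichever option is chosen, a submitted path is valid only if all five values together match one of GOLD's paths; checking each column against its own enum separately can accept combinations that GOLD never lists. A minimal sketch of that whole-path check, with first_invalid_level as a hypothetical helper and made-up example paths:

```python
from typing import Iterable, Optional, Sequence


def first_invalid_level(path: Sequence[str], valid_paths: Iterable[Sequence[str]]) -> Optional[int]:
    """Check a path against GOLD's valid paths.

    Returns None if the path is valid; otherwise the 0-based index of the first
    value that no valid path allows at that position, or len(path) if every
    value fits but the path stops before a complete valid path.
    """
    valid = {tuple(p) for p in valid_paths}
    target = tuple(path)
    if target in valid:
        return None
    # Every prefix of every valid path, e.g. ("Environmental",), ("Environmental", "Terrestrial"), ...
    prefixes = {p[:i] for p in valid for i in range(1, len(p) + 1)}
    for i in range(1, len(target) + 1):
        if target[:i] not in prefixes:
            return i - 1
    return len(target)  # every value fits, but the path is incomplete


# Made-up valid paths, not real GOLD data.
VALID_PATHS = [
    ("Environmental", "Terrestrial", "Soil", "Unclassified", "Bulk Soil"),
    ("Environmental", "Terrestrial", "Soil", "Floodplain", "Unclassified"),
]
```

With these example paths, ("Environmental", "Terrestrial", "Soil", "Floodplain", "Bulk Soil") is rejected at index 4 even though each value appears somewhere at its own level.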
@pkalita-lbl (or anyone): Are subsets of the GOLD paths being created for the different environmental contexts, like soil or water?
I'm not sure, but I think that's another thing that will influence how we implement the long-term process for keeping us in sync with GOLD. So I'm not sure we're ready to jump into implementing anything quite yet.
We did intentionally limit this. That said, how it's limited will vary from sample type to sample type (i.e., from one environmental extension to another).
The 'lower level' ecosystem terms are missing because GOLD updated its paths and we didn't pull in the updates.
Yes, we don't want to lose this because it should build the same way the GOLD ecosystem tree does: https://gold.jgi.doe.gov/ecosystemtree
@turbomam, pretty sure that's a yes, but we haven't done it. It's really just a matter of identifying where in the tree we would limit.
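Limiting an extension to part of the tree could be as simple as keeping only the paths under one or more chosen nodes. A minimal sketch; the extension name and its prefix in EXTENSION_LIMITS are made up, since where each extension should be limited hasn't been decided, and paths_for_extension is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Prefix = Tuple[str, ...]

# Hypothetical: the subtrees each environmental extension would allow.
EXTENSION_LIMITS: Dict[str, List[Prefix]] = {
    "soil": [("Environmental", "Terrestrial", "Soil")],
}


def paths_for_extension(paths: Iterable[Sequence[str]], prefixes: Iterable[Prefix]) -> List[Tuple[str, ...]]:
    """Keep only the paths that start with one of the allowed prefixes."""
    allowed = [tuple(p) for p in prefixes]
    return [
        tuple(path)
        for path in paths
        if any(tuple(path[: len(prefix)]) == prefix for prefix in allowed)
    ]


if __name__ == "__main__":
    example = [  # made-up paths, not real GOLD data
        ("Environmental", "Terrestrial", "Soil", "Unclassified", "Bulk Soil"),
        ("Environmental", "Aquatic", "Freshwater", "River", "Unclassified"),
    ]
    print(paths_for_extension(example, EXTENSION_LIMITS["soil"]))
```

Keeping the limits as prefixes, rather than as copied value lists, means a new GOLD release only has to be re-filtered, not re-curated per extension.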
I'm not sure when this was last updated, but GOLD's last release of ecosystem pathways was in Sept 2023. I noticed this because the value Peat for the column specific ecosystem does not validate, and I confirmed that it is not listed in the enumeration SpecificEcosystemEnum.
We should
@turbomam @pkalita-lbl @mslarae13 @shreddd
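Catching values like Peat before a submitter does could be automated by comparing an enum's permissible values with the values GOLD currently lists at the same level. A minimal sketch; compare_enum is a hypothetical helper, the inputs are plain string collections, and extracting them from the schema and the GOLD spreadsheet is left out.

```python
from typing import Dict, Iterable, List


def compare_enum(enum_values: Iterable[str], gold_values: Iterable[str]) -> Dict[str, List[str]]:
    """Report values GOLD lists but the enum lacks, and enum values GOLD no longer lists."""
    enum_set, gold_set = set(enum_values), set(gold_values)
    return {
        "missing_from_enum": sorted(gold_set - enum_set),  # e.g. terms added in a newer GOLD release
        "not_in_gold": sorted(enum_set - gold_set),  # possibly retired or misspelled terms
    }


if __name__ == "__main__":
    report = compare_enum(
        enum_values=["Unclassified", "Retired value"],  # made-up enum contents
        gold_values=["Unclassified", "Bulk Soil", "Peat"],  # made-up GOLD level values
    )
    print(report)
```

Running a check like this whenever GOLD publishes a new paths file would surface both missing terms and stale ones in a single report.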