Determine if and how rel_to_oxygen will be used in the submission schema #58
Comments
Can we run a query that asks "of the samples captured in NMDC (MongoDB), do any of the Biosample objects have this slot (or oxy_stat_samp) filled out? If so, what is there?"
Once @turbomam has run a query against the NMDC MongoDB, reassign to Montana to check.
Only keep rel_to_oxygen. Note in rel_to_oxygen that this is applicable to "Column: oxygenation status of sample".
db.getCollection("biosample_set").find( { part_of : { $exists : true } } );
db.getCollection("biosample_set").find( { rel_to_oxygen : { $exists : true } } );
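The question above ("do any Biosample objects have this slot filled out, and if so, what is there?") can be sketched offline as a tally over already-exported documents. This is a minimal sketch, not the query that was actually run: the `tally_slot` helper and the sample documents are hypothetical, and it assumes slot values are plain strings rather than nested objects.

```python
from collections import Counter


def tally_slot(biosamples, slot):
    """Count the values of `slot` over documents that contain it.

    Mirrors the intent of {slot: {$exists: true}} followed by a group-by on the value.
    """
    return Counter(str(doc[slot]) for doc in biosamples if slot in doc)


# Hypothetical documents standing in for records from biosample_set
docs = [
    {"id": "ex:1", "rel_to_oxygen": "aerobe"},
    {"id": "ex:2", "oxy_stat_samp": "aerobic"},
    {"id": "ex:3"},
    {"id": "ex:4", "rel_to_oxygen": "aerobe"},
]

rel_counts = tally_slot(docs, "rel_to_oxygen")
oxy_counts = tally_slot(docs, "oxy_stat_samp")
```

An empty counter for a slot means no document carries it at all.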
Neither |
@mslarae13 I agree that we should only use one of After deciding that, we should combine the values from the |
Here are the
select
    value,
    count(1)
from
    all_attribs aa
where
    aa.harmonized_name = 'oxy_stat_samp'
group by
    value
order by
    count(1) desc;
If we use
The other |
Adding to current sprint per Mark. Need feedback from @mslarae13
@mslarae13 I'm starting this now. I will provide the list of enumerated values soon.
src/schema/mixs.yaml already has this:

rel_to_oxygen_enum:
  from_schema: http://w3id.org/mixs/terms
  permissible_values:
    aerobe: {}
    anaerobe: {}
    facultative: {}
    microaerophilic: {}
    microanaerobe: {}
    obligate aerobe: {}
    obligate anaerobe: {}

and

oxy_stat_samp_enum:
  from_schema: http://w3id.org/mixs/terms
  permissible_values:
    aerobic: {}
    anaerobic: {}
    other: {}
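To make the difference between the two enumerations concrete, here is a small sketch that checks a value against each set of permissible values listed above. The constant names and the `is_permissible` helper are made up for illustration; a real check would presumably go through the LinkML schema rather than hard-coded sets.

```python
# Permissible values copied from the two enums quoted above
REL_TO_OXYGEN_ENUM = frozenset({
    "aerobe",
    "anaerobe",
    "facultative",
    "microaerophilic",
    "microanaerobe",
    "obligate aerobe",
    "obligate anaerobe",
})
OXY_STAT_SAMP_ENUM = frozenset({"aerobic", "anaerobic", "other"})


def is_permissible(value, enum):
    """Return True only on an exact match against the enum's permissible values."""
    return value in enum


# The two vocabularies share no values, so data cannot move between the slots unchanged
shared_values = REL_TO_OXYGEN_ENUM & OXY_STAT_SAMP_ENUM
```

Because the sets do not overlap, combining values from one slot into the other needs an explicit mapping step.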
Let's leave the
I guess if we found some decisive cutoffs between different oxygenation states, we could update |
Based on the recent update, will move this to the new sprint to be closed.
@turbomam I'm good with that. Will we leave the 'other' option?
Yes, I included 'other'. This should be in nmdc-schema 7.6.0 and submission-schema 7.6.0 now. I'll confirm in a few minutes.
confirmed: submission schema 7.6.0 updated as described
Thanks @turbomam. @pkalita-lbl, can we get this change propagated to the submission schema?
If I'm reading Mark's comments correctly, these changes went into submission schema v7.6.0. A later version of the submission schema (v7.6.5) is already used by the portal codebase but it hasn't been released to production yet. So I would expect you'd be able to see this in dev right now.
Schema updates have been made since, so closing this issue.
This illustrates approaches for repairing columns with enumerations of permissible values, also known as controlled vocabularies. Look for 'enumeration' in the MIxS 'Expected value' column, or a range of '*_enum' in the LinkML model. See reference material below.
Related code: sample_annotator/rel_to_oxygen_example.py
Permissible values
Reference material
rel_to_oxygen
rel_to_oxygen
Observed, with matches
Easy fixes:
Trickier!
Probably not justified when the count is really low, like 1
Gotchas:
aerobe is a noun: a microorganism that requires the presence of oxygen
aerobic is an adjective which can be applied to an organism
oxic is an adjective that describes the water an organism lives in, not the organism itself
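One way to picture the repair approach is a two-step lookup: first normalize case, whitespace, and underscores and try an exact match, then fall back to a hand-curated synonym table. This is only a sketch and is not the contents of sample_annotator/rel_to_oxygen_example.py. The `SYNONYMS` entries are an assumption made for illustration, and 'oxic' is deliberately left unmapped because it describes the environment rather than the organism.

```python
import re

# Permissible values of rel_to_oxygen_enum, as listed in mixs.yaml
PERMISSIBLE = {
    "aerobe",
    "anaerobe",
    "facultative",
    "microaerophilic",
    "microanaerobe",
    "obligate aerobe",
    "obligate anaerobe",
}

# Hypothetical curated mapping (an assumption, not taken from the issue)
SYNONYMS = {"aerobic": "aerobe", "anaerobic": "anaerobe"}


def normalize(raw):
    """Lowercase, trim, and collapse runs of whitespace or underscores to one space."""
    return re.sub(r"[\s_]+", " ", raw.strip().lower())


def repair(raw):
    """Return a permissible value for `raw`, or None when no safe fix is known."""
    value = normalize(raw)
    if value in PERMISSIBLE:
        return value  # easy fix: only case or spacing differed
    return SYNONYMS.get(value)  # trickier: needs a curated mapping
```

Returning None instead of guessing keeps unmapped values visible for manual review.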