Deprecate climate_environment MIXS:0001040 #591

Open
mslarae13 opened this issue Jun 6, 2023 · 11 comments · May be fixed by #820
Labels
1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here. 3-CIG Issues that should be handled by the CIG

Comments

@mslarae13
Contributor

Current term details
Please supply the current details of the term that you would like to update:

Term name - climate environment
Term ID - MIXS:0001040
Structured comment name - climate_environment
Definition - 
Expected value - 
Value syntax -
Example -
Preferred unit - 
Package(s) - agriculture, plant-associated

Suggested update(s)
Please supply the new suggestions for any of the details listed below (only insert text to those details that should be updated):

deprecate / retire term

Additional context
Add any other context about the update request here, e.g. why you think this needs to be updated.

The term is confusing and not used appropriately. A query of NCBI shows that it is often used to describe the biome, which should be captured in another field.
Discussed at TWG 2023-06-06

@mslarae13 mslarae13 added the 1-TermUpdate Update suggestion for existing term, including bugs. Issues from "cig-bug" label moved here. label Jun 6, 2023
@only1chunts only1chunts added the 3-CIG Issues that should be handled by the CIG label Oct 19, 2023
@mslarae13
Contributor Author

@ramonawalls commented on NMDC

microbiomedata/nmdc-schema#586 (comment)

Can we check in and confirm we can deprecate this term?

@only1chunts

@mslarae13 mslarae13 self-assigned this Jul 9, 2024
@mslarae13 mslarae13 moved this to In Progress in TWG Activity Track Jul 9, 2024
@mslarae13 mslarae13 linked a pull request Jul 10, 2024 that will close this issue
@turbomam
Member

turbomam commented Sep 23, 2024

I think it's good to move forward with this issue's PR.

If anybody wants substantiating data:

As of this month, there are 8991 INSDC Biosamples, out of roughly 40 million, with a climate_environment annotation.

Here's the breakdown of annotations that were used at least twice.

Query
SELECT
	CONCAT(SUBSTRING(content, 1, 72),
	CASE
		WHEN LENGTH(content) > 72 THEN '...'
		ELSE ''
	END) AS shortened_content,
	count(1)
FROM
	main.attributes a
WHERE
	harmonized_name = 'climate_environment'
GROUP BY
	shortened_content
HAVING
	count(1) > 1
ORDER BY
	count(1) DESC;
Results (`climate_environment` contents have been truncated to 72 characters. Scroll to right to see counts if necessary.)
shortened_content count(1)
Mediterranean, subtropical 4120
not applicable 1669
NA 782
not collected 494
Humid subtropical 392
Lab microcosm 192
greenhouse 113
Warm temperate (Cfb) 95
Boreal (Dfb) 88
none 72
temperate 65
Temperate, subalpine 60
Regular_Normal 46
Early_Chilling 43
control conditions 40
controlled conditions, branch partially covered by plastic bag 36
freeze-thaw 35
continental with Mediterranean influences 27
Humid continental climate 25
subalpine 24
riparian zone 23
controlled conditions 18
freeze 17
thaw 17
moderate 16
drought 14
submediterranean 14
Dry, Hot 14
tropical wet and dry climate 12
Controlled 12
In winter temperatures drop to 3 degrees 11
1000ppm CO2 concentration and 15℃ 9
1000ppm CO2 concentration and 10℃ 9
400ppm CO2 concentration and 10℃ 9
cold 8
heat 8
missing 8
Tropical 8
5 weeks cold storage 6
2000ppm CO2 concentration and 10℃ 6
2 weeks cold storage 6
Plants were kept in a greenhouse with a temperature of 26 ± 5.98 °C and ... 6
Orchard at harvest 6
Agricultural environment 6
4 weeks cold storage 6
3 weeks cold storage 6
Dry 6
Temperate 5
Common Garden, Control 3
Greenhouse, Heat, CO2 3
Greenhouse, Heat 3
Greenhouse, Drought, Heat, CO2 3
Moderate temperature 3
Greenhouse, Flooding, Heat, CO2 3
Greenhouse, Flooding, Heat 3
Greenhouse, Drought, Heat 3
Common Garden, Flooding 3
Common Garden, Drougth 3
https://kare.ucanr.edu/Weather_Physical_-_Biological_Data/ 3
temperate climate 3
400ppm CO2 concentration and 15℃ 3
desertic 2
watered 2
S biological replicates 2 2
KG biological replicates 2 2
CK biological replicates 1 2
S biological replicates 3 2
Not applicable 2
S biological replicates 1 2
subtropical 2
ambient conditions 2
CK biological replicates 3 2
common garden setup 2
very hot 2
KG biological replicates 3 2
CK biological replicates 2 2
KG biological replicates 1 2
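For readers without SQL access, the same truncate-then-count idea can be sketched in Python. This is only an illustration under stated assumptions: the function names and the toy input are hypothetical, the data is not the real INSDC Biosample set, and a single 72-character threshold is used both for truncating and for appending the ellipsis.

```python
from collections import Counter

MAX_LEN = 72  # truncation width reported for the results above


def shorten(content: str, max_len: int = MAX_LEN) -> str:
    """Truncate to max_len characters, appending '...' only if text was cut."""
    if len(content) > max_len:
        return content[:max_len] + "..."
    return content


def repeated_annotations(values, min_count=2):
    """Count shortened values, keeping those seen at least min_count times."""
    counts = Counter(shorten(v) for v in values)
    # most_common() sorts by count, highest first
    return [(v, n) for v, n in counts.most_common() if n >= min_count]


# Toy input (hypothetical), standing in for climate_environment annotations.
sample = ["greenhouse", "greenhouse", "NA", "NA", "NA", "drought",
          "x" * 80, "x" * 90]
result = repeated_annotations(sample)
```

Two long values that share their first 72 characters collapse into one row here, which is also how the grouping on the shortened string behaves in the query.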

@turbomam
Member

turbomam commented Sep 23, 2024

I uploaded mixs-slots-enums-no-MixsCompliantData-domain.json from external-metadata-awareness into Claude 3.5 Sonnet and asked

what other terms overlap with climate_environment semantically?

It replied

Based on the provided information, several other terms in the schema overlap semantically with "climate_environment". These include:

  1. env_broad_scale - Describes the major environmental system the sample came from, which could include climate-related categories.
  2. env_local_scale - Describes more specific environmental features that could be influenced by climate.
  3. env_medium - Describes the environmental material surrounding the sample, which may be affected by climate.
  4. season - Directly related to climate patterns.
  5. season_environment - Very similar to climate_environment, describing seasonal exposure.
  6. season_precpt - Describes seasonal precipitation, a key climate factor.
  7. season_temp - Describes seasonal temperature, another key climate factor.
  8. annual_precpt - Related to overall climate patterns.
  9. annual_temp - Also related to overall climate patterns.
  10. weather - Current weather conditions are closely related to climate.
  11. temp - Temperature is a key component of climate.
  12. humidity - Humidity is another important climate factor.
  13. air_temp_regm - Describes temperature patterns, which are part of climate.
  14. rainfall_regm - Describes rainfall patterns, another key climate component.
  15. extreme_event - Could include climate-related events.
  16. solar_irradiance - Related to climate patterns and energy input.

These terms all relate to environmental conditions, weather patterns, or long-term climate factors that overlap semantically with the concept of "climate environment".

@turbomam
Member

I also asked

Analyzing the definition of "climate_environment" against the other terms, are there any aspects that aren't fully covered by the other terms?

It replied

  1. Treatment aspect:
    • The concept of "treatment" or intentional exposure to a climate condition isn't explicitly covered by the other terms.
  2. Repetition of treatment:
    • The idea of repeating the climate exposure multiple times isn't captured in the other terms.
  3. Duration of each treatment:
    • Specific information about how long each climate exposure lasted isn't addressed by the other terms.
  4. Start and end times:
    • Precise timing information for the entire treatment period isn't covered.
  5. Multiple climates:
    • While some terms might imply different climate conditions, the explicit mention of exposing to multiple climates isn't captured.
  6. Regimen structure:
    • The structured approach of a "treatment regimen" with specific parameters isn't reflected in the other terms.
  7. Intentionality:
    • The implication that this is a controlled, intentional exposure to climate conditions, rather than natural environmental conditions, isn't covered by the other terms.

@mslarae13
Contributor Author

Well, that's not correct, because treatment columns do capture repetition, duration, and times, depending on the slot. Regardless, if there's something not captured, shouldn't it have a more specific slot to capture it, rather than this general catch-all?

What does "multiple climates" even mean?

Intentionality is never captured in slots.

@turbomam
Member

Thanks for reviewing all that new information/inferences, @mslarae13

treatment columns do capture repetition, duration, and times

I agree! I think the point here is that climate_environment is a treatment slot, which differentiates it from the other 16 slots that Claude found to be semantically similar. Furthermore, removing climate_environment may leave us in a position where we don't have another slot to capture ways in which the experimenter manipulated the climatic dimension of the environment experienced by the sample.

If you think climate_environment should be removed, I support that. I just want to have a record of the potential consequences.

I have mixed feelings about whether this should be considered a catch-all slot, but I do agree that catch-all slots should be avoided.

What does "multiple climates" even mean?

I think this is highlighting the fact that climate_environment is multivalued, so a MIxS record could have a climate_environment value of

'tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M|monsoon climate;R2/2019-05-11T14:30/2019-05-11T19:30/P1H30M'
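To make the structure of that multivalued string concrete, here is a minimal Python sketch that splits it into its parts. The field names (`climate`, `repetitions`, `start`, `end`, `duration`) and the function name are hypothetical labels chosen for illustration, not names defined by MIxS, and reading the four slash-separated parts as repeat count, start, end, and duration is an assumption based on the discussion above.

```python
from typing import List, NamedTuple


class ClimateTreatment(NamedTuple):
    climate: str       # free-text climate name before the ';'
    repetitions: int   # number after 'R' (assumed to be a repeat count)
    start: str         # start timestamp, kept as the raw string
    end: str           # end timestamp, kept as the raw string
    duration: str      # duration token, kept as the raw string


def parse_climate_environment(value: str) -> List[ClimateTreatment]:
    """Split a '|'-separated value into one record per climate treatment."""
    treatments = []
    for entry in value.split("|"):
        climate, regimen = entry.split(";", 1)
        repeat, start, end, duration = regimen.split("/")
        treatments.append(
            ClimateTreatment(climate.strip(), int(repeat.lstrip("R")),
                             start, end, duration)
        )
    return treatments


example = ("tropical climate;R2/2018-05-11T14:30/2018-05-11T19:30/P1H30M"
           "|monsoon climate;R2/2019-05-11T14:30/2019-05-11T19:30/P1H30M")
parsed = parse_climate_environment(example)
```

The timestamps and duration are deliberately left as strings, since the sketch only shows how one record can carry more than one climate, each with its own regimen.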

Intentionality is never captured in slots.

That may be one of the deepest things I have heard anybody say about MIxS! I asked Claude to elaborate, and it takes the position that all of the ..._regm slots have a quality of intentionality, in the sense that an experimenter has intentionally manipulated something about the sample, rather than just reporting the natural state of things. It's a language model and may be equating "intentionality" with treatment or manipulation.

If you think the word intentional is a tar pit, then we certainly don't have to use it amongst ourselves.

Is one of your points that MIxS doesn't provide a mechanism to capture "I intended to do X, but Y is what really happened?" I.e., that MIxS should only be used to report things that can be confirmed to be true after the fact? I think that's a good plan but maybe we should make sure that is communicated in the documentation.

@lschriml
Member

Have the developers of these terms been contacted?
What was their feedback?

@mslarae13
Contributor Author

Discussed with CIG on 2024-09-24.
Need to reach out to the groups that included this term in their extension (the ag team).

@mslarae13
Contributor Author

Sent an email on Nov 26th

@mslarae13
Contributor Author

Sent a followup email today. Term will be deprecated in January.
