Standard names for selected climate indices/indicators based on thresholds #14
Comments
Dear @larsbarring

Thank you for these detailed and thoughtful proposals. I agree with introducing the distinction between I appreciate the need to support numbers of occurrences and spell lengths in units other than days, but I'm not quite convinced about the generalisation, for various reasons. The main one is that "days" and "hours" in this connection might not mean just durations of time, but might refer to date-times. That is, "days" when X is true might mean calendar days, not any period of 24 hours, and similarly "hours" when X is true might mean hours on the clock. If X is true between 1245 and 1315, is that one occurrence (because 30 min is less than 1 hour) or two occurrences (because it was true at some time in both 1200-1300 and 1300-1400)? Please could you clarify what is meant in common use cases? Apart from that, I'm a bit uneasy about relying on the comment " For the first and last occurrences, and beginning and end of spells, it seems to me that we could generalise this nicely using cell methods. Could we have a standard name of

Best wishes
Jonathan
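To make the 1245 to 1315 example concrete, here is a minimal Python sketch contrasting the two readings of "hour": a 60-minute duration versus a clock hour. The helper `clock_hours_touched` is hypothetical (not part of any CF tool) and assumes the event is the half-open interval [start, end).

```python
from datetime import datetime, timedelta


def clock_hours_touched(start: datetime, end: datetime) -> list[datetime]:
    """Return the start of every clock hour [HH:00, HH+1:00) that overlaps
    the half-open event interval [start, end)."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    touched = []
    while hour < end:
        touched.append(hour)
        hour += timedelta(hours=1)
    return touched


event_start = datetime(2021, 6, 1, 12, 45)
event_end = datetime(2021, 6, 1, 13, 15)

# Reading 1: an "hour" is any 60-minute duration; the event lasts half of one.
duration_hours = (event_end - event_start) / timedelta(hours=1)

# Reading 2: an "hour" is a clock hour; the event touches 1200-1300 and 1300-1400.
n_clock_hours = len(clock_hours_touched(event_start, event_end))

print(duration_hours, n_clock_hours)  # 0.5 2
```

The same event therefore yields "less than one hour" or "two hours with X true" depending on which reading a standard name intends.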
Hello Lars, Sorry that it has taken me so long to look at this. I fully agree with the need for these changes, and with the essence of your proposed solutions. I would like to suggest some alternative encodings for the same artefacts, if I may:
As an alternative, I wonder if we could get round this with a name like
How about, e.g.
In a similar vein to 1. and 2., how about, e.g.
In a similar vein to 1., 2., and 3., how about, e.g.

All the best,
Dear Jonathan and David, Let me in this comment start with Jonathan's first question that directly relates back to the conversation in #31:
Can we in the old standard name definitions add something like This can, I am sure, be written more elegantly. I will return to the other points in the following comments.
And now over to Jonathan's second point (first question as he writes):
Indeed, I agree. In principle this problem is also present in the current standard names based on days. In the world of model data, reanalyses and similar, I imagine a day is the same as a calendar day (12-12 -- 12+12). But when it comes to daily data based on manual observations, it is/was common to use the morning reading of the rain gauge to define the daily total precipitation of the day before, e.g. the day is 12-06 -- 12+18 instead of the calendar day. Similarly, the daily maximum and minimum temperature might have been read at the afternoon reading (day is 12-18 -- 12+06). While this is rooted in the practices of the era of manual observations, it is still common practice to use the same definitions for automatic stations. This is something that I have recently been dealing with when working with model data and surface reanalyses. As you point out, this becomes more complicated when going to higher resolution, because there is no natural cycle to use as a baseline. But the reference to the natural diurnal cycle is mainly valid for maximum and minimum temperature. In the case of (e.g.) precipitation, one shower around midnight (or 6 o'clock in the morning) might lead to two days exceeding the threshold (e.g. 10 mm/day), whereas the same shower occurring at another time of the day would result in just one day exceeding the threshold. So, I believe this is something that depends on the definition of the climate index as such. One concrete practical use case for hourly precipitation comes from a presentation at a workshop a couple of years ago, pdf (1.5Mb) available here, where the aim of having hourly counterparts to many common indices was clearly expressed (slides 8, 13-17). In addition to this use case there are high-resolution radar data, e.g. at 5-minute and 15-minute resolution, with which I do not myself have much practical experience (maybe someone from the radar community could share some insights?).
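The point about a shower near the day boundary can be checked with a toy example. The sketch below is plain Python; the helpers `daily_totals` and `days_above` and the 24 mm shower are hypothetical. It sums hourly precipitation into 24-hour days that start either at 00 (calendar day) or at 06 (morning gauge reading) and counts days strictly above 10 mm.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(hourly, day_start_hour):
    """Sum hourly precipitation (mm) into 24 h 'days' beginning at
    day_start_hour. `hourly` maps the START of each hourly accumulation
    interval to its amount; each day is labelled by the date it starts on."""
    totals = defaultdict(float)
    offset = timedelta(hours=day_start_hour)
    for t, amount in hourly.items():
        totals[(t - offset).date()] += amount
    return dict(totals)


def days_above(totals, threshold):
    """Number of days with a total strictly above the threshold."""
    return sum(1 for v in totals.values() if v > threshold)


# Hypothetical shower: 12 mm during 05-06 and 12 mm during 06-07 on 2 January.
shower = {datetime(2021, 1, 2, 5): 12.0, datetime(2021, 1, 2, 6): 12.0}

calendar = daily_totals(shower, day_start_hour=0)  # 00-24 day
morning = daily_totals(shower, day_start_hour=6)   # 06-06 day
print(days_above(calendar, 10.0), days_above(morning, 10.0))  # 1 2
```

The same 24 mm gives one day above threshold with calendar days and two with 06-06 days, which is why the day (or interval) definition belongs to the index definition itself.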
I think that all this points towards the temporal resolution as a central property to describe, rather than the timing of the intervals. I realise that this may be a complication for the Edit: All this relates to my first point, on
Dear @larsbarring

Thanks for your responses. On the first one, I agree that it's fine to keep the original names in their own right and not make them aliases. I would suggest recommending the old names in the case when the distinction is undefined or irrelevant (rather than recommending one of the new precise names, as you propose). On the second one, I see that we all agree on the problem. In that case, I would prefer @davidhassell's solution of The bounds of the time coordinate variable will imply the boundaries of the periods or intervals. For example, for the number of occurrences of maximum temperature in days starting at 0600, you'd expect the time bounds to be at 0600 on the first and last days considered.

Best wishes
Jonathan
On the second one, and as we agree on the general problem, let's focus on the more technical details, where I think we agree on several points:
Now, going back to the first point: I think that it might be more confusing than helpful to keep the old standard names, for two reasons. Firstly, someone is producing new datasets (an analyst manually, or more automatically in a workflow). Somewhere at this stage the decision has to be made whether to use a strict or non-strict comparison. For new datasets I can see no reason not to be precise about this decision. True, for some datasets it does not make much of a difference, in which case the recommendation should be to use one of the precise names (I suggested the strict alternative), not to use an imprecise one. Secondly, if we now introduce a set of more general standard names (with respect to intervals), it would be more confusing to keep the old ones (which are additionally less precise, as per the previous point). As far as I understand, CF always tries to avoid overlaps and duplication of different elements. Is there a strong use case for keeping the old ones?
Dear @larsbarring

I agree that "period" is an attractive word, but I preferred "interval" because "period" also refers to the physical idea of a recurrent phenomenon in existing standard names (waves especially), which doesn't seem like quite the same thing to me. For myself, using the same word for the same concept as in cell methods comments is an advantage, rather than a possible confusion. I accept your argument for recommending use of the precise threshold-comparing names in future, and correspondingly for deprecating the existing vague ones (although they will remain in their own right, and not as aliases).

Best wishes
Jonathan
Dear @JonathanGregory

In fact I was thinking about the same interpretation of "period" as you point at: a recurring phenomenon. But checking some online English dictionaries led me to think that
Dear @larsbarring

OK, I accept your argument about possible confusion with "interval". I admit to another discomfort with

Best wishes
Jonathan
Hi @JonathanGregory and @larsbarring,

Having been involved in metadata standards for climate indices and indicators for quite a while, I have been following your discussions but have not taken part yet. I agree with the decisions taken so far :) A short comment about "time period": it is true that it is generally regarded as a redundancy! https://brians.wsu.edu/2016/05/25/time-period/

All the best
Hi @pagecp, @JonathanGregory, as "time period" is out, I honestly do not know which is better:
After concluding which one to use, I think that we might have covered all aspects related to the first group, (
Looking up synonyms for "period", I see I think
Hello,

I've just caught up with the discussion on word choice. I think including the word "time" in the name is good, making it explicit that we are talking about a temporal phenomenon, rather than it being implicit in another stand-alone word (like "interval" or "period"). However "number_of_time_spans_with_X" doesn't sound right to me; possibly part of my mind wants to read I currently prefer

Thanks,
As I wrote before, I initially thought that As a next step I created a mockup description/definition of one of the new standard names. Basically I used the existing descriptions and just changed phrases and elements according to what we have discussed so far. The intention is not to be precise and produce a polished text, but rather to see if there are any major issues to address before moving on. number_of_time_intervals_with_air_temperature_strictly_below_threshold The original sentence specifically uses "days", which I changed to "time_interval". As I understand it, this would require an extension of which time intervals are allowed together with I do not want this issue, which focusses on new standard names, to be totally diverted into a conversation about climatological time axes,
Hi Lars, I am worried that I've missed a big point somewhere down the line, but I thought that the introduction of the "time_interval" auxiliary coordinate variable meant that we didn't need to overload the climatological cell methods, rather that the new formulation would work well with the existing climatological cell methods. Given that (?) I suggest a definition of: number_of_time_intervals_with_air_temperature_strictly_below_threshold Here's an example of the number of 6 hour time intervals for which air temperature is below a threshold during April 1960:
Here's an example of the number of occurrences of air temperature being below 0 degreesC for each of the four 6 hour time intervals of the diurnal cycle during April 1960:
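As a toy illustration of the two quantities in these examples (not of their CF encoding, i.e. no cell_methods or climatology bounds), the Python sketch below counts 6 hour intervals strictly below 0 degreesC over April 1960, first in total and then per position in the diurnal cycle. The temperature values are hypothetical: the same four-value diurnal pattern is simply repeated every day.

```python
from datetime import datetime, timedelta

start = datetime(1960, 4, 1)
end = datetime(1960, 5, 1)
step = timedelta(hours=6)

# Hypothetical values (NOT real data): one per 6 h interval, same pattern daily.
pattern = [-3.0, 1.0, 4.0, -1.0]  # intervals starting 00, 06, 12, 18

times = []
t = start
while t < end:
    times.append(t)
    t += step
temps = [pattern[t.hour // 6] for t in times]

threshold = 0.0
below = [x < threshold for x in temps]  # strictly below

# First example: number of 6 h intervals in April 1960 below the threshold.
n_total = sum(below)

# Second example: the same count for each 6 h slot of the diurnal cycle.
n_by_slot = {h: 0 for h in (0, 6, 12, 18)}
for t, b in zip(times, below):
    n_by_slot[t.hour] += b

print(len(times), n_total, n_by_slot)
# 120 60 {0: 30, 6: 0, 12: 0, 18: 30}
```

The per-slot counts sum to the total, which is the relationship between the two examples.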
All the best,
Hi David,

Both your very illustrative examples (thanks for them, they are most helpful!) are about air temperature during 6 hour time intervals. The crux is what does "air temperature" mean in this context -- is it the average or something else (minimum, maximum...)? And then sampled, or simulated, at what frequency?
Dear @larsbarring and @davidhassell

I agree with David that Do we need to require the time span to be a multiple of the time interval? Maybe we could say instead that the minimum time bound is the start of the first time interval, and the last time interval could be incomplete. I can't think of another synonym to try for "interval" yet!

Best wishes
Jonathan
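The alternative suggested here (the lower time bound marks the start of the first interval, and the last interval may be incomplete) amounts to the partitioning below. This is a minimal sketch; `split_into_intervals` is a hypothetical helper and the 15-hour span is arbitrary.

```python
from datetime import datetime, timedelta


def split_into_intervals(span_start, span_end, interval):
    """Partition [span_start, span_end) into consecutive intervals of length
    `interval`, starting at span_start; the last one is truncated if the
    span is not a whole multiple of `interval`."""
    bounds = []
    lo = span_start
    while lo < span_end:
        hi = min(lo + interval, span_end)
        bounds.append((lo, hi))
        lo = hi
    return bounds


pieces = split_into_intervals(
    datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 15), timedelta(hours=6)
)
for lo, hi in pieces:
    print(lo.time(), hi.time(), hi - lo == timedelta(hours=6))
# 00:00:00 06:00:00 True
# 06:00:00 12:00:00 True
# 12:00:00 15:00:00 False
```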
Hi Lars, Jonathan,

I don't see the problem with non-instantaneous quantities. If the quantity is "the number of 6-hourly intervals for which the mean air temperature is strictly below the threshold, for each 6-hour interval in the diurnal cycle of April", and each 6 hour mean was calculated from 1/2 hourly data: couldn't we just have the second example the same, apart from a new cell methods of:
Am I missing something?
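Why the meaning of "air temperature" matters can be seen in a small sketch: with the same half-hourly input, counting 6 hour intervals whose mean is below the threshold can give a different answer from counting intervals whose minimum is below it. The data are hypothetical, and `blocks` is an illustrative helper, not a CF tool.

```python
from statistics import fmean


def blocks(values, size):
    """Split a series into consecutive, non-overlapping blocks of `size` samples."""
    if len(values) % size:
        raise ValueError("series length must be a multiple of the block size")
    return [values[i:i + size] for i in range(0, len(values), size)]


# Hypothetical half-hourly temperatures (degC) for one day: 48 samples.
half_hourly = [-2.0] * 6 + [3.0] * 6 + [1.0] * 36
six_hour_blocks = blocks(half_hourly, 12)  # 12 x 30 min = 6 h

threshold = 0.0
n_mean_below = sum(fmean(b) < threshold for b in six_hour_blocks)
n_min_below = sum(min(b) < threshold for b in six_hour_blocks)
print(n_mean_below, n_min_below)  # 0 1
```

The first 6 h block has a mean of 0.5 degC but a minimum of -2 degC, so the two statistics disagree; this is the information the cell methods would need to carry.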
Dear @davidhassell

I was too hasty in agreeing with your second example, actually. I don't think we ought to use

Best wishes
Jonathan
Dear Jonathan,

Perhaps we were slightly at cross purposes: I was using the climatology formulation to encode an actual climatology, rather than the original spacing. I agree with your example (

All the best,
Dear David

Yes, I see. Thanks. The use of the climatological If we made this generalisation, we would no longer need the climatological day convention of cell methods (although we couldn't remove it, because of backward compatibility). The climatological year would still be needed, however, because years are of varying length, and the sub-annual periods of interest are often based on months, which are not constant time intervals.

Best wishes
Jonathan
I wrote:
but perhaps I shouldn't have made that remark! Although this could be done, I think we agreed that it was unsatisfactory to record this essential information in a comment. That's why we wanted the auxiliary coordinate to do it.
With apologies for so many postings, I would like to add a suggestion, continuing the above train of thought. We could define a new cell methods syntax "name However, this change would possibly break our principle that we shouldn't add a new way of doing something, even if it's better, when we've already got a way. That principle suggests that, after all, we should depend on the
During the CF2021 workshop breakout discussion on climate indices we concluded that it would be useful to pull out the second point of the first suggestion in the initial post:
and refer it back to #31 to have this specific suggestion applied to the relevant existing standard names, independent of the outcome of this more complex issue.
This issue has had no activity in the last 30 days. This is a reminder to please comment on standard name requests to assist with agreement and acceptance. Standard name moderators are also reminded to review @feggleton @japamment
This issue has had no activity in the last 30 days. Accordingly:
Standard name moderators are also reminded to review @feggleton @japamment @efisher008
Proposer's name
Lars Bärring
Date
2021-05-25
Background
Over the years the prospects for using the CF Conventions to describe various types of derived statistics (aka climate indices or climate indicators) have been recurrently discussed in CF email list threads, following the extensive conversation back in 2006-2007 (cf. relevant starting point). Since then the concept of climate indices/indicators has evolved substantially. The many CF email list threads are a sign of the recurring wish to express these new concepts using the CF Conventions. However, the conversations often spread out into discussions of many different aspects, with few concrete conclusions regarding general guidance on how to apply the CF Conventions. In this issue I will try to collect some of the ideas and suggestions from several of these email threads.
As a result of the initial conversation in 2006-2007 the following two groups of standard names were introduced:
- number_of_days_with_X_above|below_threshold (canonical unit: 1)
- spell_length_of_days_with_X_above|below_threshold (canonical unit: day (sic))

While these two groups may seem rather disparate and connected only in that they employ thresholds, they are in some sense connected. This will become clearer in the following discussion regarding generalisations and extensions.
Suggested generalizations/changes and extensions
1. number_of_days_with_X_above|below_threshold (deprecation)

These two suggestions point towards standard names following the pattern number_of_occurrences_with_X_strictly_above|below_threshold or number_of_occurrences_with_X_at_or_above|below_threshold.

However, from a user perspective there is still a problem with these constructs: the canonical unit is 1 (and not day or hour). While the unit 1 is semantically consistent with the phrase number_of_..., users are confused when confronted with this unit in automatically labelled graphs or other output. This was previously touched upon in this email list conversation, and recently resurfaced in an off-line conversation. Hence, the following suggestion: total_duration_. A "duration" is clearly associated with a time unit, and "total" indicates that several separate events may be joined together. A data variable having such a standard name would normally have the unit days or hours etc., according to context and the resolution of the input data. But during further processing this may (accidentally) change to any other unit of duration (e.g. the canonical unit second). The temporal resolution, i.e. the unit used for discretisation of the duration, must therefore be recorded in the cell_methods attribute (interval: T). This 'discretisation unit' is what transforms the counting operation into a summation.

Based on this I would like to suggest that the five currently existing standard names (v.77) should be deprecated in favour of standard names following the pattern total_duration_of_X_strictly_above|below_threshold, canonical unit second, and total_duration_of_X_at_or_above|below_threshold, canonical unit second.

Alternatively: total_duration_of_intervals_with_X_strictly_above|below_threshold, canonical unit second, and total_duration_of_intervals_with_X_at_or_above|below_threshold, canonical unit second.
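A minimal sketch of the total_duration idea, assuming a regularly sampled series with one value per discretisation interval T. The function name is hypothetical and the daily maximum temperatures are made up. It shows how the strict and non-strict comparisons differ, and how count × interval turns a unit-1 count into a duration in the canonical unit second.

```python
from datetime import timedelta


def total_duration_seconds(values, threshold, interval, strict=True):
    """Total duration (s) of the intervals whose value is above the threshold.

    strict=True  -> 'strictly_above' (value >  threshold)
    strict=False -> 'at_or_above'    (value >= threshold)
    """
    if strict:
        hits = sum(v > threshold for v in values)
    else:
        hits = sum(v >= threshold for v in values)
    # The count alone has unit 1; multiplying by the discretisation interval
    # turns it into a duration, here expressed in seconds.
    return hits * interval.total_seconds()


daily_tmax = [24.0, 25.0, 26.5, 25.0, 23.0]  # hypothetical values, degC
one_day = timedelta(days=1)
print(total_duration_seconds(daily_tmax, 25.0, one_day, strict=True))   # 86400.0
print(total_duration_seconds(daily_tmax, 25.0, one_day, strict=False))  # 259200.0
```

Converting the result back to days requires knowing T, which is why the interval has to be recorded alongside the data.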
2. first|last_occurrence_of_X_... or first|last_interval_with_X_... (new)

Related to summing the duration above/below some threshold, there is a range of use cases for recording the first or last date/time (in the year, season, month, day, ...) when the threshold was exceeded. Referring to the original standard names number_of_days_with_X_..., the date/time would typically be recorded as day_of_year or similar, cf. this conversation, which as far as I can judge did not arrive at a conclusion or recommendation with respect to the CF Conventions. A related earlier thread focuses more on the reference time, which is an important aspect for what is discussed here. The climate index/indicator data is calculated per period (year, season or month), where this period is defined by the bounds of the time coordinate of the data variable. Framed this way, the date/time of the first/last occurrence is a duration since the time specified by the lower bound of the corresponding time coordinate. As such the canonical unit is second (in practice it might be day or hour). In the context of climate indices/indicators the lower bound of the time coordinate is a natural 'reference time', which should be stated in the explanation of the standard name. As was suggested in the previous point, the temporal resolution must be recorded in the cell_methods attribute (interval: T).

Based on this I would like to suggest the following new standard name patterns: first|last_occurrence_of_X_strictly_above|below_threshold, canonical unit second, and first|last_occurrence_of_X_at_or_above|below_threshold, canonical unit second.

Alternatively: first|last_interval_of_X_strictly_above|below_threshold, canonical unit second, and first|last_interval_of_X_at_or_above|below_threshold, canonical unit second.
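Framing the first/last occurrence as a duration since the lower time bound can be sketched as follows. The function name and data are hypothetical, and the choice of measuring to the *start* of the qualifying interval is an assumption that a real definition would have to state. The last lines show that converting back to a date, or to day_of_year for a period starting on 1 January, is straightforward.

```python
from datetime import datetime, timedelta


def first_last_offsets(values, threshold, interval):
    """Seconds from the lower time bound to the start of the first and last
    interval whose value is strictly above threshold; (None, None) if none."""
    hits = [i for i, v in enumerate(values) if v > threshold]
    if not hits:
        return None, None
    step = interval.total_seconds()
    return hits[0] * step, hits[-1] * step


period_start = datetime(2021, 1, 1)          # lower bound of the time coordinate
daily_tmax = [20.0, 26.0, 24.0, 27.0, 22.0]  # hypothetical values, degC
first, last = first_last_offsets(daily_tmax, 25.0, timedelta(days=1))
print(first, last)  # 86400.0 259200.0

# Converting back to a date, or to day_of_year for a period starting 1 January:
first_date = period_start + timedelta(seconds=first)
print(first_date.date(), first_date.timetuple().tm_yday)  # 2021-01-02 2
```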
3. spell_length_of_days_with_X_above|below_threshold (deprecation)

A spell is a contiguous period of T above|below threshold (such as a wet/dry spell or a heat/cold wave), which in the case of climate indices typically is the longest spell during a period (year, season, month), even though one could of course think of other methods like minimum or mean, where the method is specified in the cell_methods attribute.

second. A spell length is by definition a duration, and irrespective of whether the standard name is changed as suggested or not, the canonical unit for a duration is second. Similar to the previous two points, the temporal resolution must be recorded in the cell_methods attribute (interval: T).

Based on this I would like to suggest that the four currently existing standard names (v.77) following the pattern spell_length_of_days_with_X... should be deprecated in favour of standard names following the pattern spell_length_of_X_strictly_above|below_threshold, canonical unit second, or spell_length_of_X_at_or_above|below_threshold, canonical unit second.
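The longest-spell statistic can be sketched as a run-length computation over a regularly sampled series. This is a minimal illustration, assuming one value per interval T. The `COMPARISONS` mapping from name fragments to Python operators is hypothetical, and the precipitation values are made up. The two calls show how the strict and non-strict variants of a dry-spell index can differ on the same data.

```python
import operator
from datetime import timedelta

# Hypothetical mapping from the comparison fragments in the proposed names.
COMPARISONS = {
    "strictly_above": operator.gt,
    "at_or_above": operator.ge,
    "strictly_below": operator.lt,
    "at_or_below": operator.le,
}


def longest_spell_seconds(values, threshold, interval, comparison):
    """Length (s) of the longest run of consecutive intervals for which
    `value <comparison> threshold` holds."""
    test = COMPARISONS[comparison]
    longest = current = 0
    for v in values:
        current = current + 1 if test(v, threshold) else 0
        longest = max(longest, current)
    return longest * interval.total_seconds()


daily_precip = [0.0, 0.5, 1.0, 0.0, 0.0, 0.0, 2.0]  # hypothetical values, mm
one_day = timedelta(days=1)
print(longest_spell_seconds(daily_precip, 1.0, one_day, "strictly_below"))  # 259200.0
print(longest_spell_seconds(daily_precip, 1.0, one_day, "at_or_below"))     # 518400.0
```

A day with exactly 1.0 mm breaks the strict spell but not the non-strict one, giving 3 versus 6 days.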
4. beginning|end_of_spell_with_X_... (new)

Analogous to the second point, there are use cases for analysing when during a period the spell begins/ends. The technical details given under point 2 apply here, so I move directly to suggest these new standard name patterns: beginning|end_of_spell_with_X_strictly_above|below_threshold, canonical unit second, and beginning|end_of_spell_with_X_at_or_above|below_threshold, canonical unit second.

After we have discussed the standard name patterns suggested here and reached consensus (hopefully we do), I will look into the existing standard names and use cases to suggest specific standard names and explanations/definitions. These explanations will contain technical details regarding cell_methods, how to specify the temporal resolution, and the relationship between the unit used for duration and the reference time. In all four points above I suggest distinguishing between strict and non-strict comparisons, as well as including both "above" and "below". However, we should not add specific standard names until there is a concrete use case.
Finally, I should mention that there are two other groups of climate indices/indicators that share some aspects with those presented here. But they are sufficiently different (and more complex) in their technical details not to include them here. Instead they will be covered in separate issues (later), but I mention them here for reference. The first group is in some sense similar to those in point 1, with two important differences: the unit is "fraction_of_year", and the threshold is a spatially varying threshold calculated as a percentile value based on a reference period. The second group is the count of all days belonging to spells of at least a certain duration, where the spell is based on a percentile threshold calculated in the same way as for the previous group.

Ping (previous conversations) @huard, @aulemahal, @zklaus, @pagecp, @japamment, @martinjuckes, @davidhassell