Clarifying the temperature unit associated with some cell methods and standard names #125

Closed
larsbarring opened this issue Apr 6, 2021 · 224 comments · Fixed by cf-convention/cf-conventions#480
Labels
enhancement New feature or request

Comments

@larsbarring

larsbarring commented Apr 6, 2021

Associated with all standard names is a canonical unit, which in case of temperature is Kelvin. And Appendix E explains how these canonical units are transformed by the different cell methods. For temperature this currently becomes fragile for the combined effect of two aspects of CF:

  • CF does not make a (clear) distinction between absolute temperature (irrespective of unit) and relative temperature (cf. Standard names: temperature differences (not anomalies) #90). Instead this has to be inferred from a detailed interpretation of standard names and/or cell methods.
  • CF does not mandate use of the canonical unit, but instead 'delegates' unit handling to udunits (cf. 1.3 Overview and 3.1 Units).

And udunits, which accepts Kelvin and degree_Celsius (and other) temperature units, cannot make a distinction between a measure of absolute temperature and relative temperature. This means that software needs to make a rather involved interpretation of phrases of the standard name in combination with cell methods (in fact this has to be done irrespective of whether udunits is used or not) to decide whether or not to apply the additive constant 273.15 when converting between Kelvin and degree_Celsius.

While I would have liked CF to make a clear distinction between measures of absolute and relative temperature, this may not be viable given the overarching goal to not break backward compatibility. As a second best alternative I suggest that text is added to clearly spell out which phrases of standard names and cell methods are to be used to make the distinction between absolute and relative temperature. Collecting this information in one place alerts and helps users in general and software designers in particular. This text could be added under Section 3.1 (new subsection?) or in Appendix E, cross-referenced where appropriate.

So far I have come across the following phrases that I believe indicate a relative temperature:

  • standard names: difference, anomaly, change, tendency, bias
  • cell methods: range, standard_deviation, variance

For some of the standard names it may be obvious from the context that in practice the unit is always Kelvin, but for several I would say that either Kelvin or degree_Celsius is equally probable.

EDIT 2021-10-21: After more careful consideration root_mean_square and sum_of_squares do not involve differences, so deleted. /LB
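
The phrase lists above lend themselves to a simple lookup. The following is a minimal sketch, assuming that matching whole underscore-separated words in the standard name and whole method words in `cell_methods` is good enough; the function name and the matching rule are illustrative assumptions, not part of CF or of any library.

```python
# Phrases copied from the lists in this comment; the matching rule is an assumption.
DIFFERENCE_NAME_PHRASES = ("difference", "anomaly", "change", "tendency", "bias")
DIFFERENCE_CELL_METHODS = ("range", "standard_deviation", "variance")


def is_temperature_difference(standard_name, cell_methods=""):
    """Guess whether a temperature variable holds differences rather than temperatures."""
    name_parts = standard_name.split("_")
    if any(phrase in name_parts for phrase in DIFFERENCE_NAME_PHRASES):
        return True
    # cell_methods looks like "time: mean area: standard_deviation";
    # drop the "name:" tokens and keep the method words.
    methods = [tok for tok in cell_methods.split() if not tok.endswith(":")]
    return any(method in DIFFERENCE_CELL_METHODS for method in methods)
```

For example, `is_temperature_difference("air_temperature", "time: standard_deviation")` returns `True`, while plain `air_temperature` with `time: mean` returns `False`. A heuristic like this is exactly the "involved interpretation" the comment wants to make unnecessary.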

@martinjuckes

@larsbarring : on the issue of backward compatibility, I believe that the protocol allows us to replace existing standard names with more precise terms when a use case emerges that needs more precision than provided by existing names.

The references to Udunits really need to be fixed. Udunits2 (Udunits version 1, which is cited in the convention, is no longer supported) assumes that quantities measured in Kelvin and Celsius are absolute on the respective scales, and quantities in powers of Kelvin and Celsius are relative amounts.

The ISO 80000 standard on Quantities and Units has distinct quantities for thermodynamic temperature and Celsius temperature, which are not considered interchangeable. The ISO 80000 units kelvin and degree Celsius are equivalent -- the specification of the offset occurs in the definition of the two distinct quantities. SI takes the same approach: thermodynamic temperature is a fundamental quantity, Celsius temperature is a derived quantity (see p. 138 of the SI Brochure).

I can see a number of options:

  1. The status quo: undesirable and open to confusion if software is not aware of the need to treat temperature units as a special case;
  2. Conform to Udunits2: this should be considered, because it is implied by the existing text --- but it does not look like a good option to me.
  3. Provide some guidance to reduce the scope for confusion, as suggested in Lars's post.
  4. Conform to SI and ISO 80000 by introducing some new standard names.

Conforming to UDUNITS does not look advisable, because there is no guarantee that features such as this will remain unchanged in future releases.

I like the idea of conforming to SI and ISO 80000. In other areas, such as the discussion of the use of kg CO2e as a unit by the climate impacts and adaptation community, we insist on a pure interpretation of units. Allowing an ad hoc interpretation of units for air_temperature, in which the units value is also interpreted as specifying the scale, is inconsistent.

@larsbarring
Author

larsbarring commented Oct 8, 2021

I have come to realise that maybe I used the wrong words: what I meant by a relative temperature is really a temperature difference, where Δ°C = ΔK. And by absolute temperature I meant temperature, where 1°C = 274.15K.

Now over to @martinjuckes comments and suggestions.

  • On backwards compatibility: yes we can (and should) update standard names when new more precise terms are needed. But what I meant was more related to Appendix E, which details the unit conversions associated with cell methods. Here we have, for example, the cell method standard_deviation, where it is specified that no unit transformation takes place. Yet, for temperature we are transforming a temperature to a temperature difference. As far as I have read the Conventions documents I have not seen any distinction between these.
  • I believe the references to UDUNITS have been fixed in CF 1.9. In the initial comment I have now changed links to this version.
  • Yes, Kelvin and degree_Celsius are equivalent, but they are 'equivalent in different ways' depending on whether it is a temperature or a temperature difference. As Martin writes, SI acknowledges this difference, but UDUNITS does not to my knowledge.

Regarding Martin's four options:

  1. Status quo is not good. I do agree with Martin's conclusion that this would be inconsistent.
  2. Neither is conforming to UDUNITS2 a solution, because UDUNITS2 does not know whether a quantity represents a temperature or a temperature difference.
  3. Yes, this is the minimal solution. But maybe I was, for reasons of imagined backward incompatibility, too early in ruling out the best solution (as I now see it): Clearly spell out in Table E.1 that certain methods involve a transformation from u to Δu, and make Δu the canonical unit for the relevant standard names.
  4. Maybe we do not need new standard names if we are just making a more precise statement regarding the canonical unit?

@japamment: maybe part of this (point 3 & 4) has some bearing on the outcome of CF workshop discussion on canonical units

@JonathanGregory
Contributor

Dear @larsbarring and @martinjuckes

I think it would be sufficient to describe the issue in the definitions of standard names, where appropriate. The important practical point is that a different rule is needed for units conversion when it's a temperature difference, as Lars says. I would say that's a practicality rather than a "fundamental" distinction. There are similar issues with other quantities. For example, you can't convert degree_east to degree_north, although they are canonically equivalent. A metre is always the same size, but a vertical coordinate of 1 m for height (above the surface) and altitude (above the geoid) are not the same.

I do not think we need new standard names, however. The existing standard names recognise that a temperature difference is a different quantity from a temperature, using various phrases, as Lars has identified. That's fine. I think they are canonically the same. Equilibrium climate sensitivity is a temperature difference, for example, which is sometimes written in K and sometimes in degC. It depends on the author's preference and the intended audience. Temperatures are also temperature differences, really; temperatures in degC are differences from freezing point, temperatures in K are differences from absolute zero.

Best wishes

Jonathan

@larsbarring
Author

larsbarring commented Oct 8, 2021

Dear @JonathanGregory

I am not sure that I follow what you are aiming at with the examples in the first paragraph:

  • degrees_north and degrees_east are units for quantifying different "things". That is, they are associated with different variables having different standard names. True, one can convert between them by adding/subtracting 90 degrees, in which case it becomes obvious that 1 degree_north is canonically the same as 1 degree_east only in a 'relative' sense, i.e. as a difference from their individual origins.
  • A metre is always the same, and thus I argue that a vertical coordinate of 1 m for height (above the surface) and altitude (above the geoid) are in fact the same quantity but related to different variables. This is clear as they are defined as measuring the difference from their individual origins that are defined by the standard name. That is, by definition they do not represent the same position in 'absolute' space, e.g. when measured as a distance from the centre of the earth (however that is defined).

Both these examples focus on units/variables that have different uses/meanings and are not directly comparable to the case for temperature. Moreover, they both happen to be (spatial) coordinate variables rather than 'ordinary' data variables. The key thing in these examples is, as with temperature, that if the data is transformed to a common origin it becomes obvious that the origin is important. To my knowledge, for 'data variables' this is only relevant for temperature, which has units where one alternative involves an additive constant.

All this tells me that Table E.1 should be updated to specify if a method involves a transformation u -> Δu. As it stands now it is not correct to state that all methods involve a unit transformation u -> u, at least as long as CF delegates unit handling to UDUNITS.

Regarding the second paragraph, I fully agree with Jonathan that no new standard names are needed. It should be enough with more details. One part of this is adding text to the description. But given the fact that the xml version of this table is increasingly used directly in automated workflows and imported into analysis software, I suggest adding a new column ('to the right') flagging whether a standard name's unit is associated with an 'absolute' quantity or a difference quantity. If it turns out to be complicated to draw the line, this flag could be set only for standard names related to temperature (and any other units where an additive constant may be important).

To resolve this issue by just making text changes to relevant standard name descriptions would be missing a good opportunity to make the Conventions clearer. It would be very easy for a reader to miss that a standard name description has changed, especially if we do not deprecate the standard name. And in automated use of the xml version such a change would almost certainly go unnoticed.

Kind regards,
Lars

@JonathanGregory
Contributor

Dear @larsbarring

I think the example with height and altitude is interesting to compare with temperature in K and degC. The only difference is the reference, as you say, in both cases, but with the vertical coordinate we give them different standard names. I think that is because the difference in reference is not a numerical constant. I don't think we would have different standard names for instance for height above the geoid (i.e. altitude) and height above a surface 100 m above the geoid.

Time is like temperature in that it is sometimes a difference (as in an interval of time) and sometimes a coordinate (meaning a difference wrt a reference time). In the latter case, we always specify since the reference time in the units, and that makes the distinction. For temperature, the units don't help us, because K and degC imply their reference temperature (when it's not a difference).

I would also remark that any quantity which serves as a coordinate might sometimes be a data variable and vice-versa.

The above points are just a discussion! They don't help us to decide what to do to indicate the distinction between temperature and temperature difference in a machineable way. I think your question is relevant of whether temperature is the only such case. That will help us to decide whether we need a general solution.

Best wishes

Jonathan

@larsbarring
Author

Dear @JonathanGregory
I think that there are two aspects of temperature that neither height nor altitude capture:

  1. It is physically meaningful to have negative values, i.e. all these are always differences (leaving out the issue of what happens when the negative value is large enough to pass beyond the centre of the earth). Thus these quantities are already by definition Δu. Only if height is measured from the centre of the earth is there a correspondence with the absolute Kelvin scale.
  2. Your example geoid and geoid+100m seems to work very well, but that is only because we just have one and the same unit, meter, for both. If there was a separate unit for the latter, say meter_100 it would be closer to the problem with temperature.

Let me try to capture both aspects in one example based on height: we define the standard name elevation. For historic reasons we have in common use the unit meter_Celsius, which uses as origin the floor of Anders Celsius' workplace. However, modern science has established that the earth's centre is exactly 6.3781×10^6 meter below this floor, and in the sciences it has become common to use the earth's centre as origin for elevation. And to not mix things up the new unit is called meter_Kelvin to distinguish it from meter_Celsius, where Δmeter_Celsius = Δmeter_Kelvin.

I believe this captures the essence of the problem. One unit is absolute, i.e. it has the origin at 0 and it is not physically meaningful to have negative values; the other is relative to an origin ≠ 0, which means that both positive and negative values are possible. And Δu is exactly the same for both units.

Let me come back to your earlier examples

Equilibrium climate sensitivity is a temperature difference, for example, which is sometimes written in K and sometimes in degC. It depends on the author's preference and the intended audience. Temperatures are also temperature differences, really; temperatures in degC are differences from freezing point, temperatures in K are differences from absolute zero.

-- It would indeed be a mistake if one were to use UDUNITS to harmonise the units of ECS data having mixed units K and degC.
-- degC is indeed a temperature difference.
-- I only half agree (literally) that temperatures in K are differences from absolute zero. To me this does not capture that the Kelvin scale does not have negative values. To make it clear I would rather say that temperatures in K quantify the distance from absolute zero.

Now, returning to your most recent comment:

Time is like temperature in that it is sometimes a difference (as in an interval of time) and sometimes a coordinate (meaning a difference wrt a reference time). In the latter case, we always specify since the reference time in the units, and that makes the distinction. For temperature, the units don't help us, because K and degC imply their reference temperature (when it's not a difference).

-- Yes, as we now use time it is always a relative measure (i.e. difference). But in principle we could find an absolute starting time, e.g. when the Big Bang happened (BB). According to Wikipedia the best estimate is 13.772 * 10^9 years ago. Although it is only an estimate, we could in principle adopt this as an absolute zero for a time scale to be used in CF. Or we could be less general and fix a time when the earth was formed (EF). Both would be suitable as absolute zeros for geophysical applications. I am definitely not suggesting this, but it shows that in principle we could have units second_BB or second_EF as an absolute scale (no negative numbers) and then relative scales having units second_BCE, second_19700101000000 and what not.

@martinjuckes

@larsbarring , the suggestion that "1°C = 274.15K" (above) in CF whereas "1°C = 1K" is the rule in SI and ISO standards illustrates what I would consider to be a fundamental difference between the two approaches. In the one case, the "unit" is being used to define an amount, in the other it is being used both to define an amount and a reference value. As it stands, our "degree Celsius" has a very different meaning to the SI unit degree Celsius.

I'm not sure what to make of the discussion of degrees_east and degrees_north -- there appears to be some uncertainty about what is meant by "conformance". From a UDUNITS perspective they are interchangeable, which makes sense if you accept that "east" and "north" are directions rather than units ... the units of measure in both cases are degrees (though the rationale for 1 degrees_west = -1 degrees_north is a little harder to unpick). The problem appears to be that here, as with temperature, CF is following an approach which packs additional information into the unit string which goes beyond the unit of measure, and is doing it in a way which has not been clearly explained. The discussion above would appear to indicate that degrees_north is a valid coordinate for longitude (and this is accepted by the cf-checker), but I don't think it should be. The problem here is that additional information is bundled into the units string and the assumption that UDUNITS will do something sensible is not valid (because UDUNITS has no way of dealing sensibly with this situation).

There appears to be a consensus that conforming to UDUNITS for temperature (i.e. the rule that a quantity expressed in units of °C is a temperature on the Celsius scale, but any power of °C is directly equivalent to Kelvin: "1°C = 274.15K" and "1°C² = 1K²") is not a good idea. UDUNITS does not know anything about standard names or qualifiers, so following UDUNITS, as implied, I believe, by the current convention, would require us to adopt this interpretation consistently and disallow "°C" for temperature differences. @JonathanGregory , @larsbarring : do you agree with the last statement?

@JonathanGregory
Contributor

Dear @martinjuckes

Yes, I agree that degree_north should be disallowed by the checker as a unit of longitude. That is a specific rule we ought to include. As I'm sure you know, the special units for longitude and latitude were inherited from COARDS. The north/east distinction is redundant with standard names, but standard names are optional. This is not an ideal convention, but it is how it is, I would say.

I don't think we should disallow degC for temperature differences because it is commonly used as a unit for temperature differences in practice, such as in the example of equilibrium climate sensitivity that I mentioned. Historical and future global warming are also often stated in degC, not K, for example in IPCC reports. I don't think data-writers would respect our prohibition of it.

Rather, I think we have to add some special flag for quantities which have units of temperature. To decide what we should do, I feel that a critical question is whether temperature is the only quantity which has this kind of problem (apart from time, which we handle in another way already).

Best wishes

Jonathan

@martinjuckes

Dear @JonathanGregory ,

I'm glad you are not proposing to disallow degC, though I'm not sure why you feel the need to comment on this idea. Has anyone proposed it?

Backward compatibility with the 1995 COARDS conventions has been important, but, given the current interpretation of the backward compatibility policy, are there any use cases of requiring that new versions of the CF Conventions maintain equivalence with all aspects of the COARDS convention? To me, this level of constraint appears disproportionate.

Identifying which variables have "this kind of problem" is certainly a good idea. My view, and I'm not sure whether you disagree or whether I haven't communicated it clearly, is that this kind of problem relates to use of units strings which are not units of measure in the conventional sense. That is, units strings such as degC, degrees_east, days since ... which convey information about the quantity being measured as well as the measurement unit.

@larsbarring
Author

larsbarring commented Oct 12, 2021

@martinjuckes: I might not have read the SI Brochure you referred to carefully enough, but

"1°C = 1K" is the rule in SI and ISO standards illustrates

in the Table on p.138 carries a footnote that states

The degree Celsius is used to express Celsius temperatures. The numerical value of a temperature
difference or temperature interval is the same when expressed in either degrees Celsius or in kelvin.

which basically is what I wrote above: Δ°C = ΔK. On the preceding pages (that I only had a quick look at) of the SI Brochure, the text elaborates the relationship between the units and their different origins. While I intuitively like your wording ""unit" is being used to define an amount", I am not sure it helps us here because, as Jonathan writes, both units K and degC carry implicit information about their individual origins. The key thing is that one scale is absolute, i.e. does not allow negative numbers for physical reasons. So, after all, maybe I was onto something when writing "absolute" and "relative":

  • absolute scale (Ua): a scale that has zero as its origin and where negative values are impossible for physical or otherwise principal reasons.
  • relative scale (Ur): a scale that has an origin k (k≠ 0), which allows both positive and negative numbers. It is related to the absolute scale through Ua = Ur × m + k, where m is a multiplicative scale factor. Values below -k/m are not physically meaningful.
  • differences : differences expressed in the respective scales, then ΔUa = ΔUr × m.

This should cover any reasonable temperature unit conversion. For the purpose of CF I think that at least K, °C and °F have real use cases, the other ones are maybe more of historical interest (I came across °Réaumur in an earlier project on long observational timeseries).
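
The three definitions above can be written down directly. This is a minimal sketch, assuming Kelvin as the absolute scale Ua; the table `SCALES` and the function names are illustrative, and the (m, k) pairs are the usual constants for °C and °F rather than anything stated in this thread.

```python
# (m, k) such that Ua = Ur * m + k, with Kelvin as the absolute scale Ua.
SCALES = {
    "K": (1.0, 0.0),                       # the absolute scale itself
    "degC": (1.0, 273.15),
    "degF": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def to_absolute(value, unit):
    """Convert a temperature on a relative scale to the absolute scale."""
    m, k = SCALES[unit]
    return value * m + k


def difference_to_absolute(delta, unit):
    """Convert a temperature difference: only the scale factor m applies."""
    m, _ = SCALES[unit]
    return delta * m


def lowest_meaningful_value(unit):
    """The value -k/m below which the relative scale passes absolute zero."""
    m, k = SCALES[unit]
    return -k / m
```

With these numbers `to_absolute(1.0, "degC")` gives about 274.15, while `difference_to_absolute(1.0, "degC")` gives 1.0, which is the distinction this issue is about.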

@JonathanGregory
Contributor

Dear @martinjuckes

I thought you were proposing to ban degC for temperature differences! I misunderstood you, sorry. Actually you were asking whether we agreed that to ban it would be inevitable if we chose to follow udunits to its logical conclusion - is that right?

I do agree with you that units should not contain the definition of the quantity, in an ideal convention. CF is mostly ideal in this respect, but we have this exception for degrees, and some for dimensionless vertical coordinates, described in section 3.1. These were all originally included for COARDS compatibility. I don't know whether COARDS is obsolete now, but since we built them into CF at the start it would be hard to remove them now, certainly the one about degrees.

I don't think that @larsbarring's issue is exactly that one, though. It's about having to use a different units conversion for differences and "absolute" values (meaning differences wrt an implicit reference value).

Best wishes

Jonathan

@Armin-RS

As a physicist following this discussion, I would like to mention that there are even negative Kelvin temperatures in systems with an unintuitive distribution of energy, see https://en.wikipedia.org/wiki/Negative_temperature.
The most common of these systems probably are lasers which are in wide use in the atmospheric sciences (one could think of instrument housekeeping data).
So even Kelvin temperatures can be below 0.0.

@larsbarring
Author

@Armin-RS, thanks for this interesting perspective, it was all news to me. In an earlier comment I was almost writing something like "Kelvin scale is always positive until future physics tells us otherwise" but now see that the future arrived already back in 1949. Anyway, and irrespective of whether we have any real use case for negative kelvin values or not, do you think that this has any material impact on the discussion above (except my characterisation of the absolute scale)?

@Armin-RS

Dear @larsbarring,
no, except that you should not rely on the condition that a temperature < 0.0 necessarily has to be in degree Celsius/Fahrenheit/Reaumur.

@martinjuckes

Dear @JonathanGregory :

On degC : yes, I was pointing out the consequences of adopting UDUNITS as a pseudo-standard (i.e. referring to it as if it were a standard even though it is clear to anyone who looks beneath the surface that it is not a standard in any meaningful sense).

@larsbarring : I agree that the footnote you quote above is consistent with your approach, but the table on page 138 also states without ambiguity that °C = K. It is a simple factual observation. I won't bother repeating it again if it is unwelcome.

@larsbarring
Author

larsbarring commented Oct 16, 2021

@martinjuckes: thanks for drawing attention to this wider context. I do agree that the frequent references to the file udunits.dat, which has not existed for quite some time, need to be replaced with references to an existing standard, like SI. But this is a more general question that should be dealt with in a separate issue, which I will be happy to contribute to.

But, as @JonathanGregory writes, the question here is how the CF Conventions handle the distinction between 'absolute' temperature and temperature differences. If we follow the SI definition, in which °C = K is, as Martin points out, not confined to temperature differences, what does it mean for CF? This is not clear to me.

What I however want to emphasize is that CF needs to make a clear distinction between 'absolute' temperatures and temperature differences. And this needs to be done irrespective of whether UDUNITS remains as a "pseudo-standard" or the SI system is introduced, or something else. To me, the immediate question is whether progress on the current issue materially depends on this choice.

@JonathanGregory
Contributor

The definition of the SI system by the International Bureau of Weights and Measures (cited previously by Martin) recognises temperature and temperature difference as two different uses of the units of degC and K, and gives the different rules for conversion in the two cases. I suggest that we add a new field to the standard name table, defined for every case where the canonical unit contains K and not in other cases, to indicate whether K refers to temperature or temperature difference in each case. In the conventions we should also add a statement that udunits should not be used to convert units containing K of temperature difference.

@taylor13

I haven't read this thread carefully, but it appears that we are ruling out use of the standard name "air_temperature" when it represents some difference (say between two pressure levels or two surface stations). I note that we have a standard name "sea_water_temperature_difference" but not "air_temperature_difference". It seems that the property being measured is the temperature and the fact that it is a difference shouldn't necessarily change that, should it? Anyway, is it clear which variables when representing a "difference" require new standard names? Do we need to double the size of the standard name table?

@larsbarring
Author

larsbarring commented Oct 19, 2021

@taylor13, I am not sure what you mean when you write

... standard name "air_temperature" when it represents some difference (say between two pressure levels or two surface stations). <....> It seems that the property being measured is the temperature and the fact that it is a difference shouldn't necessarily change that, should it?

When we have temperature measured at two surface stations, or two pressure levels, that is what is measured. The difference as such is not measured but calculated. And this calculation has implications for how to interpret the units. As @martinjuckes and @JonathanGregory point out, this is made explicit in the SI system.

For example use the temperature in London and Reykjavik, and starting with degree Celsius:

Reykjavik (°C) | London (°C) | Δ (°C)
6 | 17 | -11 or +11

If the sign is not important (sometimes it is!) it could be called the "absolute_difference", which is the equivalent to the cell method range. Now, if we naively use UDUNITS to translate these numbers to Kelvin we get after rounding

Reykjavik (K) | London (K) | Δ (K)
279 | 290 | 262 or 284

where the values for Δ do not make sense as differences. That is, the temperature difference is not the same thing as an 'ordinary' temperature. This becomes even more clear if one calculates the difference in Kelvin the proper way:

Reykjavik (K) | London (K) | Δ (K)
279 | 290 | -11 or +11

Here the variant "-11K" is perfectly reasonable, but still something completely different from the negative absolute temperatures that @Armin-RS was referring to.

And the same problem occurs if we use UDUNITS to naively translate to degree Fahrenheit:

Reykjavik (°F) | London (°F) | Δ (°F)
42.8 | 62.6 | 12.2 or 51.8

which properly should be

Reykjavik (°F) | London (°F) | Δ (°F)
42.8 | 62.6 | -19.8 or +19.8

EDIT: This example highlights the need for a standard name air_temperature_difference, but this a conversation for another issue.
EDIT(2): From the comments below it is clear to me that this was not a good idea.
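
The Reykjavik/London numbers above can be reproduced with a few lines. This is a sketch in plain Python, not a call into UDUNITS; the helper names are illustrative, and the "naive" results simply apply the offset conversion that is appropriate for a temperature to a value that is actually a difference.

```python
def celsius_to_kelvin(value_c):
    """Offset conversion, correct for a temperature."""
    return value_c + 273.15


def celsius_to_fahrenheit(value_c):
    """Scale-and-offset conversion, correct for a temperature."""
    return value_c * 9.0 / 5.0 + 32.0


reykjavik_c, london_c = 6.0, 17.0
delta_c = reykjavik_c - london_c  # -11 degC

# Naive: treat the difference as if it were a temperature.
naive_delta_k = celsius_to_kelvin(delta_c)      # about 262.15, not a sensible difference
naive_delta_f = celsius_to_fahrenheit(delta_c)  # about 12.2, not a sensible difference

# Proper: convert the temperatures first, then subtract, so the offsets cancel.
proper_delta_k = celsius_to_kelvin(reykjavik_c) - celsius_to_kelvin(london_c)
proper_delta_f = celsius_to_fahrenheit(reykjavik_c) - celsius_to_fahrenheit(london_c)
```

Here `proper_delta_k` comes out at about -11 and `proper_delta_f` at about -19.8, matching the corrected tables.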

@larsbarring
Author

Dear Jonathan @JonathanGregory,

I agree with your suggestion to

add a new field to the standard name table, defined for every case where the canonical unit contains K and not in other cases, to indicate whether K refers to temperature or temperature difference in each case.

But I have come to realise that this is not enough. As you have pointed out on several occasions, the standard name does not always give enough detail for interpreting the content of the data variable. Additional necessary information may be found in the cell methods attribute and/or standard name modifiers. So, I think something more is needed, probably in relation to Appendix E, Appendix C and Section 3.1 Units.

@JonathanGregory
Contributor

Dear Karl @taylor13 and @larsbarring

We wouldn't be doubling the size of the standard name table, because this issue refers only to temperature. The great majority of quantities with K in their units mean it as a temperature difference (with a positive or negative power, often K-1 i.e. per degree). I think it is just the quantities in pure K which are ambiguous. I agree that we would need to define new standard names of temperature difference for every existing temperature quantity (in pure K) that appears in the table. There are about 50 of them (excluding ones which say they are difference, change or anomaly), so perhaps this is not the best solution.

An alternative, which might address Lars's point, would be to add something to the units string to distinguish units of temperature from units of temperature difference. We could, for instance, introduce a new unit of delta_K which could be used as an alternative unit for those quantities whose canonical unit is pure K. Thus the distinction would be made in practice using the units: temperatures are in K, temperature differences in delta_K. Since delta_K is not a udunit, it couldn't get incorrectly converted to degC. Experimenting with udunits, I see that it only suggests adding or subtracting 273.15 for pure K. It knows, for instance, that 1 K s-1 = 1 degC s-1.

Best wishes

Jonathan

@sebvi

sebvi commented Oct 19, 2021

Dear everyone,

it is a very interesting discussion I was quietly following in the background.

I think the discussion goes beyond Celsius vs Kelvin or the units of variable difference or how to relate a difference in Celsius to a difference in Kelvin. My feeling is that this particular case needs to be seen in a more general context. There is lots of confusion and misunderstanding in the discussion that mostly comes from simplifications from the general case.

To define a unit, one needs a reference, a scale and a direction that represents "increasing/decreasing" (not necessarily positive/negative if the reference is not set to be 0). In the end, it is just like ticks on an “axis”.

The definition of the reference is commonly omitted or forgotten in the literature because the units are mostly expressed as “Δ" so that the reference vanishes: (T1-Tref) – (T2-Tref) = T1-T2. Then you choose T1 and T2 to be exactly 1 unit of scale: it is the 1°C/1K/1m/1J/etc. used everywhere in unit conversion formulas and definitions.

For °C, the reference is the freezing point temperature of pure water and when I say 25°C, I, in fact, mean 25°C above 0°C or the freezing point temperature of pure water.
It is the same mechanism with any unit: assume I have a pencil and I want to measure its length using a ruler: I could choose to put the 0cm (reference) of the ruler at one end of the pencil and read 12cm at the other end. But I could just as well put the 3cm mark at one end and read 15cm at the other end. In both cases the pencil is 12cm and in both cases it is a difference with respect to a reference. When the reference is 0 (99.99% of the time), the reference is simply dropped.

When doing units conversion you need to do 2 things: 1) apply an offset of reference from one unit to the other unit and 2) rescale.
The general formula is:
X(i) = a(i/j) ( X(j) + b(j) )
Where i and j are the units of X you convert to/from, a(i/j) is the scaling factor and b(j) is the offset between the references, expressed in the same unit as the one you are converting from.
Note that you could define it as X(i) = a(i/j) * X(j) + b(i) and in that case b(i) is in the unit you are converting to.
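
The general formula above can be sketched in a few lines of Python. This is only an illustration: the class name `AffineConversion` and its fields `a` and `b` are hypothetical and not part of CF, UDUNITS or any other package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineConversion:
    """Hypothetical helper implementing X(i) = a * (X(j) + b), with b in unit j."""

    a: float  # scale factor a(i/j)
    b: float  # offset between the references, expressed in the source unit j

    def convert(self, x: float) -> float:
        # Step 1: shift the reference; step 2: rescale
        return self.a * (x + self.b)

    def offset_in_target_unit(self) -> float:
        # Equivalent form X(i) = a * X(j) + b(i), where b(i) = a * b(j)
        return self.a * self.b


# Illustrative conversions (names are assumptions made for this sketch)
kelvin_to_celsius = AffineConversion(a=1.0, b=-273.15)
fahrenheit_to_celsius = AffineConversion(a=5.0 / 9.0, b=-32.0)

print(kelvin_to_celsius.convert(273.15))     # freezing point of water, in degC
print(fahrenheit_to_celsius.convert(212.0))  # boiling point of water, in degC
```

Both forms of the formula give the same result; they differ only in the unit in which the offset is stored.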

For most units, the reference is identical, i.e. the reference in one unit does not need an offset to convert to the other unit and you are simply left with a rescaling. For instance:
E(eV) = 6.242e+18 eV/J * E(J)
Where 6.242e+18 eV/J is the scale factor

In the case of Celsius <-> Kelvin, there is no rescaling needed (well… the scaling is 1) but the reference is not the same:
T(°C) = 1 °C/K * (T(K) - 273.15 K) = 1 °C/K * T(K) - 273.15 °C

In the scientific literature and in textbooks, this is commonly “simplified” by dropping the units of the scale factor (or in the specific case °C<->K, dropping the whole scale factor) and dropping the units of the reference offset.

An interesting case is Celsius <-> Fahrenheit because it is one of these cases where the reference changes AND you need to rescale:
T(°C) = 5/9 °C/°F * (T(°F) – 32°F)
Or the other way around:
T(°F) = 9/5 °F/°C * (T(°C) + 160/9 °C)

Now if you want to look at differences of Temperature in Celsius translated to differences in Kelvin:
T1(°C) – T2(°C) = 1 °C/K * (T1(K) - 273.15K) – (1 °C/K * (T2(K) - 273.15K))
T1(°C) – T2(°C) = 1 °C/K * (T1(K) – T2(K))

The offset between the references vanishes and it is where most of the confusion in this discussion comes from! But another problem arises, briefly mentioned by Lars: T1-T2 = -(T2 – T1), i.e. a difference is an anti-commutative operation so it requires a convention…
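
A short numerical sketch, assuming Python and illustrative values chosen only for this example, shows the offset cancelling in a difference and what goes wrong when a difference is converted as if it were a temperature:

```python
def kelvin_to_celsius(t_kelvin: float) -> float:
    # Conversion of a temperature: shift the reference by 273.15 K (scale factor is 1)
    return 1.0 * (t_kelvin - 273.15)


t1_k, t2_k = 300.0, 290.0  # two temperatures in kelvin (made-up values)

diff_in_kelvin = t1_k - t2_k
diff_in_celsius = kelvin_to_celsius(t1_k) - kelvin_to_celsius(t2_k)

# The offsets cancel, so the difference has the same numerical value in both units
print(diff_in_kelvin, diff_in_celsius)

# Converting the difference as if it were a temperature applies the offset wrongly
wrongly_converted = kelvin_to_celsius(diff_in_kelvin)
print(wrongly_converted)
```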

But, writing:
°C = K is wrong
1°C = 1K is wrong
ΔT(°C) = ΔT(K) is better but not rigorously precise

The correct way to write it would be: ΔT(°C) = 1 °C/K * ΔT(K)

It is similar with altitude, elevation and height: they share the same scale but have different references.

All that to say that the units of temperature and temperature difference are identical in terms of scale but a temperature difference does not require a reference (what Lars refers to as absolute vs relative). We can make an analogy with time vs period, they can be expressed in hours, one wrt a reference (time), the other one being a difference between 2 times (period). The ambiguity is lifted by the use of “time” and “period” in the standard names and an explicit reference in the case of a time. A similar approach could be used.

If CF wanted to adopt a generic approach to make the distinction between temperature and temperature difference, I would say Temperature should have units “°C wrt pure water freezing point temperature” and temperature difference simply “°C”. But “°C” implies that the reference is the pure water freezing point temperature so it is redundant!
With this in mind the only option left is using meaningful standard names. Lars’s use cases are not any temperature differences but very specific, well defined ones so I would suggest to create new standard names for these.

Regarding the problem with UDUNITS conversion, I don’t think it is a CF problem but more a UDUNITS problem (or a user problem). The solution is probably to ask UDUNITS to create new pseudo units, maybe “°C relative”, to distinguish relative vs absolute temperatures. I must say I am totally against it; I already think that degree north, degree east and hours since are pure heresy from a scientific point of view.

@taylor13

I had earlier today come up with the same idea as @JonathanGregory that we might define a unit for temperature that included "change" as an essential qualifier. I decided to sit on it for a couple of days before suggesting that unsettling "solution" to our problem to see if I could come up with a good argument for not proposing it. Now that it has been suggested, I do think it merits some consideration. The only problem with the standard name air_temperature seems to be that it doesn't differentiate between a difference in air temperatures and an absolute temperature, which is important when doing units transformation. Otherwise temperature is just like air wind speed, humidity, or pressure, and no one is proposing that we need to distinguish in these cases between differences and absolute values.

Perhaps the best argument in favor of a unit that is not widely recognized is that software designed to convert from one unit to another will be stumped and won't produce a wrong answer.

@larsbarring
Author

larsbarring commented Oct 20, 2021

Dear all,

@sebvi, Thank you for the excellent clarification of the general principles of unit conversion.

I think that we now are beginning to converge towards a common understanding of what the problem is.


@JonathanGregory, @taylor13, I, too, was considering introducing delta-units. There would have to be one for each temperature unit (°C, K, °F, °Re, °R, ....), but this is not a problem as such. And this is the route taken by another unit conversion package, pint, that I have heard of. But I felt a bit reluctant towards this idea; something that was reinforced by @martinjuckes' comment:

...this kind of problem relates to use of units strings which are not units of measure in the conventional sense. That is, units strings such as degC, degrees_east, days since ... which convey information about the quantity being measures as well as the measurement unit.

And @sebvi's comment, and in particular his conclusion:

... to ask UDUNITS to create new pseudo units, maybe “°C relative”, to distinguish relative vs absolute temperatures. I am totally against it, I already think that degree north, degree east and hours since are pure heresy from a scientific point of view.

reinforces this even further. It is not clear to me that merely introducing delta-variants of the temperature units will solve the problem. I think that it would be more like a "quick hack" that will likely come back to bite us in the long run.


@JonathanGregory comments that UDUNITS handles 1 K s-1 = 1 degC s-1, which is the canonical unit of e.g. tendency_of_air_temperature_due_to_advection (i.e. it is a difference). This also works for degF. Thus, it seems that UDUNITS somehow has at least some 'knowledge' of when to apply the additive constant and when not to. Taking this one step further and looking at powers, @martinjuckes comments that UDUNITS does this: 1 degC² = 1 K², and 1 K² = 3.24 degF². Again, UDUNITS makes assumptions about the intent behind these units.
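
The arithmetic behind the powers case can be sketched as follows. This is plain Python rather than a call to UDUNITS, and the function name is hypothetical; it only illustrates that, under the difference interpretation, the scale factor is raised to the power of the unit and no offset is applied.

```python
K_TO_DEGF_SCALE = 9.0 / 5.0  # size of one kelvin expressed in Fahrenheit degrees


def kelvin_power_to_degf_power(value: float, power: int) -> float:
    # Difference interpretation: apply only the scale factor, raised to the
    # power of the temperature unit; the 459.67 degF offset is never used
    return value * K_TO_DEGF_SCALE ** power


print(kelvin_power_to_degf_power(1.0, 2))  # 1 K^2 in degF^2, about 3.24
print(kelvin_power_to_degf_power(1.0, 1))  # 1 K s-1 in degF s-1, i.e. 1.8
```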


In an attempt at disentangling all this, let's go back to the general unit transformation formula from @sebvi

X(i) = a(i/j) ( X(j) + b(j) )
Where i and j are the units of X you convert to/from, a(i/j) is the scaling factor and b(j) is the offset between the references, expressed in the same unit as the one you are converting from.

we note that UDUNITS has the required constants a(i/j) and b(j) (and thus b(i)) for a large number of unit conversions. To me this is an excellent starting point!

What however is missing are two things:

  • The CF Conventions do not have a clear and simple way to distinguish whether b(j) vanishes or not. Neither do they have a mechanism for distinguishing whether the a(i/j) vanishes or not. Instead the different situations are encoded in a mixture of standard names, cell methods and standard name modifiers.
  • Neither does UDUNITS have a mechanism to distinguish between these situations. It seems however that there are some assumptions regarding the quantities the units are 'intended' to represent, but these assumptions are not obvious (to me at least).

I will return to these two in follow up posts.

@martinjuckes

@sebvi writing "°C = K" is not wrong, it is a convention which you may disagree with, but not wrong. Please check the references to ISO and SI given above.

@sebvi

sebvi commented Oct 20, 2021

With all due respect, @martinjuckes, I will have to disagree: "°C = K" is wrong from a purely mathematical point of view; it is simply a fact. You can elude it by side-stepping the problem and calling it "a convention", but it is still wrong, as demonstrated rigorously above.
I had checked the SI brochure and it is a shame that an authoritative body is not rigorous and does not lead by example. Their table of derived units is at best imprecise and for unit conversion involving an offset, it is wrong. I think each time a unit conversion involves an offset, they should define the conversion for both the absolute and relative quantity.
Maybe the primary audience of the SI Brochure is "Joe Public" and "Joe Public" will certainly not see or understand the difference.
It is a shame because it is the reason why we produce cohorts of students or scientists not understanding properly what are the mechanisms behind unit conversion. They simply use formulas blindly.

@larsbarring: to give a different perspective, I should mention that the "a and b approach" is the mechanism we are going to use in the next iteration of GRIB (GRIB3) that we (WMO ET-data team) are developing. You are probably aware that a long-standing problem of GRIB2 is that it defines canonical units for each parameter and does not let you choose the one you prefer, typically Celsius vs Kelvin, fraction vs %, etc.
GRIB3 will let you define how your data values can be converted back to the canonical units (or you can simply use the values in your preferred units if you don't want to convert back). In the case of an absolute temperature in Celsius, a=1 and b=-273.15 (conversion back to canonical units, not from, so Celsius to Kelvin) and in the case of a relative temperature in Celsius, a=1 and b=0.

That said, I don't think it is a good idea here, it is not in the spirit of CF to prescribe units. :)

@sebvi

sebvi commented Apr 27, 2023

  1. Would you like to join the working group? If so then please say so in a comment below. Anyone is welcome, whether or not you have contributed to this issue before. If you have been already a contributor, please do consider taking part, and likewise if you haven't yet - a fresh perspective would be valuable.

Hi,

I am happy to contribute, provided it does not clash with other commitments I have.

@davidhassell
Collaborator

Hi,

Many thanks to @larsbarring, @taylor13, @JonathanGregory and @sebvi for volunteering to take part in the working group (which I have also done).

This is a last call for interested parties - please say by Friday if you'd like to take part, when I'll be contacting the group off-line to arrange a time for the first video call.

Thanks,
David

@sethmcg

sethmcg commented May 10, 2023

Hi @davidhassell,

I would like to take part, but I likely won't be able to participate until mid-June, due to upcoming commitments both personal and professional. If that's not prohibitive, sign me up.

Cheers,
Seth

(Thanks @larsbarring for the ping!)

@davidhassell
Collaborator

davidhassell commented May 18, 2023

Hello, just to let you know that the working group is now formed, and we're in the process of arranging our first off-issue meeting. When we have something to report, I will do so here - be it at the end of the process or earlier.

The working group comprises:

Thanks,
David

@JonathanGregory
Contributor

Following Andrew @DocOtak's comment during the CF meeting today, I'm adding this note to record that we should consider making clearer in the conventions document that CF does not assume that UDUNITS is used for units conversion. We may have discussed this already in this issue or during the meetings - I don't remember.

@semmerson

@JonathanGregory The important thing is to codify the syntax and semantics of the unit representation. https://www.nist.gov/pml/special-publication-330 and https://www.bipm.org/en/publications/si-brochure/ are good places to start.

@JonathanGregory
Contributor

JonathanGregory commented Nov 6, 2023

Dear all

The working group met four times in June and agreed on the changes to be made in the CF convention to deal with this issue. Unusually for CF, even after long and detailed discussion, it was not possible to reach a consensus on the best option, so the decision was made by majority vote. However, the working group members all agreed that deciding in this way was the best way to proceed.

When the working group was set up, anyone was invited to join in. It was stated on this issue that the proposal of the working group should be treated by the community as one which has already been agreed in principle. Further review on this issue should be concerned with clarity, style, and so on, not with content. No-one objected to this procedure.

The proposed changes are set out below. The main proposal is a new recommended attribute units_metadata with possible values of temperature: on_scale, temperature: difference, temperature: unknown. We propose the form keyword: value because many CF attributes are like that, and it allows for future generality without any additional complication.

If you have any suggestions for ways to improve the clarity and detail of the proposal, or if you see any conceptual difficulty that invalidates it, please comment on this issue within the next three weeks (on or before Monday 27th November). Thanks.

Best wishes

Jonathan, on behalf of the working group (@davidhassell @larsbarring @taylor13 @sebvi @semmerson @sethmcg @ethanrd)


Proposed changes to the text of the CF convention

Most of the proposed changes to the convention are in Sect 3.1 on "Units". Here's the first paragraph of that section, with text we propose to add shown in bold and text we propose to delete shown struck-through, and split into two paragraphs.

The units attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in Section 7.1, "Cell Boundaries" and climatology variables defined in Section 7.4, "Climatological Statistics"). The units attribute is permitted but not required for dimensionless quantities (see Section 3.1.1).

The value of the units attribute is a string that can be recognized by the UDUNITS package [UDUNITS], with the exceptions that are given below in Section 3.1.1, "Dimensionless units" and Section 3.1.3, "Scale factors and offsets". Note that case is significant in the units strings. Note also that CF depends on UDUNITS only for the definition of legal units strings. CF does not assume or require that the UDUNITS software will be used for units conversion. In most units conversions, the sole operation on the data is multiplication by a scale factor. Special treatment is required in converting the units of variables that involve temperature (Section 3.1.2, "Temperature units") and the units of time coordinate variables (Section 4.4, "Time coordinate").

The changes to this paragraph are:

  • Insert a sentence about dimensionless quantities, which currently begins a later paragraph, and then begin a new paragraph to describe the role of UDUNITS.

  • Add a sentence to emphasise that CF depends on the UDUNITS syntax but not the software, following this point being made in the CF annual meeting.

  • Add two sentences introducing two other sections which describe special treatment of units. The first of these (Section 3.1.2) deals with our main issue of temperature units.

The next paragraph is unchanged from the existing text:

The COARDS convention prohibits the unit degrees altogether, but this unit is not forbidden by the CF convention because it may in fact be appropriate for a variable containing, say, solar zenith angle.
The unit degrees is also allowed on coordinate variables such as the latitude and longitude coordinates of a transformed grid.
In this case the coordinate values are not true latitudes and longitudes which must always be identified using the more specific forms of degrees as described in <<latitude-coordinate>> Section 4.1, "Latitude Coordinate" <<longitude-coordinate>> Section 4.2, "Longitude Coordinate".

Next we have a new subheading, and the existing text of the paragraphs about dimensionless quantities, except for deleting one sentence that now appears earlier, and moving one sentence.

Section 3.1.1, Dimensionless units

Units are not required for dimensionless quantities.
A variable with no units attribute is assumed to be dimensionless.
However, a units attribute specifying a dimensionless unit may optionally be included.
The canonical unit (see also <<standard-name>> Section 3.3, "Standard Name") for dimensionless quantities that represent fractions, or parts of a whole, is 1.
The UDUNITS package defines a few dimensionless units, such as percent, ppm (parts per million, 1e-6), and ppb (parts per billion, 1e-9).
When a dimensionless quantity is a ratio of dimensional quantities, CF suggests that it may be informative to users of data if the units are given as ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6.

The UDUNITS package defines a few dimensionless units, such as percent, ppm (parts per million, 1e-6), and ppb (parts per billion, 1e-9).
The CF convention supports dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as ppmv and ppbv.
These units are allowed in the units attribute by CF only if the data variable has no standard_name.
These units are prohibited by CF if there is a standard_name, because the standard_name defines whether the quantity is a volume ratio, so the units are needed only to indicate a dimensionless number.

Information describing a dimensionless physical quantity itself (e.g.
"area fraction" or "probability") does not belong in the units attribute, but should be given in the long_name or standard_name attributes (see <<long-name>> Section 3.2, "Long Name" and <<standard-name>> Section 3.3, "Standard Name"), in the same way as for physical quantities with dimensional units.
As an exception, to maintain backwards compatibility with COARDS, the text strings level, layer, and sigma_level are allowed in the units attribute, in order to indicate dimensionless vertical coordinates.
This use of units is not compatible with UDUNITS, and is deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are available (see <<dimensionless-vertical-coordinate>> Section 4.3.2, "Dimensionless Vertical Coordinate").

Next we propose to insert a new subsection, as follows. Since this is all new text, it's not shown bold.

Section 3.1.2, Temperature units

The units of temperature imply an origin (i.e. zero point) for the associated measurement scale. When the temperature value is the degree of warmth with respect to the origin of the measurement scale, we call it an on-scale temperature. When units of on-scale temperature are converted, the data may require the addition of an offset as well as multiplication by a scale factor, because the physical meaning of a numerical value of zero for an on-scale temperature depends on the unit of measurement. On-scale temperature is unique among quantities in this respect; for all other quantities, zero means the same whatever the unit of measurement. For example (using bold to indicate a numerical data value), 0 kilogram is the same mass as 0 pound, but 0 degC is not the same temperature as 0 degF (= -17.8 degC), because these two temperature units implicitly refer to measurement scales which have different origins.

On the other hand, when the temperature value is a temperature difference, which compares two on-scale temperatures with the same origin, the value of that origin is irrelevant as it cancels out when taking the difference. Therefore to change the units of a temperature difference requires only multiplication by a scale factor, without the addition of an offset.

The units attribute does not distinguish between on-scale temperatures and temperature differences. This ambiguity also affects units of temperature raised to some power e.g. K^2 or multiplied by other units e.g. W m-2 K-1, degF/foot or degC m s-1. A standard_name (Section 3.3) or standard_name modifier (Appendix C) may clarify the intention, but they are optional. Some statistical operations described by the cell_methods attribute (Section 7.3, Appendix E) imply that temperature must be interpreted as temperature difference, but this attribute is optional too.

In order to change the units correctly, it is essential to know whether a temperature is on-scale or a difference. Therefore this standard strongly recommends that any variable whose units involve a temperature unit should also have a units_metadata attribute to make the distinction. This attribute must have one of the following three values: temperature: on_scale, temperature: difference, temperature: unknown. The units_metadata attribute, standard_name modifier (Appendix C) and cell_methods attribute (Appendix E) must be consistent if present.

Example of units_metadata to distinguish temperature quantities.

variables:
  float Tonscale;
    Tonscale:long_name="global-mean surface temperature";
    Tonscale:standard_name="surface_temperature";
    Tonscale:units="degC";
    Tonscale:units_metadata="temperature: on_scale";
    Tonscale:cell_methods="area: mean";
  float Tdifference;
    Tdifference:long_name="change in global-mean surface temperature relative to pre-industrial";
    Tdifference:standard_name="surface_temperature";
    Tdifference:units="degC";
    Tdifference:units_metadata="temperature: difference";
    Tdifference:cell_methods="area: mean";

With temperature: unknown, correct conversion of the units cannot be guaranteed. This value of units_metadata indicates that the data-writer does not know whether the temperature is on-scale or a difference. If the units_metadata attribute is not present, the data-reader should assume temperature: unknown. The units_metadata attribute was introduced in CF 1.11. In data written according to versions before 1.11, temperature: unknown should be assumed for all units involving temperature, if it cannot be deduced from other metadata. We note (for guidance only for temperature: unknown, not as a CF convention) that the UDUNITS software assumes temperature: on_scale for units strings containing only a unit of temperature, and temperature: difference for units strings in which a unit of temperature is raised to any power other than unity, or multiplied or divided by any other unit.
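
As a rough sketch of how data-reading software might act on the proposed attribute for pure temperature units, the following Python dispatches on the value of units_metadata. The function name, the small conversion table and the choice to raise an error for temperature: unknown are all assumptions made for this illustration; the proposal itself only says that correct conversion cannot be guaranteed in that case.

```python
# Hypothetical table of (scale, offset) pairs: target = scale * (source + offset)
_ON_SCALE = {
    ("K", "degC"): (1.0, -273.15),
    ("degC", "K"): (1.0, 273.15),
    ("degF", "degC"): (5.0 / 9.0, -32.0),
}

_ALLOWED = {
    "temperature: on_scale",
    "temperature: difference",
    "temperature: unknown",
}


def convert_temperature(value, src, dst, units_metadata=None):
    """Convert a value in a pure temperature unit, guided by units_metadata."""
    # A missing attribute is treated as "temperature: unknown"
    meta = "temperature: unknown" if units_metadata is None else units_metadata
    if meta not in _ALLOWED:
        raise ValueError(f"invalid units_metadata: {meta!r}")
    if meta == "temperature: unknown":
        # Design choice of this sketch: refuse rather than guess
        raise ValueError("cannot convert safely: on-scale or difference is unknown")
    scale, offset = _ON_SCALE[(src, dst)]
    if meta == "temperature: difference":
        return scale * value  # scale factor only, no offset
    return scale * (value + offset)  # on-scale: offset and scale factor


print(convert_temperature(15.0, "degC", "K", "temperature: on_scale"))
print(convert_temperature(15.0, "degC", "K", "temperature: difference"))
```

Software that prefers to fall back on the UDUNITS-style assumption for temperature: unknown could do so instead of raising, at the cost of possibly applying the wrong conversion.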

With temperature: on_scale, correct conversion can be guaranteed only for pure temperature units. If the unit is an on-scale temperature multiplied by some other quantity, it is generally not possible to convert the data correctly from the units given, whether the canonical or some other, to any other units, because the units do not give sufficient information. For example, the standard_name of integral_wrt_depth_of_product_of_conservative_temperature_and_sea_water_density specifies a canonical unit of kg degree_C m-2. A numerical value of 1 with units="kg degree_C m-2" could mean 1 degree_C multiplied by 1 kg m-2, or 10 degree_C multiplied by 0.1 kg m-2. If temperature: on_scale, these convert to 274.15 kg K m-2 and 28.315 kg K m-2, respectively, and there are infinitely many other possibilities.
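
The numbers in this example can be reproduced with a few lines of Python. The function name is hypothetical and the sketch assumes, as the text does, that the stored value is the product of one on-scale temperature and one other factor.

```python
def product_in_kelvin_units(temp_degc: float, other_factor: float) -> float:
    # Convert the on-scale temperature factor to kelvin, then form the product
    return (temp_degc + 273.15) * other_factor


# Two factorisations that give the same stored value of 1 kg degree_C m-2
stored_a = 1.0 * 1.0
stored_b = 10.0 * 0.1

converted_a = product_in_kelvin_units(1.0, 1.0)    # 1 degree_C times 1 kg m-2
converted_b = product_in_kelvin_units(10.0, 0.1)   # 10 degree_C times 0.1 kg m-2

# Same stored value, different results in kg K m-2
print(stored_a, stored_b)
print(converted_a, converted_b)
```

Because the stored value alone does not reveal the factorisation, no function of the stored value can recover the right answer.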

Section 3.1 will conclude with a new subsection heading, followed by text based on the existing text. The first sentence of this paragraph is a replacement and an expanded version of the sentence that currently appears at the end of the text, and the second sentence and changes to the third are made for clarity.

Section 3.1.3, Scale factors and offsets

UDUNITS recognises the SI prefixes shown in Table 3.1 for decimal multiples and submultiples of units, and allows them to be applied to non-SI units as well. UDUNITS offers a syntax for indicating arbitrary scale factors and offsets to be applied to a unit. (Note that this is different from the scale factors and offsets used for converting between units, as discussed for temperature in Section 3.1.2.) This UDUNITS syntax for arbitrary transformation of units is not supported by the CF standard, except for the case of specifying reference time, see section <<time-coordinate>> Section 4.4 [this section number does not currently appear in the rendered version; that's a bug to be corrected], Time Coordinate. The application of any scale factors or offsets to data should be indicated by the scale_factor and add_offset attributes. Use of these attributes for data packing, which is their most important application, is discussed in detail in <<packed-data>> Section 8.1, "Packed Data".

Table 3.1 is currently called "Supported Units". We propose to rename it as "Prefixes for decimal multiples and submultiples of units", which is how the SI standard describes them.

As well as the above changes in Section 3.1, we propose to add a footnote to Table C1, which defines the standard name modifiers, applying to the standard_error modifier, as follows: The definition of this modifier implies that if u is either a unit of temperature, or a unit of temperature multiplied by some other unit, the temperature in u must be interpreted as a temperature difference. Therefore the units_metadata attribute, if present, must have the value temperature: difference, even if the corresponding data variable without the modifier would have units_metadata="temperature: on_scale". See Section 3.1.2, Temperature units, for explanation.

We propose to add a similar footnote to Table E1, which defines the cell methods, applying to the range, standard_deviation and variance entries, as follows: The definition of this method implies that if u is either a unit of temperature, or a unit of temperature multiplied by some other unit, the temperature in u must be interpreted as a temperature difference. Therefore the units_metadata attribute, if present, must have the value temperature: difference. See Section 3.1.2, Temperature units, for explanation.

Proposed changes to the conformance document

Recommended: Any variable whose units involve a temperature unit should also have a units_metadata attribute.

Required: If present, the units_metadata attribute must have one of these values: temperature: on_scale, temperature: difference, temperature: unknown.

Required: If the standard_name attribute includes the standard_error modifier, the units_metadata attribute, if present, must have the value temperature: difference.

Required: If the cell_methods attribute includes any entry with any of the methods range, standard_deviation or variance, the units_metadata attribute, if present, must have the value temperature: difference.
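
A conformance checker could implement these four rules roughly as follows. This Python sketch is not taken from any existing CF checker: the function names, the short list of temperature unit names and the crude tokenising of the units, standard_name and cell_methods strings are simplifying assumptions.

```python
import re

# Illustrative, deliberately incomplete list of temperature unit names
TEMPERATURE_UNITS = {"K", "kelvin", "degC", "degree_C", "degF", "degree_F"}
ALLOWED = {
    "temperature: on_scale",
    "temperature: difference",
    "temperature: unknown",
}
DIFFERENCE_METHODS = {"range", "standard_deviation", "variance"}


def involves_temperature(units: str) -> bool:
    # Crude tokeniser: split on whitespace, '.', '*' and '/', then strip
    # trailing exponents such as "-1" or "^2"
    tokens = re.split(r"[\s.*/]+", units)
    return any(re.sub(r"[\^\-+0-9]+$", "", tok) in TEMPERATURE_UNITS for tok in tokens)


def check_variable(attrs: dict) -> list:
    """Return a list of messages for one variable's attributes."""
    problems = []
    meta = attrs.get("units_metadata")
    if meta is None:
        if involves_temperature(attrs.get("units", "")):
            problems.append("recommended: add a units_metadata attribute")
        return problems
    if meta not in ALLOWED:
        problems.append("required: units_metadata has an invalid value")
    # A standard name modifier follows the standard name after a space
    has_standard_error = "standard_error" in attrs.get("standard_name", "").split()[1:]
    has_difference_method = any(
        tok in DIFFERENCE_METHODS for tok in attrs.get("cell_methods", "").split()
    )
    if (has_standard_error or has_difference_method) and meta != "temperature: difference":
        problems.append("required: units_metadata must be 'temperature: difference'")
    return problems


print(check_variable({"units": "W m-2 K-1"}))
print(check_variable({
    "units": "K",
    "units_metadata": "temperature: on_scale",
    "cell_methods": "time: variance",
}))
```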

Other changes

We will append standardised text to the description of the standard name of quantities, such as air_temperature_anomaly, which have K in their canonical unit and by definition refer to a temperature difference. The proposed text is: It is strongly recommended that any variable with this standard name should have the attribute units_metadata="temperature: difference", in order to enable correct units conversions (see Section 3.1.2 of the CF convention).

We will append standardised text to the description of all other standard names whose canonical unit includes a temperature unit, as follows: It is strongly recommended that any variable with this standard name should have the attribute units_metadata with the value temperature: on_scale or temperature: difference, in order to enable correct units conversions (see Section 3.1.2 of the CF convention).

We propose also to create a new page on the CF website which lists the affected standard names. We can do that as soon as this proposal is agreed, at least as a temporary measure until the standard name descriptions are updated.

@larsbarring
Author

As a member of the working group I fully support the suggested changes. Many thanks @JonathanGregory for putting together this text proposal.

@JonathanGregory
Contributor

Dear all

I've just opened issue 481 in the conventions repo to implement the above proposal, with linked pull request 480. Reviews are welcome!

Since there's a new example 3.1, all the existing examples in section 3 have had their numbers incremented.

In preparing the PR, I noticed that UDUNITS does not currently support the most recently defined SI prefixes:

prefix symbol factor
quetta Q 1e30
ronna R 1e27
ronto r 1e-27
quecto q 1e-30

Because they're not in UDUNITS, CF can't support them. Should we request these to be added to UDUNITS?

Jonathan

@semmerson

semmerson commented Nov 16, 2023 via email

@JonathanGregory
Contributor

Dear Steve @semmerson

Thanks for promising the new prefixes. That's great. Will the new release of UDUNITS come very soon? If so, we might add them to CF now, in anticipation. What do you think?

Best wishes

Jonathan

@semmerson

semmerson commented Nov 17, 2023 via email

@JonathanGregory
Contributor

Oh dear, I'm sorry to hear that. CF may be a worthy cause, but it's not a vital one! I think we can leave out the new prefixes for the moment and put them in later.

@larsbarring
Author

@JonathanGregory, a small comment on the proposed text in 3.1.2. The fourth paragraph now begins

In order to change the units correctly, it is essential to know whether a temperature is on-scale or a difference. Therefore this standard strongly recommends that any variable whose units involve a temperature unit should also have a units_metadata attribute to make the distinction.

I think that it would be more consistent not to introduce the new word "change", which might be interpreted as having a different, and broader or more diffuse, meaning than "convert". I.e. to write

In order to correctly convert the units, it is essential to know ...

@JonathanGregory
Contributor

JonathanGregory commented Nov 20, 2023 via email

@taylor13

I think the edit will avoid misunderstanding. Thanks, Lars and Jonathan.

@davidhassell
Collaborator

Very nice job, Jonathan, thank you.

I have a couple more suggestions, and a deeper question ...


Looking ahead to make the transition to BCP 14 easier, could we change "suggests" to "recommends" in this line from chapter 3, which also prompts a reordering of the sentence:

Old: When a dimensionless quantity is a ratio of dimensional quantities, CF suggests that it may be informative to users of data if the units are given as a ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6.

New: When a dimensionless quantity is a ratio of dimensional quantities, CF recommends that the units are given as a ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6, as this may be informative to users of the data.


We say:

"On-scale temperature is unique [PR emphasis] among quantities in this respect; for all other quantities, zero means the same whatever the unit of measurement.".

Is this uniqueness true, though? We already note that special treatment is needed for time quantities - and those suffer similar zero-related problems. I haven't checked back in the above discussion, but I recall that other units were mentioned where zero-related concerns could exist. Would it be fairer to say:

"On-scale temperature is unusual among quantities in this respect; for nearly all other quantities, zero means the same whatever the unit of measurement."


We say:

"For example, the standard_name of integral_wrt_depth_of_product_of_conservative_temperature_and_sea_water_density specifies a canonical unit of kg degree_C m-2. A numerical value of 1 with units="kg degree_C m-2" could mean 1 degree_C multiplied by 1 kg m-2, or 10 degree_C multiplied by 0.1 kg m-2.
If temperature: on_scale, these convert to 274.15 kg K m-2 and 28.315 kg K m-2, respectively, and there are infinitely many other possibilities."

I'm afraid I still don't get this example - what am I missing? I would say that there are not infinite possibilities (the numerical values of the kg, degree_C and m-2 quantities are moot once we've multiplied them), rather that, in the on-scale case, it's simply impossible to apply a non-zero conversion offset when the unit is multiplied by others. I.e. kg degree_C m-2 -> kg (K - 273.15) m-2 = kg K m-2 - kg 273.15 m-2, which is nonsense. Not sure yet how I'd explain that in formal text.

@JonathanGregory
Contributor

Dear @davidhassell

Thanks for your comments.

The sentence

When a dimensionless quantity is a ratio of dimensional quantities, CF suggests that it may be informative to users of data if the units are given as ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6.

was inserted not long ago (I don't remember exactly when). We didn't mean "recommend" here, but something milder. We are inviting the data-writer to consider this point, rather than suggesting what they should do about it. We could leave this until it's considered for BCP, since it isn't a matter relevant to the present issue.

On-scale temperature is unique among quantities in this respect; for all other quantities, zero means the same whatever the unit of measurement.

Your point is that on-scale temperature is one of several quantities for which the physical meaning of zero depends on the choice of origin, as for time and longitude, for example. In these other cases, the origin and the unit of measurement are two different things, separately chosen. Therefore in those cases the meaning of a numerical value of zero does not depend on the unit of measurement. 0 seconds since 1970-1-1 means the same as 0 days since 1970-1-1, and an easting (map coordinate) of 0 metre means the same as 0 kilometer. However, for on-scale temperature, the origin and the unit of measurement are linked. If you change the unit of measurement, you also change the origin (and vice-versa), in general.

Would this be clearer:

On-scale temperature is unique among quantities in the respect that the origin and the unit of measurement are both defined by the units and therefore cannot be chosen independently. For all other quantities, the origin and the unit of measurement are independent; hence, converting the unit of measurement does not change the meaning of zero. For example (using bold to indicate a numerical data value), 0 kilogram is the same mass as 0 pound, and 0 seconds since 1970-1-1 means the same as 0 days since 1970-1-1, but 0 degC is not the same temperature as 0 degF (= -17.8 degC), because these two temperature units implicitly refer to measurement scales which have different origins.
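The three cases in that wording can be checked numerically. This is only an illustrative Python sketch; the helper names `pounds_to_kg` and `degF_to_degC` are made up for the example and are not part of CF or UDUNITS.

```python
from datetime import datetime, timedelta

KG_PER_POUND = 0.45359237  # exact definition of the avoirdupois pound


def pounds_to_kg(pounds):
    # Pure scaling: zero pounds is zero kilograms.
    return pounds * KG_PER_POUND


def degF_to_degC(deg_f):
    # On-scale conversion: both the degree size and the origin change.
    return (deg_f - 32.0) * 5.0 / 9.0


# Same origin, different unit of measurement: zero means the same instant.
epoch = datetime(1970, 1, 1)
zero_seconds = epoch + timedelta(seconds=0)
zero_days = epoch + timedelta(days=0)

print(pounds_to_kg(0.0))            # 0.0
print(zero_seconds == zero_days)    # True
print(round(degF_to_degC(0.0), 1))  # -17.8
```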

In the third case, you say

it's simply impossible to apply a non-zero conversion offset when the unit is multiplied by others. i.e. kg degree_C m-2 -> kg (K - 273.15) m-2 = kg K m-2 - kg 273.15 m-2, which is nonsense.

and I wrote

If the unit is an on-scale temperature multiplied by some other quantity, it is generally not possible to convert the data correctly from the units given, whether the canonical or some other, to any other units, because the units do not give sufficient information.

We are talking about the same problem in different ways, I think. I would say that the conversion would be possible if you knew more than the units tell you. I don't think the conversion is nonsense, but I do think this kind of quantity is generally nonsensical. In order to convert z kg degree_C m-2 into kg K m-2, where z = x kg m-2 * y degree_C, you have to know y, which could have any value. That's why I said there are infinitely many possible answers.
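A small Python sketch of that argument, using the two factorisations from the example quoted above. The function names are hypothetical and chosen only for this illustration.

```python
def on_scale_product_in_K(x_kg_m2, y_degC):
    # Convert the on-scale temperature factor first, then multiply.
    return x_kg_m2 * (y_degC + 273.15)


def difference_product_in_K(x_kg_m2, dy_degC):
    # A temperature difference has the same value in degree_C and in K.
    return x_kg_m2 * dy_degC


# Two ways of obtaining z = 1 kg degree_C m-2.
for x, y in [(1.0, 1.0), (0.1, 10.0)]:
    z_degC = x * y
    z_K_on_scale = on_scale_product_in_K(x, y)
    z_K_difference = difference_product_in_K(x, y)
    print(f"x={x} y={y} z={z_degC:.3f} "
          f"on_scale={z_K_on_scale:.3f} difference={z_K_difference:.3f}")
```

Both factorisations give z = 1 in kg degree_C m-2, yet the on-scale results in kg K m-2 differ (about 274.15 and 28.315), whereas the difference interpretation gives 1 in both cases. The value of z alone therefore does not determine the converted value when the temperature is on-scale.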

Best wishes

Jonathan

@davidhassell
Collaborator

Dear Jonathan,

Thank you for considering my points. Carrying on the discussion ...

We could leave it to the BCP discussion, but the intention may not be obvious to those working on the conversion at that time. Given that this change is going to happen, I don't think we should make it more difficult than necessary. When we say "suggest", do we mean "RECOMMENDED" or "MAY"? We should be able to answer this, and if we don't choose one of these now (but not in capitals), then I suspect that the BCP converters won't be able to, either.

In the case of uniqueness, what about seconds since 1970-1-1 and days since 1970-1-2? These are both convertible to each other and have different meanings of zero, which are a day apart. Or am I missing something?
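For concreteness, the conversion between these two time units can be sketched in Python with the standard library. The function name is hypothetical, and a plain Gregorian calendar without leap seconds is assumed.

```python
from datetime import datetime, timedelta

ORIGIN_A = datetime(1970, 1, 1)  # reference of "seconds since 1970-1-1"
ORIGIN_B = datetime(1970, 1, 2)  # reference of "days since 1970-1-2"


def seconds_since_a_to_days_since_b(seconds):
    # Locate the instant from the first origin, then measure it in days
    # from the second origin.
    instant = ORIGIN_A + timedelta(seconds=seconds)
    return (instant - ORIGIN_B) / timedelta(days=1)


print(seconds_since_a_to_days_since_b(0))      # -1.0
print(seconds_since_a_to_days_since_b(86400))  # 0.0
```

Here the offset of one day comes from the explicitly different reference dates, not from the change of unit between seconds and days.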

On the third point, I think I know why I'm confused ... You say that the conversion would be possible if you knew more than the units tell you, but I would counter that by noting that we never know more than the units tell us, so I don't agree with the analysis based on that assumption.

I would like to suggest replacing that whole paragraph with:

  • With temperature: on_scale, correct conversion can be guaranteed only for pure temperature units. If the unit is an on-scale temperature multiplied by some other quantity, it is not possible to convert the data from the units given to any other units that involve a temperature with a different origin. For instance, when temperature is on-scale, values of kg degree_C m-2 cannot be converted to values of kg K m-2, but that conversion would be possible for a temperature difference.

All the best,
David

@JonathanGregory
Contributor

Dear @davidhassell

At the moment, this phrasing is so mild that it's off the bottom of the BCP14 scale:

When a dimensionless quantity is a ratio of dimensional quantities, CF suggests that it may be informative to users of data if the units are given as ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6.

How about

As an alternative to the canonical units of "1" or some other unitless number, the units for a dimensionless quantity may be given as a ratio of dimensional units, for instance mg kg-1 for a mass ratio of 1e-6, or microlitre litre-1 for a volume ratio of 1e-6. Data-producers are invited to consider whether this alternative would be more helpful to the users of their data.

Here, "may" is a BCP14 word i.e. specifying dimensionless units this way is optional, not recommended.

In the case of uniqueness, what about seconds since 1970-1-1 and days since 1970-1-2. These are both convertible to each other and have different meanings of zero, that are a day apart.

Yes, because they have different origins. The point applies to units which have different units of measure but the same origin. Is this any clearer:

On-scale temperature is unique among quantities in the respect that the origin and the unit of measurement are both defined by the units and therefore cannot be chosen independently. For all other quantities, the origin and the unit of measurement are independent. Converting the unit of measurement alone, without changing the origin, does not change the meaning of zero. For example (using bold to indicate a numerical data value), 0 kilogram is the same mass as 0 pound, and 0 seconds since 1970-1-1 means the same as 0 days since 1970-1-1, but 0 degC is not the same temperature as 0 degF (= -17.8 degC), because these two temperature units implicitly refer to measurement scales which have different origins.

On the third point, I don't think you can say it's not possible to do the conversion. If you look only at the units attribute, of course you will only know the units, but the program or person that's doing the conversion may well have more information about how the data was calculated. I imagine that my numerical example caused dismay or confusion, rather than clarity, as I intended, and I'd be happy to omit the numerical example. Here's a modified version of your paragraph:

With temperature: on_scale, correct conversion can be guaranteed only for pure temperature units. If the quantity is an on-scale temperature multiplied by some other quantity, it is not possible to convert the data from the units given to any other units that involve a temperature with a different origin, given only the units. For instance, when temperature is on-scale, a value in kg degree_C m-2 can be converted to a value in kg K m-2 only if we know the individual values in degree_C and kg m-2 of which it is the product.

Best wishes

Jonathan

@davidhassell
Collaborator

Dear Jonathan,

Thank you for your patience and for humouring my questions and suggested changes. It may just be me, but I find your proposed new text suggestions much easier to understand (and I also accept that the original text was not wrong!), and would be happy for those to go in.

It'd be nice to hear some other opinions on these points, though, in case my tribulations are not shared.

All the best,
David

@JonathanGregory
Contributor

Dear @davidhassell

Thanks for the collaboration. I have made those changes in the pull request and the rendered versions (after a struggle with rebasing - we live and learn).

Best wishes

Jonathan

@larsbarring
Author

Thanks @davidhassell and @JonathanGregory for this fine-tuning of the text. I agree that with the recent changes the clarity is much improved.

@davidhassell
Collaborator

Thank you, @JonathanGregory. With these points resolved, I've finished my review and am more than happy with the PR you have prepared.

@taylor13

Not sure it will be a quick read for anyone, but in the end it is clear, so good to go, I think.
