Inconsistent calculation of standard deviation between preprocessors #1024

stefsmeets · 2021-02-26T09:23:37Z

Describe the bug

Hi everyone, I noticed that iris.analysis.STD_DEV returns a different result than np.ma.std. Both are used throughout the code, but the result is inconsistent.

Although iris uses the same underlying function (np.ma.std) to calculate the standard deviation, they use a default setting for ddof=1, instead of 0, which is the default in numpy. More info: https://numpy.org/doc/stable/reference/generated/numpy.std.html

For example, the implementation of the multimodel_statistics uses uses np.ma.std, wheras climate_statistics uses np.iris.analysis.STD_DEV. This means that they will return a different value when the user specifies operator: std in their recipe, which may be confusing.

I would just like some input on what to do with this in view of #968. Since we seem to be aligning our code more and more with iris, do we also align our implementation of std with the iris defaults?

Example:

In [1]: import numpy as np

In [2]: data = np.ma.array(
    data=[[  1,   1,   1],
          [999,   5,   5],
          [  9,   9, 999]],
    mask=[[False, False, False],
          [ True, False, False],
          [False, False,  True]])

In [3]: # numpy default behaviour

In [4]: np.ma.std(data, axis=0)
Out[4]:
masked_array(data=[4.0, 3.26, 2.0],
             mask=[False, False, False],
       fill_value=1e+20)

In [5]: import iris.analysis

In [6]: # iris default behaviour

In [7]: iris.analysis.STD_DEV.aggregate(data, axis=0)  
Out[7]:
masked_array(data=[5.65, 4.0, 2.82],
             mask=[False, False, False],
       fill_value=1e+20)

In [8]: # behaviour reproduced in numpy (with `ddof=1`, the default in iris 3.0.1 `STD_DEV`)

In [9]: np.ma.std(data, axis=0, ddof=1)
Out[9]:
masked_array(data=[5.65, 4.0, 2.82],
             mask=[False, False, False],
       fill_value=1e+20)

The text was updated successfully, but these errors were encountered:

valeriupredoi · 2021-03-01T12:58:06Z

wait - you saying that there could be differences of about 50% in stdev calculations? You should probably raise this first with iris folk eg @bjlittle and see why they are not using the default degs of freedom, they rewrite freedom in another way 🇨🇳 😁

schlunma · 2021-03-01T13:29:50Z

@valeriupredoi you need to be careful here. it's hard to define a "default" setting in this context.

To quote numpy here: "In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se."

So if you want to estimate a population's standard deviation with only a sample, you need this "Bessel correction". In my opinion this feels like the correct choice when we deal with a multi-model ensemble of climate models, so I agree with iris here.

valeriupredoi · 2021-03-01T13:59:51Z

and that is a very good explanation @schlunma - many thanks for it! Sounds to me like we should follow Chairman Iris in this case then 😁

schlunma · 2022-01-18T09:18:14Z

I think we can close this now, after the refactoring of multi_model_statistics I can't find a reference to np.std or np.ma.std in the code anymore. Please re-open if necessary.

stefsmeets mentioned this issue Feb 26, 2021

Update tests for multimodel statistics preprocessor #1023

Merged

10 tasks

schlunma closed this as completed Jan 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent calculation of standard deviation between preprocessors #1024

Inconsistent calculation of standard deviation between preprocessors #1024

stefsmeets commented Feb 26, 2021 •

edited

Loading

valeriupredoi commented Mar 1, 2021

schlunma commented Mar 1, 2021

valeriupredoi commented Mar 1, 2021

schlunma commented Jan 18, 2022

Inconsistent calculation of standard deviation between preprocessors #1024

Inconsistent calculation of standard deviation between preprocessors #1024

Comments

stefsmeets commented Feb 26, 2021 • edited Loading

valeriupredoi commented Mar 1, 2021

schlunma commented Mar 1, 2021

valeriupredoi commented Mar 1, 2021

schlunma commented Jan 18, 2022

stefsmeets commented Feb 26, 2021 •

edited

Loading