Inconsistent calculation of standard deviation between preprocessors #1024
Comments
wait - are you saying that there could be differences of about 50% in stdev calculations? You should probably raise this first with the iris folk, e.g. @bjlittle, and see why they are not using the default degrees of freedom, they rewrite freedom in another way 🇨🇳 😁
@valeriupredoi you need to be careful here. It's hard to define a "default" setting in this context. To quote numpy: "In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se." So if you want to estimate a population's standard deviation from only a sample, you need this "Bessel correction". In my opinion this feels like the correct choice when we deal with a multi-model ensemble of climate models, so I agree with
and that is a very good explanation @schlunma - many thanks for it! Sounds to me like we should follow Chairman Iris in this case then 😁
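To put a number on how far the two conventions can diverge: for the same N values, the ddof=1 result equals the ddof=0 result multiplied by sqrt(N / (N - 1)). The sketch below checks this with numpy for a few sample sizes. The helper name `ddof_ratio` and the random data are illustrative assumptions, not part of any library.

```python
import numpy as np


def ddof_ratio(n):
    """Return std(ddof=1) / std(ddof=0) for n samples: sqrt(n / (n - 1))."""
    return np.sqrt(n / (n - 1))


rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
for n in (2, 5, 10, 30, 100):
    sample = rng.normal(size=n)
    measured = np.std(sample, ddof=1) / np.std(sample, ddof=0)
    # The measured ratio matches the closed form for any non-constant sample.
    assert np.isclose(measured, ddof_ratio(n))
    print(f"N={n:3d}  ratio={ddof_ratio(n):.4f}")
```

For N=2 the ratio is sqrt(2), roughly 1.41, and it falls toward 1 as N grows, so the choice of ddof matters most for small ensembles.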
I think we can close this now, after the refactoring of
Describe the bug
Hi everyone, I noticed that `iris.analysis.STD_DEV` returns a different result than `np.ma.std`. Both are used throughout the code, so the results are inconsistent.

Although `iris` uses the same underlying function (`np.ma.std`) to calculate the standard deviation, it uses a default of `ddof=1` instead of `0`, which is the default in `numpy`. More info: https://numpy.org/doc/stable/reference/generated/numpy.std.html

For example, the implementation of `multimodel_statistics` uses `np.ma.std`, whereas `climate_statistics` uses `iris.analysis.STD_DEV`. This means that they will return a different value when the user specifies `operator: std` in their recipe, which may be confusing.

I would just like some input on what to do with this in view of #968. Since we seem to be aligning our code more and more with `iris`, do we also align our implementation of `std` with the `iris` defaults?

Example:
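A minimal sketch of the discrepancy, using only numpy. The sample values are made up for illustration, and `ddof=1` is passed explicitly to mimic the iris default described above rather than calling iris itself.

```python
import numpy as np

# Hypothetical sample: five values, the last one masked out.
data = np.ma.masked_array(
    [1.0, 2.0, 3.0, 4.0, 100.0],
    mask=[False, False, False, False, True],
)

# numpy default (ddof=0): divide by N, the number of unmasked values.
std_numpy_default = np.ma.std(data)

# ddof=1, the default described above for iris.analysis.STD_DEV:
# divide by N - 1 instead.
std_ddof1 = np.ma.std(data, ddof=1)

print("ddof=0:", std_numpy_default)
print("ddof=1:", std_ddof1)
```

With four unmasked values the two results differ by a factor of sqrt(4/3), so the same `operator: std` setting would give visibly different numbers depending on which code path computes it.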