How should multi-model statistics handle daily data on different calendars? #1210
Comments
My own suggestion would be to implement the first alternative here, i.e. an overlap mode and a full mode that follow the current implementation's behavior as closely as is sensible for now (the bugfix release). But I am also interested in discussing the best long-term scientific strategy. If that differs from the bugfix strategy, we may tackle it after the 2.3.0 release of the tool. In any case, the documentation should be updated if we support daily data.
We had an implementation at some point where we did something much closer to the original behaviour: ESMValCore/esmvalcore/preprocessor/_multimodel.py Lines 130 to 164 in a43ecb9
I thought the implementation was quite elegant from a readability point of view, but we weren't happy that we had to use an interpolation method for iris to extend a cube (it doesn't actually interpolate, but just masks missing values). We abandoned it because it didn't play well with our lazy aspirations. However, this might be something we could use as a "bugfix" for now.
And the problem with the lazy aspirations is the lack of a |
Don't really remember. Realizing only the values of the time arrays shouldn't be a problem, I suppose. I think it had to do with the interpolation.
Ok, sounds good to me. Could you go ahead with this approach?
I can give it a try, but I'm still not quite sure if we have agreed what the desired behaviour is. Effectively what this will do is:
I think the goal for now is to emulate the behavior of the old code. For |
Yes, we worked around this, because it is not lazy.
Ok, most importantly, since this is really urgent, let's do as discussed above with the previous code, even though it is not lazy. Long-term, it is understandable that there are no general, lazy dask intersect1d and union1d routines: both need to take all of the data into account, so they are in some sense fancy sorting routines, and dask doesn't do sorting. However, I think we can exploit the pre-sorted nature of the time axis to build a custom lazy thing around that.
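To make the idea of exploiting a pre-sorted time axis concrete, here is a minimal sketch of a single-pass merge that yields both the intersection and the union of two ascending sequences without any general sorting. The function name `sorted_intersect_union` and the ISO date strings are hypothetical, chosen only for illustration; the sketch is plain eager Python and does not show how such a merge would be made lazy with dask.

```python
def sorted_intersect_union(a, b):
    """Merge two ascending, duplicate-free sequences in one pass.

    Returns (intersection, union), both in ascending order.
    Hypothetical helper for illustration, not part of ESMValCore.
    """
    i = j = 0
    common, merged = [], []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Time point present in both datasets
            common.append(a[i])
            merged.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Time point only in the first dataset
            merged.append(a[i])
            i += 1
        else:
            # Time point only in the second dataset
            merged.append(b[j])
            j += 1
    # At most one of the inputs still has a tail left
    merged.extend(a[i:])
    merged.extend(b[j:])
    return common, merged


# ISO date strings sort chronologically; the second dataset has no leap day
with_leap = ["2000-02-27", "2000-02-28", "2000-02-29", "2000-03-01"]
without_leap = ["2000-02-28", "2000-03-01", "2000-03-02"]

overlap, full = sorted_intersect_union(with_leap, without_leap)
print(overlap)  # ['2000-02-28', '2000-03-01']
print(full)     # ['2000-02-27', '2000-02-28', '2000-02-29', '2000-03-01', '2000-03-02']
```

Because each input is walked exactly once and in order, the merge only ever needs the current element of each sequence, which is the property a chunk-wise lazy variant would presumably rely on.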
see #1212
I was a bit too quick there. Actually, for a leap day, #1212 uses nearest-neighbour lookup to fill the missing data. Masking only happens outside the original date range. ATM I don't see an easy way to mask the missing days in the interior (okay perhaps through xarray...).
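The difference between the two behaviours described here can be sketched in a few lines. This is only an illustration of the description above, not the code in #1212: the function `align_to_axis`, the use of `None` as a stand-in for a masked value, the integer time points, and the tie-breaking rule (earlier point wins) are all assumptions.

```python
def align_to_axis(times, values, target_times, fill="mask"):
    """Place `values` (given at ascending `times`) onto `target_times`.

    fill="mask":    every target point absent from `times` becomes None.
    fill="nearest": absent points inside the original range copy the nearest
                    available value (ties go to the earlier point); points
                    outside the original range still become None.
    Hypothetical helper; None stands in for a masked value.
    """
    lookup = dict(zip(times, values))
    first, last = times[0], times[-1]
    out = []
    for t in target_times:
        if t in lookup:
            out.append(lookup[t])
        elif fill == "nearest" and first < t < last:
            nearest = min(times, key=lambda s: (abs(s - t), s))
            out.append(lookup[nearest])
        else:
            out.append(None)
    return out


# Positions on a common axis: 0 = 27 Feb, 1 = 28 Feb, 2 = 29 Feb, 3 = 1 Mar, 4 = 2 Mar
target = [0, 1, 2, 3, 4]
noleap_times = [0, 1, 3]            # no 29 Feb, and the data end on 1 Mar
noleap_values = [10.0, 11.0, 13.0]

print(align_to_axis(noleap_times, noleap_values, target, fill="nearest"))
# [10.0, 11.0, 11.0, 13.0, None]
print(align_to_axis(noleap_times, noleap_values, target, fill="mask"))
# [10.0, 11.0, None, 13.0, None]
```

With nearest-neighbour filling, the leap day silently receives a copy of a neighbouring value and so still enters the statistics, whereas interior masking would exclude that dataset on that day.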
From discussions in #1212 it seems a bit more work is needed and probably also #744. Furthermore, the only recipe that had been using the multi-model statistics on daily data doesn't do so anymore. Since our documentation says that (sub-)daily data isn't supported and a warning to that effect is issued as well, we will bump this to 2.4.0.
Moving this to v2.6 since there is no open PR yet.
More comprehensive metadata handling in the multi-model statistics recently turned up a conceptual issue for multi-model statistics on daily data with different calendars.
The issue is that in this case there are days for which only a subset of models provides data. This usually happens for datasets that contain leap days (gregorian or standard calendar, all_leap) vs. those that don't (noleap), or with more unusual calendars (360_day, i.e. 30 days in every month); see CF conventions 1.7, Sect. 4.4.1 for the full list of supported calendars.
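As a rough illustration of how the sets of days differ between calendars, the sketch below enumerates the (month, day) pairs that exist in one year for a few CF calendar names using only the Python standard library. The helper `days_in_year` is hypothetical, and treating the standard calendar as proleptic Gregorian is a simplifying assumption that holds for modern years.

```python
import calendar


def days_in_year(year, cf_calendar):
    """Return the (month, day) pairs that exist in `year` for a CF calendar.

    Hypothetical helper covering only a few calendar names.
    """
    days = []
    for month in range(1, 13):
        if cf_calendar == "360_day":
            n_days = 30
        elif cf_calendar == "noleap":
            n_days = calendar.monthrange(2001, month)[1]  # 2001 is not a leap year
        elif cf_calendar == "all_leap":
            n_days = calendar.monthrange(2000, month)[1]  # 2000 is a leap year
        elif cf_calendar == "standard":
            # Assumption: proleptic Gregorian rules, fine for modern years
            n_days = calendar.monthrange(year, month)[1]
        else:
            raise ValueError(f"unsupported calendar: {cf_calendar}")
        days.extend((month, day) for day in range(1, n_days + 1))
    return days


year = 2000
per_calendar = {
    name: set(days_in_year(year, name))
    for name in ("standard", "noleap", "360_day")
}

for name, days in per_calendar.items():
    print(name, len(days))  # standard 366, noleap 365, 360_day 360

# 29 February exists in the standard calendar but not in noleap
print((2, 29) in per_calendar["standard"], (2, 29) in per_calendar["noleap"])  # True False

# 360_day even has a day that no Gregorian-like calendar knows
print(sorted(per_calendar["360_day"] - per_calendar["standard"]))  # [(2, 30)]

# Days for which all three calendars provide data
common = set.intersection(*per_calendar.values())
print(len(common))  # 358
```

Only 358 days of the year are shared by all three calendars here, so any daily multi-model statistic has to decide what to do with the rest.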
The old version of the multi-model statistics has two modes (selected with the parameter span). In overlap mode, it discards all days that don't appear in all datasets, producing a result that leaves out those days. In full mode, [missing until I have done a run in full mode].
However, the documentation states that
The new version of the multi-model statistics follows the documentation by throwing an exception, albeit a cryptic one.
Alternative strategies could be
overlap/full strategies
Some aspects of this have been discussed in #1201 and #1198.
This issue has previously been mentioned in #937, #744.
It is also connected to #781.