Use native iris functions in multi-model statistics #1150
Conversation
@zklaus as promised I've split off the refactoring from the lazy multi-model stats. There are two regression tests failing now with some annoying rounding errors due to numpy changing the dtype on masked data. If you agree, I'll re-generate the reference data for the regression tests so we get green lights from CircleCI.
Masks are stored as fill values in netcdf files, so it seems unlikely that regenerating the reference data will change the data type of the resulting masks.
The point is that whenever the input data is a masked array (even if the mask is set to False everywhere), numpy changes the dtype.
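For illustration, here is a minimal sketch of the kind of dtype change being discussed (my example, not code from this PR; exact upcasting behaviour varies with numpy version):

```python
import numpy as np

# Integer data with an all-False mask: the mask changes nothing numerically,
# but taking a mean still upcasts the result to float.
data = np.ma.masked_array([1, 2], mask=[False, False])
print(data.dtype)         # int64 (platform dependent)
print(data.mean().dtype)  # float64: the statistic no longer matches the input dtype
```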
cheers @Peter9192 - thanks to @bouweandela nudging me, I am now looking at this 🍺
very nice implementation @Peter9192 - I left a few comments here and there, not anything serious. I'll have a look at the CI tests now, then approve 👍
    result_slices.append(collapsed_slice)
    ...
    result_cube = iris.cube.CubeList(result_slices).merge_cube()
Can this fail now? Or have we already processed those cubes to a point where it will never fail?
I would say it shouldn't, but apparently it can fail indeed. E.g. if you calculate the mean of integer arrays [1, 2] and [3, 3] one index at a time, you'll end up concatenating int(2) and float(2.5), which fails. Normally all input should be dtype float32, and if dtype is preserved in the calculations, that shouldn't happen. We already correct for this by explicitly casting with astype, but just in case, I'll add a try/except clause here as well, with a nice message.
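A hedged sketch of the defensive handling described above (the actual ESMValCore code differs; the variable names, coordinate setup, and error message here are illustrative):

```python
import iris.coords
import iris.cube
import numpy as np

# Two single-point "slices" with deliberately different dtypes, mimicking
# the int(2) and float(2.5) computed one index at a time.
result_slices = iris.cube.CubeList()
for i, value in enumerate([np.int64(2), np.float64(2.5)]):
    cube = iris.cube.Cube(np.array(value), var_name="tas")
    cube.add_aux_coord(iris.coords.DimCoord(i, var_name="time"))
    result_slices.append(cube)

# Explicitly cast every slice to a common dtype before combining ...
for cube in result_slices:
    cube.data = cube.data.astype(np.float32)

# ... and wrap the merge so any remaining mismatch gives a readable message.
try:
    result_cube = result_slices.merge_cube()
except Exception as exc:
    raise ValueError("Could not combine the per-index statistics; "
                     "check that all slices share dtype and metadata.") from exc
```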
ah gotcha - so the tests fail coz of the same issue I pointed out in the code: time points are not quite equal. Tests can be fixed by upping the tolerance.
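A hedged illustration of that fix (the function name is hypothetical, not the actual test code): comparing against reference data with a relative tolerance absorbs tiny floating-point differences instead of demanding bit-for-bit equality.

```python
import numpy as np

def assert_matches_reference(result, reference, rtol=1e-5):
    """Compare a computed array to stored reference data.

    rtol=1e-5 tolerates relative differences of ~0.001%, comfortably
    above the ~0.00001% discrepancies mentioned in this thread.
    """
    np.testing.assert_allclose(result, reference, rtol=rtol)
```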
pre-emptive approval, but please fix the 2 tests and have a look at me comments too. Apart from that - really nice work! 👍
Thanks for the review V, I'm on it
I think I covered all your comments @valeriupredoi. It's just those 2 regression tests that keep failing with precision errors. As I said before, I think we should just regenerate the sample data and merge. I don't think an unsurprising 0.00001% difference in a test result is worth our time and effort.
@zklaus @valeriupredoi Note that I just regenerated the sample data for the failing tests. They both reported a tiny (~0.00001%) difference. If you agree that's okay, this PR can be merged. Otherwise, feel free to suggest/commit a different solution.
Thanks, I am happy with this. Could you just make a quick pass through the checklist, tick the things that are present, and strike out those you deem inapplicable? I started with this, but wanted explicit confirmation from you on backwards compatibility and potential changes in dependencies.
excellent work, Peter! Yes, let's get this in - I'd like to carry on the discussion about (almost) equal time (and not only) coords - but defo not here, and in a different issue 🍺
@zklaus done, there are no changes to the user interface, and while scipy is removed as a direct dependency for mm-stats, it's still used in many other places in the code, so I think we're good there 👍
We already discussed the problem introduced by too strict iris checks when averaging near-surface air temperature 'tas', which is reported by some models at 1.5 m and by other models at 2.0 m. From a scientific point of view, regarding this as an error makes no sense, as @ruthlorenz nicely explained at our last monthly meeting.

Here is another example of why these iris checks can be too strict without being useful from a scientific point of view: variables on hybrid vertical levels (e.g. clw) have to be converted to pressure or height levels to be processed by most diagnostics. For this conversion, some models provide an auxiliary coordinate 'p0'. Calculating the multi-model mean over such datasets converted to pressure or height levels fails because 'p0' is kept after converting the vertical levels but (as expected) is not identical in all models. p0 no longer contains any useful information once the vertical coordinate transformation has been done. Yet this results in a failure to calculate the multi-model mean over such 3-dim variables (e.g. 3-dim cloud variables).
@axel-lauer Is this the same issue as described in #1213 and addressed in #1220? |
I think the vertical levels are a different issue. @axel-lauer, could you please open a new issue for that, including the failing recipe, and perhaps also add it to ESMValGroup/ESMValTool#2218?
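For concreteness, a possible workaround for the 'p0' problem described above might look like the sketch below (an assumption on my part, not ESMValCore code): drop the leftover reference-pressure coordinate before computing multi-model statistics.

```python
import iris.exceptions

def drop_reference_pressure(cubes):
    """Remove the leftover 'p0' coordinate after vertical-level conversion.

    'p0' carries no information once the data are on pressure or height
    levels, and differing values across models make the cubes incompatible.
    """
    for cube in cubes:
        try:
            cube.remove_coord("p0")
        except iris.exceptions.CoordinateNotFoundError:
            pass
    return cubes
```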
Description
Split off from #968 to separate the lazy and eager evaluation pathways.
This PR re-implements the multi-model statistics preprocessor. The new implementation delegates more of the work to native iris functions, which enables, among other things, lazy evaluation for some common operators. The lazy pathway itself is implemented in #968.
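A rough sketch of what "delegating to native iris functions" can look like (the coordinate name and merging details here are my illustration, not necessarily this PR's exact code):

```python
import iris.analysis
import iris.coords
import iris.cube

def multimodel_statistic(cubes, operator=iris.analysis.MEAN):
    """Combine aligned cubes along a new dimension and collapse it."""
    tagged = iris.cube.CubeList()
    for i, cube in enumerate(cubes):
        cube = cube.copy()
        # A scalar coordinate that differs per cube makes merge_cube
        # stack the inputs along a new leading dimension.
        cube.add_aux_coord(iris.coords.AuxCoord(i, long_name="cube_label"))
        tagged.append(cube)
    stacked = tagged.merge_cube()
    # Native iris aggregators handle masked data and, for some operators,
    # lazy (dask-backed) data -- the basis for the lazy pathway in #968.
    return stacked.collapsed("cube_label", operator)
```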
Related to #781
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
- Documentation is available
- Any changed dependencies have been added or removed correctly

To help with the number of pull requests: