Use native iris functions in multi-model statistics #1150

Merged: 11 commits, Jun 7, 2021

Conversation

@Peter9192 (Contributor) commented May 28, 2021:

Description

Split off from #968 to separate the lazy and eager evaluation pathways.

This PR re-implements the multi-model statistics preprocessor, delegating more of the work to native iris functions. Among other things, this enables lazy evaluation for some common operators; the lazy pathway itself is implemented in #968.

Related to #781


Checklist

It is the responsibility of the author to make sure the pull request is ready for review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

@Peter9192 changed the title from "Eager mmstats" to "Use native iris functions in multi-model statistics" on May 28, 2021
@Peter9192 (author) commented:

@zklaus as promised I've split off the refactoring from the lazy multi-model stats.
@valeriupredoi perhaps you can have a quick look at this, since you've already seen all these changes in #968?

There are 2 regression tests failing now with some annoying rounding errors, due to numpy changing the dtype of masked data. If you agree, I'll regenerate the reference data for the regression tests so we get green lights from CircleCI.

@Peter9192 Peter9192 marked this pull request as ready for review May 28, 2021 15:29
@bouweandela bouweandela added this to the v2.3.0 milestone May 28, 2021
@bouweandela (Member) commented:

> There are 2 regression tests failing now with some annoying rounding errors, due to numpy changing the dtype of masked data. If you agree, I'll regenerate the reference data for the regression tests so we get green lights from CircleCI.

Masks are stored as fill values in netcdf files, so it seems unlikely that regenerating the reference data will change the data type of the resulting masks.

@Peter9192 (author) commented:

> > There are 2 regression tests failing now with some annoying rounding errors, due to numpy changing the dtype of masked data. If you agree, I'll regenerate the reference data for the regression tests so we get green lights from CircleCI.
>
> Masks are stored as fill values in netcdf files, so it seems unlikely that regenerating the reference data will change the data type of the resulting masks.

The point is that whenever the input data is a masked array (even if the mask is False everywhere), iris.analysis.MEAN will call np.ma.mean, which changes the dtype of the data (including where it's not masked). This is what causes the precision error.
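The dtype change is easy to reproduce with plain numpy (a minimal sketch of the behaviour described above, with made-up values):

```python
import numpy as np

# Mean of an integer masked array: np.ma.mean promotes to float64,
# even when the mask is False everywhere.
masked = np.ma.masked_array([1, 2, 3], mask=False)
print(np.ma.mean(masked).dtype)  # float64

# float32 input keeps its dtype on the array itself, but the reduction
# may still be carried out at a different precision than the input,
# which is where tiny relative differences can come from.
f32 = np.ma.masked_array(np.array([0.1, 0.2], dtype=np.float32), mask=False)
print(f32.dtype)  # float32
```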

@valeriupredoi (Contributor) commented:

cheers @Peter9192 - thanks to @bouweandela for nudging me, I am now looking at this 🍺

@valeriupredoi (Contributor) left a review:

very nice implementation @Peter9192 - I left a few comments here and there, not anything serious. I'll have a look at the CI tests now then approve 👍

Review threads on esmvalcore/preprocessor/_multimodel.py (resolved).

result_slices.append(collapsed_slice)

result_cube = iris.cube.CubeList(result_slices).merge_cube()
@valeriupredoi (Contributor) commented on the lines above:
can this fail now? Or have we already processed those cubes to a point where it will never fail?

@Peter9192 (author) commented Jun 7, 2021:

I would say it shouldn't, but apparently it can fail indeed. E.g. if you calculate the mean of the integer arrays [1, 2] and [3, 3] one index at a time, you'll end up concatenating int(2) and float(2.5), which fails. Normally all input should be dtype float32, and if the dtype is preserved in the calculations, that shouldn't happen.

We already correct for this by explicitly casting with astype, but just in case, I'll add a try/except clause here as well, with a nice message.
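That defensive cast plus try/except might look roughly like the sketch below (hypothetical, numpy-only: merge_slices is an invented stand-in for iris' CubeList(...).merge_cube()):

```python
import numpy as np

def merge_slices(slices, expected_dtype=np.float32):
    # Cast every collapsed slice back to the expected dtype first, so a
    # float64 slice cannot end up next to an integer one.
    cast = [np.asarray(s, dtype=expected_dtype) for s in slices]
    try:
        return np.stack(cast)
    except ValueError as exc:
        # Re-raise with a message pointing at the likely cause.
        raise ValueError(
            "Failed to merge multi-model statistic slices; check that "
            "all input cubes share the same shape and dtype.") from exc

# mean([1, 3]) -> 2.0 and mean([2, 3]) -> 2.5 as float64; after the
# cast they merge cleanly as float32.
merged = merge_slices([np.mean([1, 3]), np.mean([2, 3])])
print(merged.dtype)  # float32
```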

@valeriupredoi (Contributor) commented:

ah gotcha - so the tests fail because of the same issue I pointed out in the code: the time points are not quite equal. The tests can be fixed by upping the rtol, and I'd do the same in the code too - use allclose with a somewhat higher rtol, but maybe not 1, because then values that really are different would pass 😁
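The suggested tolerance check could look like this (illustrative values only; nothing here comes from the actual coordinates):

```python
import numpy as np

# Two models report the mid-month time point with a tiny offset:
t1 = np.array([15.5, 45.0])
t2 = np.array([15.5 + 1e-7, 45.0])

print(np.array_equal(t1, t2))          # False: strict equality fails
print(np.allclose(t1, t2, rtol=1e-6))  # True: equal within tolerance
```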

@valeriupredoi (Contributor) left a review:

pre-emptive approval, but please fix the 2 tests and have a look at my comments too. Apart from that - really nice work! 👍

@Peter9192 (author) commented:

Thanks for the review V, I'm on it

@Peter9192 (author) commented:

I think I covered all your comments @valeriupredoi. It's just those 2 regression tests that keep failing with precision errors. As I said before, I think we should just regenerate the sample data and merge. I don't think an unsurprising 0.00001% difference in a test result is worth our time and effort.

@Peter9192 (author) commented:

@zklaus @valeriupredoi Note that I just regenerated the sample data for the failing tests. They both reported a max relative difference of 2.4267516e-07, which is almost certainly due to a back-and-forth conversion to dtype float64, introduced because we switched from explicitly allocating the output array to having iris generate it for us.

If you agree that's okay, this PR can be merged. Otherwise, feel free to suggest/commit a different solution.
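The size of that difference is consistent with float32 rounding, as a numpy-only sketch shows (made-up data; the threshold is on the float32 machine-epsilon scale, not the exact value from the tests):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000, dtype=np.float32)

# Reduce once in float32 and once in float64 (then cast back): the two
# results can differ by around one float32 ulp (~1.2e-7 relative), the
# same order of magnitude as the 2.4e-07 reported by the tests.
m32 = data.mean(dtype=np.float32)
m64 = np.float32(data.mean(dtype=np.float64))
rel_diff = abs(float(m32) - float(m64)) / abs(float(m64))
print(rel_diff < 1e-5)  # True: tiny, but enough to break a bitwise comparison
```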

@zklaus added labels enhancement (New feature or request) and preprocessor (Related to the preprocessor) on Jun 7, 2021
@zklaus commented Jun 7, 2021:

Thanks, I am happy with this. Could you just make a quick pass through the checklist, tick the things that are present, and strike out those you deem inapplicable? I started with this, but wanted explicit confirmation from you on backwards compatibility and potential changes in dependencies.

@valeriupredoi (Contributor) commented:

excellent work, Peter! Yes, let's get this in - I'd like to carry on the discussion about (almost) equal time coords (and not only those), but definitely not here: in a different issue 🍺

@Peter9192 (author) commented:

@zklaus done. There are no changes to the user interface, and while scipy is removed as a direct dependency for mm-stats, it's still used in many other places in the code, so I think we're good there 👍

@axel-lauer (Contributor) commented:

We already discussed the problem introduced by overly strict iris checks when averaging near-surface air temperature (tas), which is reported by some models at 1.5 m and by others at 2.0 m. From a scientific point of view, regarding this as an error makes no sense, as @ruthlorenz nicely explained at our last monthly meeting. Here is another example of iris checks that are too strict and not useful from a scientific point of view:

Variables on hybrid vertical levels (e.g. clw) have to be converted to pressure or height levels before they can be processed by most diagnostics. For this conversion, some models provide an auxiliary coordinate p0. Calculating the multi-model mean over such datasets converted to pressure or height levels fails, because p0 is kept after converting the vertical levels but (as expected) is not identical in all models. p0 contains no useful information once the vertical coordinate transformation has been done, yet it causes the multi-model mean over such 3-dimensional variables (e.g. 3-d cloud variables) to fail.

@Peter9192 (author) commented:

@axel-lauer Is this the same issue as described in #1213 and addressed in #1220?

@zklaus commented Jul 12, 2021:

The tas issue is discussed at ESMValGroup/ESMValTool#2218 (comment) and #1204. Turns out there never was a problem; it was just a misdiagnosis.

I think the vertical levels are a different issue. @axel-lauer, could you please open a new issue for that, including the failing recipe, and perhaps also add it to ESMValGroup/ESMValTool#2218?

Labels: enhancement (New feature or request), preprocessor (Related to the preprocessor)
5 participants