Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[URGENT] Rename zonal_means to zonal_statistics #433

Merged
merged 4 commits into from
Jan 17, 2020
Merged

Conversation

mattiarighi
Copy link
Contributor

In Table 1 of the paper, we mention zonal_statistics and meridional_statistics, but in the preprocessor this is called zonal_means and covers both the zonal (longitude stats) and the meridional (latitude stats) case.

This PR attempts to quickly solve the problem.

recipe_flato13ipcc.yml and recipe_collins13ipcc.yml should be adjusted accordingly.

Closes #24.

Copy link
Contributor

@ledm ledm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be tested with irregular grids as I'm not convinced that the iris mean operator is able to do it.

@ledm
Copy link
Contributor

ledm commented Jan 16, 2020

If we really need to get this pushed through urgently, I'd recommend removing the merdional statistics preprocessor altogether (from here and from the paper), and then we need to put an raise fatal error in the zonal_statistics when the latitude data is two dimensional.

@mattiarighi
Copy link
Contributor Author

There are probably other preprocessor features that might have problems with irregular grids (like NE masking), so I do not think this is a reason against the renaming.

@mattiarighi
Copy link
Contributor Author

If we really need to get this pushed through urgently, I'd recommend removing the merdional statistics preprocessor altogether (from here and from the paper), and then we need to put an raise fatal error in the zonal_statistics when the latitude data is two dimensional.

Fine with me.

Note however that the two functions are basically identical, they only differ by the target coordinate.

@ledm
Copy link
Contributor

ledm commented Jan 16, 2020

Note however that the two functions are basically identical, they only differ by the target coordinate.

Unfortunately, the cell descriptions are fundamentally different in the latitude and longitude directions and a function that works in one direction won't work in the other.

@mattiarighi
Copy link
Contributor Author

mattiarighi commented Jan 16, 2020

Unfortunately, the cell descriptions are fundamentally different in the latitude and longitude directions and a function that works in one direction won't work in the other.

Yes, but zonal_means was designed to work with both, by taking coordinate as an argument.
What I'm doing here is just to rename _mean to _statistics and to split the function into zonal and meridional, since by definition zonal refers to the longitude coordinate only.

@mattiarighi
Copy link
Contributor Author

  • Option 1: leave everything as is and remove zonal_statistics and meridional_statistics from Table 1.
  • Option 2: just rename zonal_means to zonal_statistics and remove meridional_statistics from Table 1.
  • Option 3: this PR, split zonal_means into zonal_statistics and meridional_statistics to be consistent with Table 1.

I'm not happy with 1, since the functionality is there and is used by some diagnostics, so it should be documented.
Option 2 is not fully consistent, since zonal statistics means "along longitude".

@mattiarighi mattiarighi added preprocessor Related to the preprocessor help wanted Extra attention is needed labels Jan 16, 2020
@ledm
Copy link
Contributor

ledm commented Jan 16, 2020

From your list, my preference is option 2.

I guess what I'm trying to say that that both these functions are wrong and neither will work as intended in the real world. The calculations will output invalid numbers, or break, except for in the case of a zonal mean of an dataset with a regular grid.

Most CMIP6 ocean models have iregular grids (ie 2d arrays for latitude and longitude coordinates.) Iris isn't clever enough to do these calculations along the latitude or longitude directions. It is only able to do the calculation in the direction of the cube dimensions (x, y, or z). These directions do not coincide with the latitude or longitude direction. It's a non-trivial calculation to produce the mean of an irregular grid in the latitude or longitude direction. (My hack was to re-grid the data to a regular grid before running this calculation.)

In theory, both meridional and zonal means should require the fx data for volume. However, in an regular grid, we're lucky that the cell volume is the same for a given cell depth for a given latitude band. This means that we can calculate the zonal mean for a regular grid without the cell volume (or more correctly: the zonal cross section). This isn't the case for irregular grids or in the meridional direction, as the cell volume changes in the North-South direction, so the meridional mean will always need the cell volume as a weight.

Hope this makes sense - I spend quite a bit of time working on this issue and wasn't able to come to a satisfying solution.

@mattiarighi
Copy link
Contributor Author

Thanks for clarifying!

I guess what I'm trying to say that that both these functions are wrong and neither will work as intended in the real world. The calculations will output invalid numbers, or break, except for in the case of a zonal mean of an dataset with a regular grid.

In the two recipes where this function is used, the data are first regridded to a 1x1 grid.

What about option 3 with a check on grid type and error if irregular?

@mattiarighi
Copy link
Contributor Author

Something similar to what is done in _mask.py:

if cube.coord('longitude').points.ndim < 2:
cube = _mask_with_shp(cube, shapefiles[mask_out], [
0,
])
logger.debug(
"Applying land-sea mask from Natural Earth"
" shapefile: \n%s", shapefiles[mask_out])
else:
msg = (f"Use of shapefiles with irregular grids not "
f"yet implemented, land-sea mask not applied.")
raise ValueError(msg)

@valeriupredoi
Copy link
Contributor

yes, placing that check would probably the best bet (for longer term too)

@mattiarighi
Copy link
Contributor Author

Done.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good tome, who's gonna change the preprocessors in recipes?

@axel-lauer
Copy link
Contributor

I am in favor of Mattia's option 3 with an additional error message in case of irregular grids. No need to trash potentially useful preprocessor functions just because they do not work on every single grid yet.

@mattiarighi
Copy link
Contributor Author

looks good tome, who's gonna change the preprocessors in recipes?

Guess who... 😒

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good

@mattiarighi
Copy link
Contributor Author

Thanks everybody for the quick response! 👍

@mattiarighi mattiarighi merged commit 1e22ff9 into master Jan 17, 2020
@mattiarighi mattiarighi deleted the zonal_statistics branch January 17, 2020 08:55
@valeriupredoi
Copy link
Contributor

looks good tome, who's gonna change the preprocessors in recipes?

Guess who... unamused

I can help if needed (please say no) 😁

@mattiarighi
Copy link
Contributor Author

No worries 😄, already done ESMValGroup/ESMValTool#1491

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zonal and meridional statistics preprocessor
6 participants