fx_data not preserved as 'cell_measures' after iris aggregated_by and 'extract_levels' processor #1189
@sloosvel, could you please have a look at this?
As I commented here, I'm fairly sure that this is not a bug of iris but a deliberate design choice.
However, the fix would involve quite some changes to other preprocessors, since they assume identical shapes for the data and the fx variables right now. It is also not a solution for time-dependent "fx" variables.
@schlunma I agree with you about the choice made in iris to neglect such a sneaky problem as handling time-dependent metrics (sorry, I was a bit slow in getting the whole thing). (I amended the remaining part of this comment as it was redundant and not improving the discussion)
This issue comes with a number of technical constraints, so maybe a recap of the whole discussion might be useful to see if there is a viable way through. At the moment the time preprocessors do not preserve grid metrics, and spatial aggregation preprocessors cannot be executed afterwards. I'm wondering if this applies to both CMIP5 and CMIP6 (but I guess it will probably be just the latter). There are two main constraints, clearly stated by @schlunma (here and in #1096 (comment)).
Is it ok to keep this broken sequence in preprocessors and add a note somewhere in the documentation? Besides, I made a search within the current main. I think that for most of the time-related aggregations providing the MEAN value of the coordinates should be ok (also because other statistical operators make little sense when applied to grid metrics).
I don't have much of an opinion on the ... And in the case of ...
@sloosvel @schlunma I'm not sure if this relates to some other on-going development branch, so before creating a new branch/PR, I put here below a first rough, working version of this 'in house' function to handle the issue.

diff --git a/esmvalcore/preprocessor/_time.py b/esmvalcore/preprocessor/_time.py
index 3cf1d5f4d..2c6d3380b 100644
--- a/esmvalcore/preprocessor/_time.py
+++ b/esmvalcore/preprocessor/_time.py
@@ -558,11 +558,11 @@ def climate_statistics(cube,
     clim_coord = _get_period_coord(cube, period, seasons)
     operator = get_iris_analysis_operation(operator)
-    clim_cube = cube.aggregated_by(clim_coord, operator)
-    clim_cube.remove_coord('time')
-    if clim_cube.coord(clim_coord.name()).is_monotonic():
-        iris.util.promote_aux_coord_to_dim_coord(clim_cube, clim_coord.name())
-    else:
+    clim_cube = aggregated_by(cube, clim_coord, operator)
+    #clim_cube.remove_coord('time')
+    if not clim_cube.coord(clim_coord.name()).is_monotonic():
+        # iris.util.promote_aux_coord_to_dim_coord(clim_cube, clim_coord.name())
+        #else:
         clim_cube = iris.cube.CubeList(clim_cube.slices_over(
             clim_coord.name())).merge_cube()
     cube.remove_coord(clim_coord)
@@ -970,3 +970,50 @@ def resample_time(cube, month=None, day=None, hour=None):
         return True
     return cube.extract(iris.Constraint(time=compare))
+
+
+def aggregated_by(cube, coords, operator):
+    """Compute iris aggregation over time preserving cell_measures (#1189).
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    coords: list of coord names
+        Coordinate(s) over which group aggregation is to be performed.
+
+    operator: str
+        Select operator to apply.
+
+    Returns
+    -------
+    iris.cube.Cube
+        Cube aggregated upon operator
+
+    """
+    from ._ancillary_vars import add_cell_measure
+    if cube.cell_measures():
+        # cell_measure into temporary cube
+        measure = cube.cell_measure().measure
+        fx_cube = cube.copy()
+        fx_cube.data = cube.cell_measure().data
+        fx_cube.var_name = cube.cell_measure().var_name
+        fx_cube.standard_name = cube.cell_measure().standard_name
+        fx_cube.units = cube.cell_measure().units
+        # compute aggregation
+        cube = cube.aggregated_by(coords, operator)
+        fx_cube = fx_cube.aggregated_by(coords, iris.analysis.MEAN)
+        # add back cell_measure
+        measure = iris.coords.CellMeasure(
+            fx_cube.data,
+            standard_name=fx_cube.standard_name,
+            units=fx_cube.units,
+            measure=measure,
+            var_name=fx_cube.var_name,
+            attributes=fx_cube.attributes)
+        cube.add_cell_measure(measure, range(0, measure.ndim))
+    else:
+        cube = cube.aggregated_by(coords, operator)
+
+    return cube
You could access the fx cube by looping over cube.cell_measures(), so there would not be a need to copy the variable again (a bit like it's being done in the ...)
Nice @tomaslovato! Some comments that might be relevant for the actual implementation:
Thanks @sloosvel for the suggestions!! I tried to import the add_cell_measure function, but it fails with a circular import:

File "/users_home/oda/tl28319/GIT/ESMValCore/esmvalcore/preprocessor/_time.py", line 23, in <module>
    from ._ancillary_vars import add_cell_measure
ImportError: cannot import name 'add_cell_measure' from partially initialized module 'esmvalcore.preprocessor._ancillary_vars' (most likely due to a circular import)

so, for testing purposes, I imported it in a different way. I had a look at the code and actually realized that my solution is not general enough: I made a copy of the original cube and then overwrite its data with the cell_measure data, but this won't work if the cube is 3D and the measure is 2D (e.g. area).
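One way to make it more general, following the suggestion to loop over cube.cell_measures(), could be something along these lines. This is a rough sketch only: the function name is made up and the details are assumptions, not the actual ESMValCore implementation. Measures that do not span the time dimension are simply re-attached, while time-dependent ones are aggregated with the same grouping.

import iris.analysis
import iris.cube
from iris.coords import CellMeasure


def aggregate_preserving_measures(cube, coords, operator):
    """Group-aggregate a cube, re-attaching its cell measures afterwards."""
    result = cube.aggregated_by(coords, operator)
    time_dims = set(cube.coord_dims('time'))
    for measure in cube.cell_measures():
        dims = cube.cell_measure_dims(measure)
        if not time_dims & set(dims):
            # Purely spatial measure (e.g. 2D areacella on a 3D cube):
            # nothing to aggregate along time, just re-attach a copy.
            result.add_cell_measure(measure.copy(), dims)
            continue
        # Time-dependent measure: wrap it in a temporary cube that carries
        # only the coordinates it spans, aggregate it with the same
        # grouping, then re-attach the result as a new cell measure.
        fx_cube = iris.cube.Cube(measure.data, units=measure.units)
        for coord in cube.coords():
            coord_dims = cube.coord_dims(coord)
            if coord_dims and all(d in dims for d in coord_dims):
                fx_cube.add_aux_coord(
                    coord.copy(), tuple(dims.index(d) for d in coord_dims))
        fx_cube = fx_cube.aggregated_by(coords, iris.analysis.MEAN)
        result.add_cell_measure(
            CellMeasure(
                fx_cube.data,
                standard_name=measure.standard_name,
                var_name=measure.var_name,
                units=measure.units,
                measure=measure.measure,
                attributes=measure.attributes),
            dims)
    return result

With something like this, a 2D area would pass through unchanged, while a time-dependent one would be averaged per group, as discussed above.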
I think the mean is a reasonable choice for the cell measures.
@schlunma I was more oriented toward the creation of a python function contained in _time.py.
I guess this should go through a separate issue/development to not pile up too many things in here ...
Sounds good!
Yes, but I think it's fairly easy to write a general function with the signature

def aggregated_by(cube, coords, operator, cell_measures_operator=None, ancillary_variables_operator=None, **kwargs)

that applies the given operators to the cube and to its cell measures / ancillary variables, respectively. I agree that it might not make sense to apply this change to all other preprocessors yet without further considerations.
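To illustrate, a rough sketch of how that signature could be wired up, extending the same regrouping idea sketched a few comments above to ancillary variables as well. The _regroup helper, the guard for fx fields that do not span the grouped dimension, and the choice to leave fx fields untouched when no operator is given are assumptions, not an agreed design.

import iris.cube
from iris.coords import AncillaryVariable, CellMeasure


def _regroup(data, dims, src_cube, coords, operator):
    """Aggregate fx-like data spanning `dims` of `src_cube` by `coords`."""
    tmp = iris.cube.Cube(data)
    for coord in src_cube.coords():
        coord_dims = src_cube.coord_dims(coord)
        if coord_dims and all(d in dims for d in coord_dims):
            tmp.add_aux_coord(
                coord.copy(), tuple(dims.index(d) for d in coord_dims))
    return tmp.aggregated_by(coords, operator).data


def aggregated_by(cube, coords, operator, cell_measures_operator=None,
                  ancillary_variables_operator=None, **kwargs):
    """Group-aggregate a cube by the coordinate named `coords`."""
    result = cube.aggregated_by(coords, operator, **kwargs)
    group_dims = set(cube.coord_dims(coords))
    if cell_measures_operator is not None:
        for cm in cube.cell_measures():
            dims = cube.cell_measure_dims(cm)
            if not group_dims & set(dims):
                # Measure does not span the grouped dimension: keep as is.
                result.add_cell_measure(cm.copy(), dims)
                continue
            data = _regroup(cm.data, dims, cube, coords,
                            cell_measures_operator)
            result.add_cell_measure(
                CellMeasure(data, standard_name=cm.standard_name,
                            var_name=cm.var_name, units=cm.units,
                            measure=cm.measure, attributes=cm.attributes),
                dims)
    if ancillary_variables_operator is not None:
        for av in cube.ancillary_variables():
            dims = cube.ancillary_variable_dims(av)
            if not group_dims & set(dims):
                result.add_ancillary_variable(av.copy(), dims)
                continue
            data = _regroup(av.data, dims, cube, coords,
                            ancillary_variables_operator)
            result.add_ancillary_variable(
                AncillaryVariable(data, standard_name=av.standard_name,
                                  var_name=av.var_name, units=av.units,
                                  attributes=av.attributes),
                dims)
    return result

A call could then look like aggregated_by(cube, 'month_number', iris.analysis.MEAN, cell_measures_operator=iris.analysis.MEAN).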
Agreed!
@schlunma @sloosvel
Looking among the available recipes I found the following that would be accounted as ancillary_variables: .. are you aware of other variables that are used this way?
Used in recipes, there is also ...
I thought about this a little bit more and I think that the statistical operation that is necessary for the fx files also depends on the dimension it's applied to. For example, cell areas like ... I think for the ...
Did you refer to the ...? I agree with you that the time average should be the right operation, at least for ...
I was mainly referring to ...
I think it would be good to take this up with the iris developers, as our general strategy is to make generic code available through iris instead of implementing it in ESMValCore. Are there any existing issues or discussions in the iris repository?
It may be better not to broadcast on load (even though I agree it is convenient), because I expect this is going to cause really slow runs as well as problems and slowness while trying to compute temporal statistics over this kind of time-independent ancillary variables. An example is ...
For "true" fx variables (not time-dependent) I think this is by far the safest way to go. Calculating statistics over these variables is non-trivial and dependent on the type of variable and dimension you're aggregating, so I don't think that is something that will be in |
I agree with @bouweandela and @schlunma - most of the cases I heard of from @ledm meant fx vars need to be used as masks, and then all sorts of temporal statistics are calculated after they've been used as masks - btw Lee is a good person to have contribute here, lots of experience with ocean cell measures; also, not to do stuff that could be done in iris.
Thanks for the tips @valeriupredoi! Summarizing the latest comments from @bouweandela and @schlunma, I kind of realize that the fx fields should not be broadcast to the variable shape when they are attached to the cube. If so, this part of the code (and some other) has to be modified to remove broadcasting:

ESMValCore/esmvalcore/preprocessor/_ancillary_vars.py, lines 89 to 102 in 197a9a1

and then cell_measures will be correctly handled by iris and propagated in cubes.
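For illustration, attaching e.g. areacello without broadcasting could look roughly like the sketch below. The helper name and the assumption that the horizontal dimensions are the last two of the cube are mine, not the current ESMValCore code.

from iris.coords import CellMeasure


def add_area_measure(cube, fx_cube):
    """Attach a 2D cell-area field to a cube without broadcasting it."""
    measure = CellMeasure(
        fx_cube.core_data(),  # keep the native (lat, lon) shape
        standard_name=fx_cube.standard_name,
        var_name=fx_cube.var_name,
        units=fx_cube.units,
        measure='area',
    )
    # Map the 2D measure onto the cube's horizontal dimensions, assumed
    # here to be the last two (e.g. a (time, lat, lon) cube).
    cube.add_cell_measure(measure, data_dims=(cube.ndim - 2, cube.ndim - 1))
    return cube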
However, the present issue will still be valid for those models having time-varying metric fields, but according to the ... If it may be of any help for further testing, I add a list of models providing e.g. ...
Running a recipe with time aggregation (any using cube.aggregated_by()) followed by area statistics crashes as the latter doesn't find the 'cell_measures' variable in the cube. Note that this also extends to the cube.collapsed() operation.

I have already been discussing this with @schlunma in #1096 and, after a bit more testing, I realized that the example we looked at there (#1096 (comment)) was not compliant with the cell_measures assigned by the code to cubes, as the time coord was missing. As cell_measures also depend on the time coordinate, by extending @schlunma's example with time also in the area variable, the problem finally came up (see details).

This relates to the choice made in iris to discard cell_measures when these are time dependent, as I guess it is very tricky to handle them in the correct way (or maybe it is simply a bug!).

Second point, but for a different reason: cell_measures are also lost in the extract_levels preprocessor, where a new cube is generated from scratch and this property is not propagated (a practical example: I want to compute the global average of a specific layer from a 3D variable, e.g. seawater oxygen at 500 m). In this case the 2D cell_measures is the area and should be associated to the cube, while in the case of a 3D cell_measures it should be coherently extracted along with the variable data.

main_log_debug.txt
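For reference, a minimal synthetic example of the reported behaviour (the data and names are made up; whether the cell measure survives depends on the iris version, but with the versions discussed here it is lost after the aggregation):

import numpy as np
import iris.analysis
import iris.coord_categorisation
from iris.coords import CellMeasure, DimCoord
from iris.cube import Cube

time = DimCoord(np.arange(12.0) * 30.0, standard_name='time',
                units='days since 2000-01-01')
lat = DimCoord(np.linspace(-80.0, 80.0, 4), standard_name='latitude',
               units='degrees')
lon = DimCoord(np.linspace(0.0, 350.0, 5), standard_name='longitude',
               units='degrees')
cube = Cube(np.ones((12, 4, 5)), var_name='o2', units='mol m-3',
            dim_coords_and_dims=[(time, 0), (lat, 1), (lon, 2)])

# Time-dependent cell area, as provided by some ocean models.
area = CellMeasure(np.ones((12, 4, 5)), var_name='areacello',
                   units='m2', measure='area')
cube.add_cell_measure(area, (0, 1, 2))

iris.coord_categorisation.add_month_number(cube, 'time')
monthly = cube.aggregated_by('month_number', iris.analysis.MEAN)

print(cube.cell_measures())     # the input cube carries areacello
print(monthly.cell_measures())  # expected to be empty: the measure is not preserved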