Preprocessor chain is not being applied to cell measures #436
Comments
I think that this issue had been raised elsewhere, but never had a dedicated issue number.
@schlunma @mattiarighi @valeriupredoi @zklaus, do any of you have any thoughts on how to implement this fix? I don't mind trying to do it myself, but it would be good to have a discussion on how to do it first. Cheers! |
This fix needs:
However, there are lots of caveats, because nothing is ever easy:
Feel free to edit the comment for clarity or if we encounter additional requirements/caveats.
cool cheers @ledm - will have a stab at this later on today, first fixing tests 🔢 |
Thank you @ledm for listing the caveats. |
A time-dependent FX file is fairly common in the Ofx. The CMIP6 models GFDL-CM4, UKESM1 and HadGEM3 all have time-varying volcellos. Here's a recipe that shows this problem:
In this case, we'd need to apply both the |
@ledm is right about the changing nature of |
started the work in #439 - nearly there but not ready yet, I'll let you guys know when done, most prob in a couple hours 🍺 |
Just noticed that |
Another thing I need to fix in #439 is to retrieve and use only the fx files with the same mip as the master variable, since we don't want to use e.g. the mip=Ofx ones. Just realized that |
What do you mean by that? Whether we must use eg |
I think he means this: #440. |
Yes, but Ofx-volcello is not time-dependent, so it cannot be used for area or volume statistics where time ops are needed. @ledm are time-ops always needed for those preprocessors?
No I mean 439 where fx vars are preprocessed - if they need to be time-processed then Ofx won't cut it |
|
yeah thx Sherlock 😁 But the question is: the code finds one of them first (if there is Ofx then it'll find Ofx and return it, if there is Omon/yr/dec it'll find that and return it) but if there's a time preprocessor needed to be applied on the fx data then if there's Ofx available then that preprocessor will fail miserably -> that's what I need to fix in #439 |
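The selection problem described in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not the code in #439: the `FxCandidate` record, the `select_fx_file` function and the `needs_time_axis` flag are hypothetical names, and the file names in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# MIP tables whose variables carry no time axis (time-invariant fields).
TIME_INVARIANT_MIPS = {"fx", "Ofx"}


@dataclass(frozen=True)
class FxCandidate:
    """Hypothetical record describing one fx file found on disk."""
    short_name: str
    mip: str
    filename: str


def select_fx_file(
    candidates: Sequence[FxCandidate],
    variable_mip: str,
    needs_time_axis: bool,
) -> Optional[FxCandidate]:
    """Choose an fx file deliberately instead of returning the first one found."""
    if needs_time_axis:
        # A time preprocessor would fail on a file without a time axis.
        usable = [c for c in candidates if c.mip not in TIME_INVARIANT_MIPS]
    else:
        usable = list(candidates)
    # Prefer the file that shares its mip with the master variable.
    for candidate in usable:
        if candidate.mip == variable_mip:
            return candidate
    return usable[0] if usable else None


found = [
    FxCandidate("volcello", "Ofx", "volcello_Ofx_placeholder.nc"),
    FxCandidate("volcello", "Omon", "volcello_Omon_placeholder.nc"),
]
chosen = select_fx_file(found, variable_mip="Omon", needs_time_axis=True)
```

With both files available, `chosen` is the Omon file. If only the Ofx file exists and a time operation is requested, the function returns `None`, so the caller can report the problem rather than fail inside the preprocessor.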
This really all depends on the individual preprocessor, doesn't it? Some examples:
Of course, all of this is for the case where "fx" files are passed as additional information for other variables. When they are actually treated as variables in their own right normal processing should be applied. |
I experimented a bit with
# Imports assumed for this snippet: extract_time and extract_region are the
# ESMValCore preprocessor functions.
import iris
from esmvalcore.preprocessor import extract_region, extract_time

# Atmosphere example: attach a time-invariant cell area to the (lat, lon) dimensions.
cube = iris.load_cube("/home/bandela/esmvaltool_input/CMIP5/ta_Amon_CanESM2_historical_r1i1p1_185001-200512.nc")
print(cube.shape)
area = iris.load_cube("/home/bandela/esmvaltool_input/CMIP5/areacella_fx_CanESM2_historical_r0i0p0.nc")
print(area.shape)
measure = iris.coords.CellMeasure(area.core_data(), standard_name=area.standard_name, units=area.units, measure='area')
cube.add_cell_measure(measure, (2, 3))
# Slice in time and space; the attached cell measure is sliced along with the data.
cube = extract_time(cube, 2000, 1, 1, 2001, 1, 1)
cube = extract_region(cube, 0, 10, 0, 20)
print(cube.shape)
print(cube.cell_measure('cell_area').shape)

# Ocean example: attach a time-varying cell volume to all four dimensions.
cube = iris.load_cube("/home/bandela/esmvaltool_input/CMIP6/thetao_Omon_GFDL-CM4_historical_r1i1p1f1_gn_201001-201412.nc")
print(cube.shape)
volume = iris.load_cube("/home/bandela/esmvaltool_input/CMIP6/volcello_Omon_GFDL-CM4_historical_r1i1p1f1_gn_201001-201412.nc")
print(volume.shape)
coord = iris.coords.CellMeasure(volume.core_data(), standard_name=volume.standard_name, units=volume.units, measure='volume')
cube.add_cell_measure(coord, (0, 1, 2, 3))
cube = extract_time(cube, 2010, 1, 1, 2011, 1, 1)
cube = extract_region(cube, 0, 10, 0, 20)
print(cube.shape)
print(cube.cell_measure('ocean_volume').shape)
prints
If iris cell measures work as expected, I would strongly prefer to add them to the cubes at load time (if needed for the preprocessor steps defined in the recipe) and take them through the preprocessing chain in that way. |
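The idea of carrying the measure through the chain with the data can be illustrated without iris. The sketch below is a toy model only: `ToyCube` and its methods are hypothetical stand-ins written for this example, not iris or ESMValCore classes. The point is that a slicing step touches the data and the attached measure together, so a later weighted statistic always sees matching shapes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyCube:
    """Toy stand-in for a cube: 2D data (lat x lon) with an attached cell area."""
    data: List[List[float]]
    cell_area: List[List[float]]

    def extract_region(self, rows: slice, cols: slice) -> "ToyCube":
        """Apply the same slice to the data and to the attached measure."""
        return ToyCube(
            [row[cols] for row in self.data[rows]],
            [row[cols] for row in self.cell_area[rows]],
        )

    def area_mean(self) -> float:
        """Area-weighted mean using the attached measure as weights."""
        weighted_sum = sum(
            value * weight
            for data_row, area_row in zip(self.data, self.cell_area)
            for value, weight in zip(data_row, area_row)
        )
        total_weight = sum(weight for area_row in self.cell_area for weight in area_row)
        return weighted_sum / total_weight


cube = ToyCube(
    data=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    cell_area=[[1.0, 1.0, 2.0], [1.0, 1.0, 2.0]],
)
region = cube.extract_region(slice(0, 1), slice(1, 3))
regional_mean = region.area_mean()  # (2*1 + 3*2) / (1 + 2)
```

Because the measure travels with the data, no separate preprocessing of the fx file is needed for steps that only slice.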
also note that the fx variables are not exactly the same as |
Indeed, this issue really is only about Regarding iris support, @bouweandela is right that a lot is working already. In some ways, the situation is even better, because CMIP6 data often already contains the correct |
Indeed, but until it does, I think we can just modify our load/fix_metadata functions so the cell measures get attached as shown in the example above #436 (comment).
That pull request makes a lot of changes to the way tasks are constructed from the recipe and in general is very large. I'm afraid we will introduce all kinds of bugs if we merge it like this and I think it's also not needed. We could just pass the filenames of the fx files to the |
@bouweandela - I'll let @ledm explain to you in how many ways we need that PR - mate, it's really not as simple as loading the fx files and changing them, we need to preprocess them as per the user's needs. Do note that I have removed the changes on tasks apart from the one that actually cleans up identical ancestors within a single task, that is needed imho. The PR is long but that's not a reason to cold refuse it 🍺
also that PR contains a lot of very useful changes @schlunma and myself have done to the fx vars handling in general, not just for the purpose of preprocessing them for the area_statistics or volume_statistics. Note that @LisaBock and @ledm and a bunch of other IPCC authors (inc. myself) need this functionality yesterday |
we can review it together, over a beer, not a worry about doing it alone 🍺 |
on top of it all - @ledm has already produced preliminary IPCC results so delaying the integration will affect the scientists -> and am done lobbying 😁 |
If it's so important you will be eager to prepare the changes to facilitate easier, ie quicker review.
This might require some history rewriting. And most important of all: Have one PR do one thing. As you say yourself
In other words, it contains a lot of things that have nothing to do with the ostensible intent of the PR. This leaves you in the position where the reviewer has to wonder at every hunk of the large diff what might be the intent of this change. Thanks to the convoluted history the unclear commit messages are also of little help. So here we have a PR with 77 commits, making substantial changes to 8 core files of the project with the only information about the intent being that it contains, but is very much not limited to, changes for area and volume statistics. For me that certainly is reason enough to request breaking up the PR and replacing the current monster with a few more manageable and clearer ones. |
That sounds great! And let's discuss the rebase over a beer. I'll buy. 😄 🍻 |
OK guys I have now broken down PR #439 into four bits (and @zklaus was quick and on the ball to approve the first bit so that one is now in
I will now add relevant info into each PR description linking it to this comment 🍺 |
Maybe I'm missing something here, but to me it looks like it is. If you have a careful look at the example I posted earlier (#436 (comment)), you will see that once you have attached the cell measures to the cube, |
I like this simplified approach! There are a couple of things to consider, though: As touched upon before, not all fx files are cell measures. According to CF (that released version 1.8 just a few days ago), only area and volume are supported at the moment, so masks and fractions, for example, are not considered cell measures. Maybe that is something to take up with CF here? The other thing is that some preprocessors will deal better with this than others. Where we use directly iris functionality (as in using Still, we would save some and the stuff we would have left to do would need to be done anyway, so I am all for it. |
right, so @bouweandela and myself had a chat on the phone yesterday and concluded we should approach things this way rather than the proletarian way in #439 (and subsequent PR children) but this approach is still presenting us with caveats or things to test nonetheless, so in the meantime, we should keep #439 and its children alive (that works well, tested by @ledm quite heavily) so that the people producing results for the likes of IPCC and stuff can use its functionality. So your review on #511 is absolutely awesome and to be used, @zklaus - but we should start thinking about plugging in this approach there 🍺
Maybe these can be added as ancillary variables? http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch03s04.html |
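The split discussed here, with area and volume as cell measures and everything else as an ancillary variable, could look roughly like this. It is a sketch under assumptions: the `CELL_MEASURES` table and `classify_fx_variable` are hypothetical names for illustration, and the table lists only a few common CMIP short names rather than an exhaustive set.

```python
# CF cell measures currently cover only "area" and "volume".
# Illustrative, non-exhaustive mapping of CMIP short names to measure types.
CELL_MEASURES = {
    "areacella": "area",
    "areacello": "area",
    "volcello": "volume",
}


def classify_fx_variable(short_name):
    """Return ("cell_measure", measure) or ("ancillary_variable", None)."""
    if short_name in CELL_MEASURES:
        return ("cell_measure", CELL_MEASURES[short_name])
    # Masks and fractions such as sftlf or sftof are not cell measures.
    return ("ancillary_variable", None)


kinds = {name: classify_fx_variable(name) for name in ("areacella", "volcello", "sftlf")}
```

Here `sftlf` (land area fraction) falls through to the ancillary branch, which matches the suggestion to treat masks and fractions as ancillary variables.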
Our design strategy is to rely as much on iris as possible. Therefore we may want to consider trying to get functionality that we need to implement outside of iris into iris. For example support 2D horizontal coordinates. |
There will be much better support for ancillary variables in iris 3. Removed this from the v2.0.0 milestone, since the release of iris 3 is scheduled after the ESMValCore v2.0.0 release. |
I have worked a bit following this approach. Should I open a new pull request or push everything to #439 ? There are quite some changes though. |
Yes, please |
Basically, if your preprocessor changes the shape of the diagnostic data cube (for instance a spatial or temporal slice, such as the `annual_statistics` or `extract_region` preprocessors) before using the `area_statistics`, `zonal_mean` and `volume_statistics` preprocessors, then the resulting cube will no longer have the same shape as the FX data.

This means that the mean operation in `area_statistics`, `zonal_mean` and `volume_statistics` can only be applied on the global scale. It is not currently possible to use them on a regional scale.

Many of the preprocessors need to be applied to both the diagnostic dataset and the FX data in the cases when the `mean` operator is used in `area_statistics`, `zonal_mean` and `volume_statistics`.

This issue is related to the recent fixes merged by @mattiarighi, @schlunma and @valeriupredoi. The PRs #429 and #433 have closed some FX related issues recently, but I don't want this issue to get lost with the other closed issues and PRs.
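A toy reproduction of the shape mismatch, using plain nested lists rather than iris cubes, may make the problem concrete. All function names here (`shape_of`, `area_weighted_mean`, `extract_columns`) are hypothetical helpers written for this sketch; they only mimic the behaviour described above.

```python
def shape_of(field):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(field), len(field[0]) if field else 0)


def area_weighted_mean(data, area):
    """Area-weighted mean; data and area must be on the same grid."""
    if shape_of(data) != shape_of(area):
        raise ValueError(
            f"data shape {shape_of(data)} does not match FX shape {shape_of(area)}"
        )
    weighted_sum = sum(
        value * weight
        for data_row, area_row in zip(data, area)
        for value, weight in zip(data_row, area_row)
    )
    return weighted_sum / sum(weight for area_row in area for weight in area_row)


def extract_columns(field, start, stop):
    """Toy regional selection: keep columns start..stop-1 of every row."""
    return [row[start:stop] for row in field]


data = [[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]
area = [[1.0, 2.0, 1.0], [1.0, 2.0, 1.0]]

region = extract_columns(data, 0, 2)
try:
    area_weighted_mean(region, area)  # FX data still has the global shape
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# Applying the same selection to the FX data makes the regional mean possible.
regional_mean = area_weighted_mean(region, extract_columns(area, 0, 2))
```

The first call fails because only the diagnostic data was sliced. The second succeeds once the same step is applied to the FX data, which is the behaviour this issue asks for.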