Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

demote time coordinate in climate_statistics #1097

Closed
wants to merge 3 commits into from

Conversation

tomaslovato
Copy link
Contributor

@tomaslovato tomaslovato commented May 5, 2021

Description

The construction of a preprocessor that computes, e.g., first climate_statistics and then multimodel_statistics fails with a missing time coordinate error from the latter because the time coordinate is removed from the cube in

clim_cube = cube.aggregated_by(clim_coord, operator)
clim_cube.remove_coord('time')
if clim_cube.coord(clim_coord.name()).is_monotonic():

The proposed changes aims to demote the time coordinate instead of removing it (see details in #1018)

Closes #1018

Link to documentation:


Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@tomaslovato tomaslovato requested review from valeriupredoi and schlunma and removed request for schlunma May 5, 2021 12:29
@tomaslovato tomaslovato added bug Something isn't working preprocessor Related to the preprocessor labels May 5, 2021
@tomaslovato
Copy link
Contributor Author

@valeriupredoi "you're my only hope" 🚀

@schlunma
Copy link
Contributor

schlunma commented May 6, 2021

Unfortunately I don't know why the original time coordinate is removed here. If the original author had a good reason for this then I really don't want to break anything here. Maybe @valeriupredoi knows more about this?

I personally like your change (i.e., keeping the time coordinate), but if we do this, we need to do this for all time-related preprocessors, need to add some unit tests and also need to perform some tests if this change breaks anything (I don't think it would, but you'll never know 😁)

However, I think once #968 is merged this change shouldn't be necessary anymore, since #968 generalized multi_model_statistics in such a way that it can deal with input data with arbitrary dimensions. Would it be okay for you to wait for that? I hope that this PR gets merged soon!

@tomaslovato
Copy link
Contributor Author

tomaslovato commented May 6, 2021

Unfortunately I don't know why the original time coordinate is removed here. If the original author had a good reason for this then I really don't want to break anything here. Maybe @valeriupredoi knows more about this?

I personally like your change (i.e., keeping the time coordinate), but if we do this, we need to do this for all time-related preprocessors, need to add some unit tests and also need to perform some tests if this change breaks anything (I don't think it would, but you'll never know 😁)

However, I think once #968 is merged this change shouldn't be necessary anymore, since #968 generalized multi_model_statistics in such a way that it can deal with input data with arbitrary dimensions. Would it be okay for you to wait for that? I hope that this PR gets merged soon!

Thanks @schlunma for having a look a this! Certainly some testing will be a healthy step!
After digging a bit in the development, I found that time coord was removed in bfe1456 by @jvegasbsc but I did not found any specific comment for that choice in the old PR #87.

So far, only climate_statistics remove time in favour of either month_number, day_of_month, day_of_year (maybe also season but not checked yet) while other functions like annual_statistics or decadal statistics preserve time.

About #968 it is my understanding that in multi_model_statistics time is always required, at least I got this by looking at the code and the function preamble

Apart from the time coordinate, cubes must have consistent shapes. There
are two options to combine time coordinates of different lengths, see the
``span`` argument.

@schlunma
Copy link
Contributor

schlunma commented May 6, 2021

About #968 it is my understanding that in multi_model_statistics time is always required, at least I got this by looking at the code and the function preamble

Apart from the time coordinate, cubes must have consistent shapes. There
are two options to combine time coordinates of different lengths, see the
``span`` argument.

I certainly hope that's not the case, we already have lots of issues that describe the problem (e.g., #890, #891) and I think they are considered in #968. Maybe @Peter9192 or @stefsmeets can briefly clarify that for us?

@stefsmeets
Copy link
Contributor

stefsmeets commented May 6, 2021

About #968 it is my understanding that in multi_model_statistics time is always required, at least I got this by looking at the code and the function preamble

Apart from the time coordinate, cubes must have consistent shapes. There
are two options to combine time coordinates of different lengths, see the
``span`` argument.

I certainly hope that's not the case, we already have lots of issues that describe the problem (e.g., #890, #891) and I think they are considered in #968. Maybe @Peter9192 or @stefsmeets can briefly clarify that for us?

#890 is not resolved with the #968. It's not the goal of the PR, but it may be tackled in a follow-up PR, since #968 already simplifies the time implementation. At the moment multimodel statistics function fails because iris cannot guess the bounds of a cube with no time coordinate.

@schlunma
Copy link
Contributor

schlunma commented May 6, 2021

All right, then I misunderstood this, sorry. As I said, I'm in favor for merging this PR (once it has been tested and some unit tests), but I think we need some more discussion about this, especially about the original purpose of the removal of the time coordinate.

@Peter9192
Copy link
Contributor

To add to Stef's comment: making mmstats independent of time is a bit of a challenge since we have the span argument that is explicitly about time atm. So to do that you'd have to reinterpret this argument.

@zklaus
Copy link

zklaus commented May 7, 2021

Is the time coordinate that comes out of climate_statistics still adequate for the data? It is possible to construct an appropriate time coordinate following CF conventions 1.8, Section 7.4 "Climatological Statistics", but I am not sure that merely aggregating the cube gives you the correct information.

@tomaslovato, if you look at the demoted time coordinate in the output, does the information make sense?

@tomaslovato
Copy link
Contributor Author

Thanks @stefsmeets @Peter9192 for the clarification about the need to have time coord for multimodel statistics

I tried the aggregation with a recipe that I had (btw, falling into another issue) and to the time coord that is generated after the operation looks coherent.
In the following, I first selected data between 2000 and 2014 and then computed monthly clim

DimCoord([2007-01-16 12:00:00, 2007-02-15 00:00:00, 2007-03-17 00:00:00,
       2007-04-16 12:00:00, 2007-05-17 00:00:00, 2007-06-16 12:00:00,
       2007-07-17 00:00:00, 2007-08-17 00:00:00, 2007-09-16 12:00:00,
       2007-10-17 00:00:00, 2007-11-16 12:00:00, 2007-12-17 00:00:00], bounds=[[2000-01-01 00:00:00, 2014-02-01 00:00:00],
       [2000-02-01 00:00:00, 2014-03-01 00:00:00],
       [2000-03-01 00:00:00, 2014-04-01 00:00:00],
       [2000-04-01 00:00:00, 2014-05-01 00:00:00],
       [2000-05-01 00:00:00, 2014-06-01 00:00:00],
       [2000-06-01 00:00:00, 2014-07-01 00:00:00],
       [2000-07-01 00:00:00, 2014-08-01 00:00:00],
       [2000-08-01 00:00:00, 2014-09-01 00:00:00],
       [2000-09-01 00:00:00, 2014-10-01 00:00:00],
       [2000-10-01 00:00:00, 2014-11-01 00:00:00],
       [2000-11-01 00:00:00, 2014-12-01 00:00:00],
       [2000-12-01 00:00:00, 2015-01-01 00:00:00]], standard_name='time', calendar='gregorian', long_name='Time axis', var_name='time')

Note that time boundaries are so wide as the aggregation considers for the whole time period, which is coherent for the climate statistics operation ... @zklaus comments?

@zklaus
Copy link

zklaus commented May 7, 2021

Hm. Iris has gained the ability to deal with climatological bounds relatively recently. I think it would be worthwhile to checkout how we might leverage that, but it seems to me the problem here really is #890, i.e. a bug in multimodel statistics, that really should not depend on the time axis.

So my suggestion is to not change climate_statistics for now, but rather fix #890 and open a feature request to include proper treatment of climatologies in the future. That might be as simple as saying cube.coord('time').climatological = True in the preprocessor, but further study of the documentation of iris and of the cf conventions is needed to confirm.

@tomaslovato
Copy link
Contributor Author

@zklaus As I see it by now the time (and bounds) are revised within multi-model stats by _unify_time_coordinates so I won't be much worried about it. Moreover, the cube generated from climate_statistics still retain its original main axis based on the monthly, yearly or daily counter ...
I personally think that it is useful to allow users creating a processing chain to compute climatologies and multimodel stats.

@zklaus
Copy link

zklaus commented May 7, 2021

I personally think that it is useful to allow users creating a processing chain to compute climatologies and multimodel stats.

Oh definitely. It just shouldn't depend on a time axis. That is simply a bug in multimodel statistics.

@Peter9192
Copy link
Contributor

by now the time (and bounds) are revised within multi-model stats by _unify_time_coordinates so I won't be much worried about it

See #744

@tomaslovato
Copy link
Contributor Author

@zklaus Given that the issue with time dependency in multimodel statistics won't be fixed in the short period, should we add this fix to enable the correct concatenation of these preprocessors?

@bouweandela
Copy link
Member

Now that #1150 has been merged, I think this is no longer needed. Could you please check @tomaslovato?

@tomaslovato
Copy link
Contributor Author

@bouweandela I made a test with the recipe detailed here below and it fails with the following
iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 time coordinate, but found none.' (see also
main_log_debug.txt

It is my guess that the code is crashing in the check for aligned cubes here

aligned_cubes = _align(copied_cubes, span=span)

that goes directly into the check for time coords
def _get_consistent_time_unit(cubes):
"""Return cubes' time unit if consistent, standard calendar otherwise."""
t_units = [cube.coord('time').units for cube in cubes]
if len(set(t_units)) == 1:

I frankly don't see why it is so terrible in demoting the time to an auxiliary in this operation and have a fix for the time being ... otherwise it would be necessary to instruct the _multimodel.py to handle similarly to time also year, month, day coords derived in climate_statistics. The latter seems to me very unlikely as it will generate a significant dependence form changes in climate_statistics.

documentation:
description: test
authors:
  - lovato_tomas

datasets:
- {dataset: CMCC-ESM2,     grid: gn, project: CMIP6,  exp: historical,  ensemble: r1i1p1f1,  start_year: 2000,  end_year: 2014}
- {dataset: UKESM1-0-LL, grid: gn, project: CMIP6,  exp: historical,  ensemble: r1i1p1f2,  start_year: 2001,  end_year: 2004}


preprocessors:
prep_test:
  custom_order: true
  extract_time:
    { start_year: 2000, start_month: 1, start_day: 1, end_year: 2001, end_month: 12, end_day: 31 }
  regrid:
    target_grid: 2x2
    scheme: linear
  climate_statistics:
    operator: mean
    period: month
  multi_model_statistics:
    span: overlap
    statistics: [mean]
    exclude: ['WOA']

diagnostics:
test:
  variables:
    tos:
       preprocessor: prep_test
       mip: Omon
  scripts:  ```
</details>

@tomaslovato
Copy link
Contributor Author

@bouweandela there is one thing that I forgot to add in my previous comment,
I made a test with annual_statistics operator and after the time agregation the resulting cube has time as Dimcoord and year as an auxiliary coord, which looks the opposite of what is done in climate_statistics ...

 2021-06-15 09:16:28,860 UTC [296784] DEBUG   esmvalcore.preprocessor:279 Running area_statistics(sea_surface_temperature                                                    (time: 2; first spatial index for variables stored on an unstructured grid: 292; second spatial index for variables stored on an unstructured grid: 362)
     Dimension coordinates:
          time                                                                  x                                                                    -                                                                       -
          first spatial index for variables stored on an unstructured grid      -                                                                    x                                                                       -
          second spatial index for variables stored on an unstructured grid     -                                                                    -                                                                       x
     Auxiliary coordinates:
          year                                                                  x                                                                    -                                                                       -
          latitude                                                              -                                                                    x                                                                       x
          longitude                                                             -                                                                    x                                                                       x
     Attributes:

@bouweandela
Copy link
Member

I frankly don't see why it is so terrible in demoting the time to an auxiliary in this operation

I doubt whether that works correctly with the current implementation. What is needed, is a bit of code in the multimodel statistics preprocessor function to align the time coordinate from all the input models in case that coordinate has only 1 value, so the cubes from all models can be merged and collapsed along the new model dimension. This is needed regardless of whether the time coordinate is a dimension coordinate or an auxiliary coordinate.

I made a test with annual_statistics operator and after the time agregation the resulting cube has time as Dimcoord and year as an auxiliary coord, which looks the opposite of what is done in climate_statistics ...

I agree that there should be a single consistent behaviour. Probably it would be best to just keep the time coordinate as it is after the statistics operation, because that is how the iris authors thought it made most sense.

Note that some more work on multimodel statistics is under way in #968, but as I said before, code to handle a time coordinate with just a single point is still missing.

@tomaslovato
Copy link
Contributor Author

tomaslovato commented Jun 16, 2021

What is needed, is a bit of code in the multimodel statistics preprocessor function to align the time coordinate from all the input models in case that coordinate has only 1 value

Now that I know this I feel kind of lucky since I always tried with monthly climatological stats !!

I agree that there should be a single consistent behaviour. Probably it would be best to just keep the time coordinate as it is after the statistics operation, because that is how the iris authors thought it made most sense.

I fully support the consistent behaviour toward iris philosophy, so if we agree on that I can at least give a try in modifying climate_statistics by retaining the time resulting after the computations and demoting the coordinate derived from climate aggregation ...

@tomaslovato
Copy link
Contributor Author

tomaslovato commented Jun 17, 2021

@bouweandela I commited a modification to the code of climate_statistics to produce a cube that retains time as Coord and the climate coordinate as an auxiliary one to align with the general behaviour of iris.
In this way the problem with time handling is also fixed (apart from the limitation of multimodel function you mentioned before).

main_log_debug.txt

@bouweandela
Copy link
Member

I suspect that the time coordinate on the resulting multmodel statistic cube is completely wrong if you try to fix it like this. Because we're actively working on fixing the multimodel statistics preprocessor, I think it would be best to complete that work before starting to modify others just so they work with the omissions in the current implementation of the multimodel statistics.

If you need a workaround for now, you could also compute the multimodel statistics before the climate statistics using a custom preprocessor order.

@tomaslovato
Copy link
Contributor Author

I suspect that the time coordinate on the resulting multimodel statistic cube is completely wrong if you try to fix it like this.

Actually it should not, as there is the alignement of cubes time coord before computing the multimodel statistics

def _unify_time_coordinates(cubes):
"""Make sure all cubes' share the same time coordinate.
This function extracts the date information from the cube and
reconstructs the time coordinate, resetting the actual dates to the
15th of the month or 1st of july for yearly data (consistent with
`regrid_time`), so that there are no mismatches in the time arrays.
If cubes have different time units, it will use reset the calendar to
a default gregorian calendar with unit "days since 1850-01-01".
Might not work for (sub)daily data, because different calendars may have
different number of days in the year.
"""
t_unit = _get_consistent_time_unit(cubes)

which is indeed at the origin of this issue (see my comments above) since there is the missing time coord in the cube that is explicitly requested by this align procedure, namely in function _get_consistent_time_unit.

If you need a workaround for now, you could also compute the multimodel statistics before the climate statistics using a custom preprocessor order.

I tested the option you suggest, but it works at the cost of a larger computational burden...

@tomaslovato
Copy link
Contributor Author

I suspect that the time coordinate on the resulting multmodel statistic cube is completely wrong if you try to fix it like this. Because we're actively working on fixing the multimodel statistics preprocessor, I think it would be best to complete that work before starting to modify others just so they work with the omissions in the current implementation of the multimodel statistics.

If you need a workaround for now, you could also compute the multimodel statistics before the climate statistics using a custom preprocessor order.

@bouweandela based on the previous comment and on recent discussions in #1221 on default preprocessors order I will close this PR and #1018 as won't be a problem unless the processor order will be changed in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preprocessor chain 'climate_statistics' and 'multimodel_statistics' fails as time dimension is not present
6 participants