Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add preprocessor function to resample time and compute x-hourly statistics #696

Merged
merged 18 commits into from
Dec 21, 2020

Conversation

jvegreg
Copy link
Contributor

@jvegreg jvegreg commented Jul 3, 2020

@sloosvel found a use case in which we needed to convert 3-hourly data to 6-hourly and we found that this capability is missing in the core, as we don't go futher than daily_statistics. This pull requests add support for x-hourly statistics

But in this case we are working with instantaneous data and so we don't really want to compute but just to extract some values. I added the functionality for this, so we can now convert data frequencies by resampling it.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

@jvegreg jvegreg added enhancement New feature or request preprocessor Related to the preprocessor labels Jul 3, 2020
@sloosvel
Copy link
Contributor

sloosvel commented Jul 3, 2020

Could they be added in the init list to be called in the recipe? In this way you get rid of the data before continuing with the preprocessing. I guess the output product would need to have the frequency updated then.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice addition, Javi! 🍺

esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
@jvegreg
Copy link
Contributor Author

jvegreg commented Jul 7, 2020

Could they be added in the init list to be called in the recipe? In this way you get rid of the data before continuing with the preprocessing. I guess the output product would need to have the frequency updated then.

Added.

Copy link
Contributor

@stefsmeets stefsmeets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jvegasbsc , I just had a look at this pull request. It looks like it is more or less complete. I think we should try to get this merged. It just needs a bit of polish to get it finished. Please see my review comments.

Also, could you have a look at the Dockerfile so that it does not conflict with master and fix the Codacy complaints?

doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Show resolved Hide resolved
Javier Vegas-Regidor and others added 2 commits October 26, 2020 11:09
@stefsmeets
Copy link
Contributor

Nice work @jvegasbsc ! Can you fix the last CI errors:

commit/documentation:

Warning, treated as error:
/doc/doc/recipe/preprocessor.rst:974:Title underline too short.

``resample_hours``
-----------------

Codacy:
https://app.codacy.com/gh/ESMValGroup/ESMValCore/pullRequest?prid=5799969

Then I think this is ready to be merged!

@jvegreg
Copy link
Contributor Author

jvegreg commented Oct 27, 2020

Done!

@stefsmeets
Copy link
Contributor

Hi @axel-lauer , this PR is finished from a technical point of view. We need a scientific review to get this merged, but I'm not entirely sure who to ask. Do you have any idea who we could ask?

@axel-lauer
Copy link
Contributor

Maybe you could take a look at this issue: ESMValGroup/ESMValTool#1818. The list is still a draft and far from complete but I would guess we might ask e.g. someone listed under 'extreme events'.

@stefsmeets
Copy link
Contributor

stefsmeets commented Nov 12, 2020

Thanks @axel-lauer ! Let's see if we can do this 😉

Hi @jhardenberg @maritsandstad, could I ask one of you to do a scientific review of this PR? We already did a technical review and would like to get this merged.

You can perhaps start by looking at the documentation (https://github.com/ESMValGroup/ESMValCore/blob/f8bfd90e93c30e0f4935dfbb4d7bc618274ac37d/doc/recipe/preprocessor.rst), particularly the sections on Hourly statistics, Resample time, and Resample hours, and see if that makes sense and go from there. Thanks!

@jhardenberg
Copy link
Contributor

jhardenberg commented Nov 12, 2020

Hi, looks like a very useful addition indeed! The difference between resample_time and resample_hours is that the first one should be used to extract for example always 1PM from maybe all days, or to extract only 1PM on the 12th of each month, while resample_hours extracts data by resampling every 6h for example, correct?

Both seem useful, but would the exact same functionality of resample_hours not be covered by resample_time if it allowed to specify a list of hours, instead of only one value? Anyway this could maybe be a later change.

Isn't extract_month and (resample_time specifying only "month:") doing the same?

As you write this sampling makes sense only for instantaneous data and would be clearly wrong for cumulated data (or time-averaged) data (all mass or energy fluxes). Would it maybe be better to add a check in the variable attributes to prevent users to apply it to such variables, like e.g. precipitation? Should the name "resample_hours" maybe be changed to something like "resample_inst_hours" to make this clear and avoid abuse?

Actually it would be nice in the future to develop a similar functionality also for time-averaged variables (e.g. precipitation, radiative fluxes), so that one can change from 3hr frequency to 6hr etc .... In this case, since the variables are in units like Kg/m^2/s or W/m^2, they should be averaged between timesteps. E.g. 3hr data would become 6hr data averaging the 1st and 2nd timestep, the 3rd and 4th etc. Also in this case an offset could be specified.

@jvegreg
Copy link
Contributor Author

jvegreg commented Nov 16, 2020

Hi, looks like a very useful addition indeed! The difference between resample_time and resample_hours is that the first one should be used to extract for example always 1PM from maybe all days, or to extract only 1PM on the 12th of each month, while resample_hours extracts data by resampling every 6h for example, correct?

Yes, that's the difference

Both seem useful, but would the exact same functionality of resample_hours not be covered by resample_time if it allowed to specify a list of hours, instead of only one value? Anyway this could maybe be a later change.

Yes, but in that case I feel inclined to also allow lists on the other options and this will lead to frequencies that we are not currently supporting

Isn't extract_month and (resample_time specifying only "month:") doing the same?

Yes, certain configurations of this function can do exactly what other functions already do.

As you write this sampling makes sense only for instantaneous data and would be clearly wrong for cumulated data (or time-averaged) data (all mass or energy fluxes). Would it maybe be better to add a check in the variable attributes to prevent users to apply it to such variables, like e.g. precipitation? Should the name "resample_hours" maybe be changed to something like "resample_inst_hours" to make this clear and avoid abuse?

A check in the cell methods to check if time: mean or time:sum can be done, but we are not checking those attributes. in the cmor checker so I am not really prone to take them as a reliable source of information, so I think we could not add more than a warning here

Actually it would be nice in the future to develop a similar functionality also for time-averaged variables (e.g. precipitation, radiative fluxes), so that one can change from 3hr frequency to 6hr etc .... In this case, since the variables are in units like Kg/m^2/s or W/m^2, they should be averaged between timesteps. E.g. 3hr data would become 6hr data averaging the 1st and 2nd timestep, the 3rd and 4th etc. Also in this case an offset could be specified.

The hourly statistics will do this, although an offset can not be specified

@stefsmeets
Copy link
Contributor

Hi @jhardenberg , if you are happy with the response, could you please approve this PR? Then I think this can be merged 😄

@jhardenberg
Copy link
Contributor

Hi, the responses are fine for me, thanks. As said, I would just add at least a clear warning about the fact that the sampling provided by this function should be used only for instantaneous data and not for fluxes (which should be averaged).

@stefsmeets
Copy link
Contributor

Hi @ESMValGroup/esmvaltool-coreteam , this PR is ready to be merged!

@jvegreg jvegreg merged commit 8f07d2c into master Dec 21, 2020
@jvegreg jvegreg deleted the resample_time branch December 21, 2020 10:05
@jvegreg
Copy link
Contributor Author

jvegreg commented Dec 21, 2020

The pending errors are related to the documentation and they already exist on master

@bouweandela
Copy link
Member

As said, I would just add at least a clear warning about the fact that the sampling provided by this function should be used only for instantaneous data and not for fluxes (which should be averaged).

@jvegasbsc It would have been nice if this comment by @jhardenberg could have been addressed before merging this, but it's too late now I guess.

* operator: operation to apply. Accepted values are 'mean',
'median', 'std_dev', 'min', 'max' and 'sum'. Default is 'mean'

See also :func:`esmvalcore.preprocessor.daily_statistics`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be:
See also :func:esmvalcore.preprocessor.hourly_statistics.
I guess

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants