Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new operators for statistics preprocesor (e.g., percentile) and allowed arbitrary kwargs #2191

Merged
merged 42 commits into from
Oct 10, 2023

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Sep 6, 2023

Description

This PR unifies all statistics preprocessors. Now, all statistics preprocessors support the same operators and have a common documentation. In additon, arbitrary keyword arguments for the statistical operation can be directly given to the preprocessor.

Corresponding PR in ESMValTool: ESMValGroup/ESMValTool#3351

Examples:

  preprocessors:
    monthly_percentiles:
      climate_statistics:
        period: monthly
        operator: percentile
        percent: 95.0

  preprocessors:
    mm_stats:
      multi_model_statistics:
        span: overlap
        statistics:
          - operator: percentile
            percent: 5
          - median
          - operator: percentile
            percent: 95

Deprecations

Prior to this PR, the only preprocessors that supported percentiles were multi_model_statistics and ensemble_statistics. The syntax was for example statistics: [p50.0], which can now be specified by using statistics: {operator: percentile, percent: 50.0}]. The following example shows how to modify existing recipes:

Old:

  preprocessors:
    mm_stats:
      multi_model_statistics:
        span: overlap
        statistics: [p5, p95]

New:

  preprocessors:
    mm_stats:
      multi_model_statistics:
        span: overlap
        statistics:
          - operator: percentile
            percent: 5
          - operator: percentile
            percent: 95

In addition, the usage of the operator std has been deprecated in favor of std_dev:

Old:

  preprocessors:
    test:
      zonal_statistics:
        operator: std

New:

  preprocessors:
    test:
      zonal_statistics:
        operator: std_dev

The old syntax will be deprecated with version 2.10 and removed with version 2.12.


Closes #1617
Closes #1249
Closes #1851
Closes #2223

Link to documentation: https://esmvaltool--2191.org.readthedocs.build/projects/ESMValCore/en/2191/recipe/preprocessor.html#statistical-preprocessors


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added the preprocessor Related to the preprocessor label Sep 6, 2023
@schlunma schlunma added this to the v2.10.0 milestone Sep 6, 2023
@schlunma schlunma self-assigned this Sep 6, 2023
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #2191 (6481245) into main (0cea725) will increase coverage by 0.18%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2191      +/-   ##
==========================================
+ Coverage   93.29%   93.47%   +0.18%     
==========================================
  Files         238      238              
  Lines       12848    12927      +79     
==========================================
+ Hits        11986    12084      +98     
+ Misses        862      843      -19     
Files Coverage Δ
esmvalcore/_recipe/check.py 92.42% <100.00%> (+0.39%) ⬆️
esmvalcore/_recipe/recipe.py 99.00% <100.00%> (+<0.01%) ⬆️
esmvalcore/cmor/_fixes/shared.py 100.00% <ø> (ø)
esmvalcore/preprocessor/_area.py 98.78% <100.00%> (+4.23%) ⬆️
esmvalcore/preprocessor/_multimodel.py 98.72% <100.00%> (+0.68%) ⬆️
esmvalcore/preprocessor/_rolling_window.py 100.00% <100.00%> (ø)
esmvalcore/preprocessor/_shared.py 100.00% <100.00%> (ø)
esmvalcore/preprocessor/_time.py 97.76% <100.00%> (+0.04%) ⬆️
esmvalcore/preprocessor/_volume.py 92.36% <100.00%> (+0.76%) ⬆️

... and 1 file with indirect coverage changes

@schlunma schlunma marked this pull request as ready for review September 7, 2023 17:19
@valeriupredoi
Copy link
Contributor

this looks like a superb PR and feature addons, I'll go through it in detail tomorrow @schlunma - 20 files with ~+1,700 lines deserve proper love 😁

@bouweandela
Copy link
Member

Nice work @schlunma!

I'd like to propose a slightly different API though. It is common practice to pass through keyword arguments using just the **kwargs argument, so I would like to suggest to change the preprocessor function signatures from

def area_statistics(
    cube: Cube,
    operator: str,
    operator_kwargs: Optional[dict] = None,
) -> Cube:

to

def area_statistics(
    cube: Cube,
    operator: str,
    **kwargs: Any,
) -> Cube:

and the recipe usage to:

  preprocessors:
    monthly_percentiles:
      climate_statistics:
        period: monthly
        operator: percentile
        percent: 95.0

For multiple statistics in the same preprocessor function, it would be easier to read if the operators were grouped with their arguments, e.g.

  preprocessors:
    mm_stats:
      multi_model_statistics:
        span: overlap
        statistics:
          - operator: percentile
            percent: 5
          - operator: median
          - operator: percentile
            percent: 95

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first batch of review (just before rolling window)

doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
doc/recipe/preprocessor.rst Outdated Show resolved Hide resolved
esmvalcore/_recipe/check.py Show resolved Hide resolved
esmvalcore/preprocessor/_multimodel.py Outdated Show resolved Hide resolved
Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple more minor changes requested, bud. I sped through the tests changes since I know foull well you write very good tests 👍

esmvalcore/preprocessor/_shared.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_shared.py Outdated Show resolved Hide resolved
esmvalcore/preprocessor/_time.py Show resolved Hide resolved
@valeriupredoi
Copy link
Contributor

OK I finished the review - a very solid PR - I guess the fundamental question here is if we want to get this closer to iris - the infrastructure introduced in here ties us inextricably with the way iris does stats - I can see both pros and (a few big) cons, but since we're not even contemplating a switch from iris, it's OK. Many thanks, Manu! 🍺

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work, Manu! If you could please God Flake, then this is ready to go IMO - maybe @zklaus can merge after flake's fixed 🍺

@zklaus zklaus merged commit dde0755 into main Oct 10, 2023
3 of 4 checks passed
@zklaus zklaus deleted the stat_preprocs branch October 10, 2023 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment