
Multi model statistics: missing files for individual models #975

Closed
remi-kazeroni opened this issue Feb 1, 2021 · 6 comments · Fixed by #978
Labels: bug

@remi-kazeroni (Contributor)

Describe the bug
When using the multi_model_statistics preprocessor step, the preproc directory contains only the file with the multi-model statistics. The files for the individual datasets are missing.

Note that everything seems fine with the release_2.1 branch. We should aim to fix this before the next release. @ESMValGroup/esmvaltool-coreteam

Please attach

  • The recipe that you are trying to run; you can find a copy in the run directory of the output directory
  • The main_log_debug.txt file; this can also be found in the run directory of the output directory
    recipe_test_mmm.yml.txt
    main_log_debug.txt
@remi-kazeroni (Contributor, Author)

Mistral is being super slow today, so I haven't been able to create an up-to-date environment... Mine is about one week old...

@schlunma (Contributor) commented Feb 1, 2021

I'm pretty sure this was introduced in #949:

The old version of the code added all products to the output of multi_model_statistics:

if output_products:
    products |= statistic_products
    return products
return statistic_products

The new version does not:

statistics_products = set()
for statistic, cube in statistics_cubes.items():
    statistics_product = output_products[statistic]
    statistics_product.cubes = [cube]
    for product in products:
        statistics_product.wasderivedfrom(product)
    logger.info("Generated %s", statistics_product)
    statistics_products.add(statistics_product)
return statistics_products
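
A minimal sketch of how the old behaviour could be restored on top of the new code, reusing the variable names from the snippet above (this is only an illustration, not necessarily what an eventual fix would look like):

# Hypothetical variant: also return the input products together with the
# statistics, mirroring what the old `products |= statistic_products` branch did
return products | statistics_products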

@schlunma schlunma assigned Peter9192 and unassigned schlunma Feb 1, 2021
@Peter9192 (Contributor)

Good spot! The behaviour shouldn't have changed like this, but also I think maybe we shouldn't change it back. Let me explain.

#949 is actually working towards support for an ensemble statistics preprocessor. We first implemented all of this in #673, but then figured it had become too big and wanted to split it up into smaller tasks. Hence I copied some of the changes to #949. At that point it was not necessary to change the behaviour, and I should have spotted that.

However, when we continue towards the ensemble statistics, we do need this behaviour to change. See for example this use case. If you, at some point in the future, want to combine multi-model and ensemble statistics in a single preprocessor (e.g. first compute ensemble means, then multi-model statistics across those ensemble means), you don't want to pass the original products to the multi-model step as well.

So a preprocessor that contains multimodel should only return multimodel products. An alternative way to get the output from the original cubes as well is to split the preprocessor:

preprocessors:
  everything_else: &everything_else
    area_statistics: ... 
    regrid_time: ...  
  multimodel:
    <<: *everything_else
    multi_model_statistics: ...

variables:
  tas_all_cubes:
    preprocessor: everything_else
  tas_mmstats:
    preprocessor: multimodel
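
(For reference: the &everything_else anchor and the <<: merge key are plain YAML; they let the multimodel preprocessor reuse the shared steps, so only the multi_model_statistics step is added on top, while the single-model output still comes from the everything_else preprocessor.)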

Do you know which recipes are affected by this? And do you agree this is the better solution in the long run? Then the question is: should we change it back for now and then change it again in the future, or leave it like this, document the changes in the changelog, and make a PR to fix the affected recipes? @remi-kazeroni

@LisaBock (Contributor) commented Feb 2, 2021

@Peter9192
The problem is that many diagnostics are affected by this modification, including several IPCC diagnostics in recipe_flato13ipcc.yml (Perfmetrics, Temperature Anomaly, Pattern Correlation, ...) and in recipe_collins13ipcc.yml, as well as diagnostics in recipe_lauer13jclim.yml and several more... All these diagnostics show the single models as well as the multi-model mean in their plots.

Since your suggestion means that we would have to preprocess and regrid all models twice for these diagnostics, and first of all adjust them all to this modification, I would suggest adding an option to the multi_model_statistics preprocessor. The default could be the old behaviour, as used by several diagnostics, and the switch could suppress the additional output of the single models.
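
A sketch of what such a recipe option might look like; the option name keep_input_datasets is hypothetical here and chosen only for illustration, and the other arguments are just example values:

preprocessors:
  multimodel:
    multi_model_statistics:
      span: overlap
      statistics: [mean]
      keep_input_datasets: true  # hypothetical switch: also keep the single-model files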

And please check more carefully next time how new features affect existing diagnostics! Sorry, but I need the old behaviour as soon as possible because I have to create the new IPCC figures right now!

@Peter9192 (Contributor)

@LisaBock thanks for the clarification. I think the solution you propose is also a good idea; I'll look into it.

If you need the old behaviour right away, please have a look at this tip from Bouwe.

@Peter9192 (Contributor) commented Feb 2, 2021

Here's a fix: #978

@LisaBock @remi-kazeroni can you check whether this solves your problems? I haven't tested it thoroughly yet, as I get the impression you want it sooner rather than later.
