
New multimodel module can be slow and buggy for certain recipes #1201

Closed
valeriupredoi opened this issue Jun 29, 2021 · 34 comments
Labels: bug (Something isn't working), preprocessor (Related to the preprocessor), release

Comments

@valeriupredoi
Contributor

Unfortunately I must consider reverting to the older multimodel module asap - the new module is EXTREMELY slow, buggy in different spots and not that much more efficient for memory usage. Take this recipe for instance:

# ESMValTool
# recipe_capacity_factor.yml
---
documentation:
  description: |
     Solar Capacity Factor

  authors:
    - cionni_irene

  maintainer:
    - weigel_katja

  references:
    - bett2016renene
    - weigel2021gmd

  projects:
    - crescendo

datasets:
  - {dataset: ERA-Interim, project: OBS6, type: reanaly, version: 1, tier: 3,
     start_year: 1980, end_year: 2005}
  #- {dataset: ACCESS1-0, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  - {dataset: ACCESS1-3, project: CMIP5, exp: historical, ensemble: r1i1p1,
     start_year: 1980, end_year: 2005}
  #- {dataset: CanESM2, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  #- {dataset: CMCC-CM, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  - {dataset: CNRM-CM5, project: CMIP5, exp: historical, ensemble: r1i1p1,
     start_year: 1980, end_year: 2005}
  #- {dataset: CSIRO-Mk3-6-0, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  #- {dataset: GFDL-ESM2G, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  - {dataset: IPSL-CM5A-MR, project: CMIP5, exp: historical, ensemble: r1i1p1,
     start_year: 1980, end_year: 2005}
  #- {dataset: MIROC5, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  - {dataset: MPI-ESM-MR, project: CMIP5, exp: historical, ensemble: r1i1p1,
     start_year: 1980, end_year: 2005}
  #- {dataset: MRI-CGCM3, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
  #- {dataset: NorESM1-M, project: CMIP5, exp: historical, ensemble: r1i1p1,
  #   start_year: 1980, end_year: 2005}
preprocessors:
  preproc:
    regrid:
      target_grid: reference_dataset
      scheme: linear
    extract_region:
      start_longitude: -20
      end_longitude: 60
      start_latitude: 30
      end_latitude: 80
#     extract_season: &season
#       season: djf
# Multi_model_statistics turned off due to an issue with the Core v2.3.0
    multi_model_statistics:
      span: overlap
      statistics: [mean]
      exclude: [reference_dataset]


diagnostics:
  capacity_factor:
    description: Calculate the wind power capacity factor.
    variables:
      tas:
        reference_dataset: ERA-Interim
        preprocessor: preproc
        mip: day
      rsds:
        reference_dataset: ERA-Interim
        preprocessor: preproc
        mip: day
    scripts: null
      #main:
      #  # <<: *season
      #  season: all
      #  script: pv_capacityfactor/pv_capacity_factor.R
      #  maxval_colorbar: 0.15
  • this is the chopped-up recipe that @katjaweigel uses to report a suite of issues in Multimodel fails for daily data with different calendars #1198 - the whole thing runs in 7 min with esmvalcore=2.2.0 (about 3 min max for each of the multimodel bits), whereas with the current implementation it takes 15 min to fail (not even completing the first multimodel bit). Memory-wise the 2.2.0 module consumed at most 9 GB whereas the new module consumed 5 GB, so not that much better. Failures occur at different stress points:
  • one of the models (ACCESS1-3) has a different height AuxCoord than the others (1.5 m compared to 2 m), so the slice merge fails because of that
  • the cubes entered in the cube list comprehension at line 274 have different shapes, importantly different time axis lengths, and indexing on an index that is outside the axis length crashes the code there

I propose we retire the current implementation to a development branch and temporarily reinstate the 2.2.0 module

@valeriupredoi valeriupredoi added the bug, preprocessor, and release labels Jun 29, 2021
@zklaus

zklaus commented Jun 29, 2021

First, let's try to stay factual and polite. SHOUTING is not generally considered that.

On the points:

  • I think it is (almost) meaningless to look at runtime and, for that matter, memory before the two approaches are successfully doing the same thing.
  • For the different heights, I like the new approach more than the old one. Silently ignoring a coordinate that is different doesn't seem correct to me. Perhaps a tas fix for ACCESS with a warning output could be a good solution? The CMIP5 data request unfortunately is a bit soft with:

normally, the temperature should be reported at the 2 meter height

  • The last point is interesting. Why do the cubes end up with different lengths? This smells like a bug and should be treated as such.

@katjaweigel
Contributor

@zklaus: The different length of the time axis seems to be connected to leap years. I just wrote in the other issue:
The reason for the

WARNING [32819] /work/bd0854/b380216/anaconda3/envs/esmvaltool202106/lib/python3.9/site-packages/iris/coords.py:1979: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'time'.
warnings.warn(msg.format(self.name()))

warning and the

IndexError: index 9490 is out of bounds for axis 0 with size 9490

error is most probably that there is a different number of days in the 26 years the recipe looks at: either 9490 or 9497 days (probably depending on whether the calendar contains leap years or not).
I'll try whether "regrid_time" would help.
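The two lengths match exactly what mixed calendars produce over 1980-2005; a quick stdlib check (illustrative only, not ESMValCore code):

```python
from datetime import date

# Day counts for 1980-01-01 .. 2005-12-31 (the recipe's 26-year window)
# in two calendars -- this reproduces the 9490 vs 9497 mismatch.
n_gregorian = (date(2006, 1, 1) - date(1980, 1, 1)).days  # includes 7 leap days
n_noleap = 26 * 365  # a 'noleap' (365_day) calendar has no Feb 29

print(n_gregorian, n_noleap)  # 9497 9490
```

So any indexing that assumes all cubes share the Gregorian length will run one past the end of a noleap cube, which is exactly the IndexError above.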

@katjaweigel
Contributor

(esmvaltool202006) b380216@mistralpp3% grep -in Running.extract_region.air_temperature.*time: /work/bd1083/b380216/output/recipe_pv_capacity_factor_20210629_095915/run/main_log_debug.txt
6449:2021-06-29 10:00:43,142 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
6731:2021-06-29 10:01:01,890 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
7020:2021-06-29 10:01:06,680 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
7593:2021-06-29 10:01:32,682 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
8138:2021-06-29 10:02:06,893 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
8733:2021-06-29 10:02:30,240 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
9052:2021-06-29 10:02:45,243 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
9736:2021-06-29 10:03:08,329 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
10297:2021-06-29 10:03:46,899 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
11117:2021-06-29 10:04:26,199 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
11743:2021-06-29 10:04:47,619 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)
12691:2021-06-29 10:05:41,717 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9497; latitude: 241; longitude: 480)
13764:2021-06-29 10:06:30,602 UTC [9314] DEBUG esmvalcore.preprocessor:279 Running extract_region(air_temperature / (K) (time: 9490; latitude: 241; longitude: 480)

@valeriupredoi
Contributor Author

yes @katjaweigel that's exactly it, well spotted! On @zklaus's point about run times: I would agree, in principle, if the difference were tens of percent, but a factor of 4 or 5 is unacceptable. I am aware of this happening in certain situations, and I had done quite a few tests before this was merged and noticed the slowdown consistently, albeit with significantly smaller amounts of data. This is a solid case with plenty of data, and unfortunately the runtime difference is a showstopper in my books, especially since the gain in memory usage is relatively marginal 👍

@valeriupredoi
Contributor Author

For the different heights, I like the new approach more than the old one. Silently ignoring a coordinate that is different doesn't seem correct to me. Perhaps a tas fix for Access with a warning output could be a good solution? The CMIP5 data request unfortunately is a bit soft with

agreed! but we need a checker inside multimodel to either ignore or fix that AuxCoord (or others that might differ). Failing on a coordinate that is not used in the statistic itself is not the desired behavior

@zklaus

zklaus commented Jun 29, 2021

On the runtime: @valeriupredoi, your opinion is noted and will of course be taken into consideration.

On the AuxCoord: This coordinate is used. What is the correct height of the resulting data? 2.0 m? 1.5 m? The weighted average? I don't see that the multimodel preprocessor is in a good position to make that call. Since the overwhelming majority of models do use 2 m and this is the suggested height in the standards, the best approach to me seems to be to add a fix to ACCESS1-3 to change the height coordinate in the tas* variable(s). One might consider a correction to the data itself, but I leave this to the atmospheric experts.

On the leap day issue: How did the old preprocessor handle this? The only clear solution seems to be some form of time regridding, but I think that should be done by the user explicitly, thus raising an error (though possibly with a better message), is the right thing, imho.

@katjaweigel
Copy link
Contributor

Currently, the preprocessor setting:

regrid_time:
  frequency: day

doesn't solve the leap year issue; it fails with:

IndexError: list index out of range
2021-06-29 12:27:34,816 UTC [8169] INFO esmvalcore._main:444

and there are still cubes with 9490 and 9497 days.

@bouweandela
Member

not that much more efficient for memory usage.

Note that work on the performance is still in progress in #968, no performance improvement should be expected from the current implementation. The improvement that is included in main branch at the moment is that it uses iris instead of numpy for computing the statistics. As expected, this will lead to more strict handling of metadata. We will probably need some additional fixes for that, e.g. I proposed #1177 to address some of these issues.

@valeriupredoi
Contributor Author

the old preprocessor would find the overlap via np.intersect1d and chop the data around it (if overlap was the setting). I think the new one does it in a similar manner, but only looks at the start/end points without checking that the points inside the overlap interval are the same (or that the time point arrays have equal lengths)
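A minimal numpy sketch of that overlap approach (array names and lengths are hypothetical, not the actual ESMValCore code):

```python
import numpy as np

# Hypothetical integer time axes (days since a common reference) for two
# models with different calendars.
time_a = np.arange(0, 9497)  # Gregorian: 9497 days
time_b = np.arange(0, 9490)  # noleap: 9490 days

# The old module found the shared points and sliced each cube to them:
overlap, idx_a, idx_b = np.intersect1d(time_a, time_b, return_indices=True)
```

Note that intersect1d compares the integer point values, so it guarantees equal-length sliced axes, but it cannot detect that the same integer may refer to different calendar days in different calendars.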

@bettina-gier
Contributor

If we're talking issues with the new mmm handling that weren't there before the refactoring, I also found one testing recipe_perfmetrics_land_CMIP5.yml as mentioned in this comment: ESMValGroup/ESMValTool#2198 (comment)
After regridding not all models had bounds for the longitude coordinate, making iris fail the cube merge. If I add a guess_bounds in there, the problem is solved. Seems to me that for now with all these issues arising, it would be better to revert the change, until all mmm recipes are tested and potential issues compiled and looked at. Now, this might highlight some things we need to change in other areas of the core - but considering the release timing a revert is the most plausible thing.
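The missing-bounds fix described above uses iris's Coord.guess_bounds(). A rough numpy equivalent for a 1-D coordinate, to illustrate what that call adds (illustrative sketch, not the actual ESMValCore code):

```python
import numpy as np

def guess_bounds(points):
    # Midpoints between adjacent cell centres, extrapolated at the ends --
    # mimicking iris's Coord.guess_bounds() for a 1-D coordinate.
    mid = (points[:-1] + points[1:]) / 2
    lower = np.concatenate([[2 * points[0] - mid[0]], mid])
    upper = np.concatenate([mid, [2 * points[-1] - mid[-1]]])
    return np.stack([lower, upper], axis=1)

lons = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical longitude centres
bounds = guess_bounds(lons)
```

With bounds present on every model's longitude coordinate, the cube metadata agrees and the merge can succeed.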

@valeriupredoi
Contributor Author

not that much more efficient for memory usage.

Note that work on the performance is still in progress in #968, no performance improvement should be expected from the current implementation. The improvement that is included in main branch at the moment is that it uses iris instead of numpy for computing the statistics. As expected, this will lead to more strict handling of metadata. We will probably need some additional fixes for that, e.g. I proposed #1177 to address some of these issues.

right on, but how can you explain the slowness?

@valeriupredoi
Contributor Author

If we're talking issues with the new mmm handling that weren't there before the refactoring, I also found one testing recipe_perfmetrics_land_CMIP5.yml as mentioned in this comment: ESMValGroup/ESMValTool#2198 (comment)
After regridding not all models had bounds for the longitude coordinate, making iris fail the cube merge. If I add a guess_bounds in there, the problem is solved. Seems to me that for now with all these issues arising, it would be better to revert the change, until all mmm recipes are tested and potential issues compiled and looked at. Now, this might highlight some things we need to change in other areas of the core - but considering the release timing a revert is the most plausible thing.

good call @bettina-gier 👍

@zklaus

zklaus commented Jun 29, 2021

Note that all the problems we have discussed so far (including the one brought up by @bettina-gier) are pre-existing bugs that have passed us unnoticed until now. Reverting would therefore not solve any issue but just cover up wrong results. I am at the moment not convinced that is the right way to go about it.

Let's try to focus our energy now on understanding the problems and move to a solution strategy after that.

@valeriupredoi
Contributor Author

Klaus, those are not masked problems but rather things we didn't care about enough to fix in the old module, and that shouldn't be allowed to cause failures in the new module. We have an MM module that is broken in several aspects, and the time needed to address those aspects is much longer than the time to revert to a working module. We care about functionality foremost and then about improving performance and fixing knickknacks - and the performance of the new MM isn't even better yet; as Bouwe says, it's still work in progress

@zklaus

zklaus commented Jun 29, 2021

V, I disagree with that assessment and don't think you can speak for all of us here.

As a case in point, trying to advance the issue of the uneven time axis: MOHC's HadGEM2-ES uses a 360-day calendar. Over the 25-year period of this recipe, that's a difference of 25*5 + 25/4 ≈ 131 days, or over 4 months of data that get chopped off the end of all models when you add HadGEM to the list of datasets. Worse, since in all calendars the time is kept as days since a reference point, the same time point (as an integer number of days since the reference) refers to different days of the year - towards the end of the period in completely different seasons, which all get mixed up here. Whether summer days get mixed into winter or winter days into summer depends on the position of HadGEM in the list of datasets.
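The drift quoted here can be checked with back-of-the-envelope arithmetic (a sketch that ignores the exact Gregorian leap rule):

```python
# Approximate data lost when a 360-day calendar is truncated against
# Gregorian-calendar models over a 25-year period: ~5 days per year,
# plus roughly one extra leap day every 4 years.
years = 25
drift = years * (365 - 360) + years // 4  # 125 + 6 = 131 days
print(drift)
```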

I think scientists care about this sort of thing.

@valeriupredoi
Contributor Author

I don't speak for the others - I speak for the core functionality of a key module in ESMValTool, which is currently busted and needs serious repairs to bring both its functionality and performance up to standard. The interim solution is to replace it with the old module, which is known to work (and has been tested against reference results), while we perform the needed fixes and performance optimizations. It's as simple as that

@zklaus

zklaus commented Jun 29, 2021

I gave a concrete example of where the old module seems to lead to very wrong results. Do you think that example is wrong? Do you think people are aware of this?

@valeriupredoi
Contributor Author

seems is the key word - the old module has been tested thoroughly against reference results, unlike the new one. Have a go with the revert branch when you have some time, and if that's indeed a bug, then we list it here and weigh the arguments for or against reverting

@valeriupredoi
Contributor Author

valeriupredoi commented Jun 29, 2021

OK @zklaus @katjaweigel @bettina-gier I have got the revert done in #1202 (just need to fix the docs build, but that'll be done if we decide to revert) - if you guys want to test or just run your recipes to actually get results, go for it! I think we need to talk about this asap; luckily we have the ESMValTool call on Thu - @bouweandela sign me up for this pls 😁

@Peter9192
Contributor

Peter9192 commented Jun 29, 2021

Sorry to hear that you're experiencing issues with the updated multi-model code (but good to see some real use case testing coming in!) My intention has always been for the code to be more transparent, so that it will be easier to understand, debug, and optimise. While I understand it's annoying that this recipe doesn't work, let's not rush into reverting the update.

I agree with @zklaus that we should prefer solving the bugs that are surfacing right now properly, rather than hiding them away. In the old branch there were no tests, notes, docstrings, etc. that would inform fellow developers about these "intentional workarounds". In updating the mmstats code over the past year or so, we stumbled upon many of these implicit assumptions, e.g. also in #685, and these things keep surfacing all the time, which is just as annoying, just not all at once.

With respect to the specific issues mentioned here:

@katjaweigel
Contributor

katjaweigel commented Jun 30, 2021

@Peter9192 thanks for the summary and the links to the related issues!
So far, I did not find any recipe that uses mip = day and multi-model statistics (recipe_pv_capacity_factor.yml is not merged yet and the multi-model mean was removed from the PR) but one never knows what happens in non merged recipes.
I think the leap year issue could be solved through the preprocessor setting regrid_time?
But currently that doesn't work (at least not with:

regrid_time:
  frequency: day

see above).
About all these issues I just wonder:

  • Why was this not flagged as a possible breaking change to the development team (as was done, for example, for this PR: Add cmor_name as an extra tag for getting variable info #1099)?

  • Why not test all recipes with multi-model mean before the core release, if there is such a large change? (Even if the solution should be changes in the diagnostics in several cases, it is not clear whether this is the case for all issues; several could well require further changes in the core.)

@Peter9192
Contributor

I think the leap year issue could be solved through the preprocessor setting regrid_time?

I think so too. Also relevant: #744 (and see #781 for an overview of many related developments)

* Why this was not marked as possible breaking changes to the development team

Not sure what you mean by 'marking' here; I see no label or similar on the PR you linked. I think we've made a lot of progress in improving the development process, but this could've been smoother, I agree.

* Why not test all recipes with multi-model mean before the core release, if there is such a large change?

I think there are several reasons for that. For one, there's no easy way to do it. That's why we've spent a lot of time improving the tests and the test infrastructure (e.g. the testbot, the regression tests with sample data, and #1023). Bouwe is currently trying to set up infrastructure to run all recipes on Mistral, but this is still challenging. And for previous releases we actually did a lot of manual tests with recipes, so I would have expected issues to surface during this release as well. But of course in this case it depends on the effort everyone puts into testing with their frequently used recipes.

@Peter9192 Peter9192 changed the title New multimodel module is EXTREMELY slow, buggy and not so much more memory efficient New multimodel module can be slow and buggy for certain recipes Jun 30, 2021
@katjaweigel
Contributor

@Peter9192 Thanks for your answer!
With marking I meant just writing "@ESMValGroup/esmvaltool-developmentteam" somewhere in the discussion, as in the issue I mentioned above. Then everybody in the development team gets an email and is aware of the discussion. I think the idea communicated during the last ESMValTool workshop was that this should be done for potential "breaking changes", i.e. changes in the core which could lead to recipes that don't run any more.

@valeriupredoi
Contributor Author

cheers @Peter9192 🍺 Here's how I see it:

  • the new MM module is a visible improvement in many respects over the previous implementation, but needs thorough testing both to confirm its functionality and performance (tweaking the performance is ongoing, as Bouwe points out); I must confess I should have tested it much more than I did, but I can say the same about all you folk 😁
  • we want it to replace the old implementation asap
  • but as things stand now it isn't mature and tested enough, so it will need a period of time until this is done ("shakedown"); while this is happening I strongly suggest we revert to the older module, because it has enough functionality and good enough performance for users to run it without unexpected errors and performance issues; this shakedown period may last a couple of weeks or even more; once it's over and we're happy with the new MM, we release a bugfix release; this might be good to have anyway to get all the minor issues with the fx handling fixed too

If we decide to use the older v2.2.0 MM module we should have a quick and dirty v2.3.1 out asap, followed by the bugfix release v2.3.2 once the new MM (and the fx stuff) have been merged

@axel-lauer
Contributor

I like the idea of going back to the old version and waiting until the "shakedown" is finished. A bugfix release once we are happy with the new version sounds like the way to go.

@stefsmeets
Contributor

Would it be an idea to have both functions side by side? i.e. users can call multimodel_statistics for the old implementation (I recommend a very annoying deprecation warning!), and multimodel_statistics2 for the new implementation.

Then you get the benefit of no breakage of the old recipes, giving users some time to migrate, while also having the new implementation available in the release for the shakedown. Once the new implementation has matured, the old version can be removed.
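A rough Python sketch of that side-by-side scheme; the function names, signatures, and bodies here are hypothetical, not the actual ESMValCore API:

```python
import warnings

def multimodel_statistics2(cubes, span="overlap", statistics=("mean",)):
    """Placeholder standing in for the new implementation."""
    ...

def multimodel_statistics(cubes, **kwargs):
    """Old entry point, kept temporarily; warns loudly on every call."""
    warnings.warn(
        "multimodel_statistics is deprecated and will be removed; "
        "switch to multimodel_statistics2",
        DeprecationWarning,
        stacklevel=2,
    )
    return multimodel_statistics2(cubes, **kwargs)
```

Recipes keep working unchanged during the shakedown, and the warning tells users exactly what to migrate to before the old path is deleted.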

@zklaus

zklaus commented Jun 30, 2021

Here is how I see it:

  • The new version works correctly in the sense that it passes the tests written by the implementor(s) of the old version.
  • It improves integrity and has already uncovered two bugs in recipes:
    • one, where data from different heights is incorrectly averaged as if it were at the same height, and
    • one, where data from different days is averaged and passed off as data from the same day
  • Despite leaving out the actual performance tuning, it already decreases the memory usage a bit
  • There is some concern about the time requirements, though the evidence seen so far is anecdotal and of very short times (only a few minutes). This might be worth investigating, but even then might be a price worth paying for the improved integrity.

The recipes should surely be fixed, no matter what we decide about the multimodel preprocessor.

@valeriupredoi
Contributor Author

Would it be an idea to have both functions side by side? i.e. users can call multimodel_statistics for the old implementation (I recommend a very annoying deprecation warning!), and multimodel_statistics2 for the new implementation.

Then you get the benefit of no breakage of the old recipes, giving users some time to migrate, while also having the new implementation available in the release for the shakedown. Once the new implementation has matured, the old version can be removed.

that's the right way from a software life-cycle point of view, absolutely, and in fact it'd be easy since we don't have to change anything in the Tool - it'd also allow us to do a direct apples-to-apples comparison between the two modules. I like the idea 👍

@katjaweigel
Contributor

@zklaus
To "fix the recipes" one probably needs to do additional fixes in the core in most cases:

  • As far as I understood the bounds issue from @bettina-gier, this should be fixed in the core.
  • The only way to "fix" the leap year issue at the moment is not to use the multi-model mean for daily data (which is what was done for the recipe where I encountered it). To really fix it, we would probably need a completely new version of regrid_time (a preprocessor, so it belongs to the core as well). If I understood it correctly, regrid_time currently only resamples the time axis, keeping the number of points. If we wanted to really regrid daily data onto the same time grid, we would need interpolation options, as for the horizontal regridding.
  • For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure if that would be a better solution, because:
    • A model fix would change the height to 2m even if no multi-model mean is applied.
    • It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used for it, but I'm not sure how well this would work and whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
    • The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.
    • Not using tas in a multi-model mean from models where it is not given at 2m is probably not a popular idea.
    • Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in the case of multi-model means and give a warning? (On the other hand nobody reads warnings and there might be other similar but larger differences.)

@zklaus

zklaus commented Jul 1, 2021

There certainly may be changes necessary in the core and a bugfix release is a real possibility. We may even decide to revert the multi-model statistics if weighing the issues and solutions makes that the best choice.

But we should be clear where the problems lie, so as

  • not to unfairly blame people that have, with a lot of work, improved our codebase
  • to avoid that people come away from this discussion with the expectation that somebody will fix the multi-model statistics and make the other problems go away
  • to make sure that we really address the actual problems.
* As far as I understood the bounds issue from @bettina-gier, this should be fixed in the core.

Yes. Longitudes and latitudes must have bounds. I am not sure whether this is a simple dataset issue that needs a fix or a bug in the regridding. That should be discussed in the appropriate issue.

* The only way to "fix" the leap year issue at the moment is not to use the multi-model mean for daily data (which is what was done for the recipe where I encountered it). To really fix it, we would probably need a completely new version of regrid_time (a preprocessor, so it belongs to the core as well). If I understood it correctly, regrid_time currently only resamples the time axis, keeping the number of points. If we wanted to really regrid daily data onto the same time grid, we would need interpolation options, as for the horizontal regridding.

I think the issue is with different calendars, so at the moment one needs to avoid using multi-model statistics for daily data with mixed calendars. That seems quite sensible to me. A long-term solution involves probably conversions between calendars. This is commonly done by duplicating or removing days, but the details should be discussed elsewhere. I would probably rather place such a thing in a (very useful!) align_calendars or similar pre-processor.
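One common alignment strategy mentioned here, removing days, could look like this sketch (the function name is illustrative; no such preprocessor exists yet):

```python
from datetime import date, timedelta

def drop_leap_days(dates):
    """Remove Feb 29 entries so a Gregorian daily axis matches a 365-day
    ('noleap') calendar -- one common calendar-conversion strategy."""
    return [d for d in dates if not (d.month == 2 and d.day == 29)]

start = date(1980, 1, 1)
gregorian = [start + timedelta(days=n) for n in range(9497)]  # through 2005-12-31
aligned = drop_leap_days(gregorian)
```

A real align_calendars preprocessor would also need the opposite direction (duplicating days for 360-day calendars) and a policy for which calendar to converge on, which is why it deserves its own design discussion.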

* For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure, if that would be a better solution, because:
  * A model fix would change the height to 2m also if no multi-model mean is applied.
  * It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used for it, but I'm not sure how well this would work and whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
  * The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.

You are completely right about these points.

  * Not using tas in a multi-model mean from models where it is not given at 2m is probably not a popular idea.

I'm not sure how much backlash there really would be. If it's only about a few CMIP5 datasets (which have a history of issues with their surface temperature), perhaps not many people care. Though this may be more widespread.

  * Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in the case of multi-model means and give a warning? (On the other hand, nobody reads warnings and there might be other similar but larger differences.)

Perhaps. I am really not sure either way. But should that also be done for the other offered statistics? What about standard deviation or extreme percentiles?

Either way, again, let's discuss the right issue. If we decide that averaging 1.5m and 2m tas is the way to go, let's state it that way and make clear what exactly is the behavior we want.

@ruthlorenz

For the issue with the 2m or 1.5m height a model fix was suggested (which would be part of the core). But I'm actually not sure, if that would be a better solution, because:

  • A model fix would change the height to 2m also if no multi-model mean is applied.
  • It is difficult to calculate the right tas for 2m (if you use a climatology/fixed value the error will probably depend on latitude, land/sea, "weather", .... I'm not sure how well you could approximate it with the help of other variables: I guess the height grid for ta is too coarse to help; rather hfls and/or ts could be used for it, but I'm not sure how well this would work and whether it is a good idea to use additional variables for fixes, since it would turn tas (2m) for such models into a kind of derived variable.)
  • The difference between tas (2m) and tas (1.5m) is probably tiny in most cases.
  • Not using tas in a multi-model mean from models where it is not given at 2m is probably not a popular idea.
  • Therefore it would maybe be better to somehow allow averaging 1.5m and 2m tas in the case of multi-model means and give a warning? (On the other hand nobody reads warnings and there might be other similar but larger differences.)

I totally agree with Katja on the tas height issue.
Actually, tas is defined as "near-surface (usually, 2 meter) air temperature". Some models give it at 1.5m (definitely ACCESS, and I guess anything else using the UM for the atmosphere, so UKESM, HadGEM etc.), so it's quite a few models from the same family. Ignoring all of them in a multi-model mean because they define tas at a different height is not the way to go! This is not an error in the data.
tas is a diagnostic anyway, not a prognostic variable. If you need the prognostic variable and care about the difference between 1.5 and 2 meters, you need to use ta. So I would allow averaging 1.5m and 2m heights (simply ignoring the height in this case would be my suggestion).

@katjaweigel
Contributor

@zklaus I'm not blaming the changes in the core for the issues themselves. I'm just not happy that there were not enough tests before the release to discover them, and that there was no warning about possible "breaking changes". It would be good if everything in the tool were working when it is released, and at the moment that is not the case with the new core. So I wonder how that could best be solved?
I don't think it is good if we have to reduce or change the models used in the recipes for a new release (because of bounds, 2m height, or new model fixes which fix the grid for one variable but make a derived variable impossible, as happened for one model in the case of recipe_deangelis15nat.yml), but that would currently be the only way to make things work with the released core by changing the tool alone. (And I'm not even sure it would help in all cases.)
Part of these issues can always happen (the leap year issue currently probably only affects a non-merged recipe in a PR; one can never avoid something like that completely), but for this release there are rather many of these issues, and we probably would have seen several of them with more tests before the core release.

@zklaus

zklaus commented Jul 1, 2021

@ruthlorenz, @katjaweigel, it's good to know that more models are affected (and I confirmed this for both CMIP5 and CMIP6). I opened a new issue to discuss specifically the question on how to treat this. It would be great if you could help to resolve the questions over there in #1204.

@katjaweigel, on tests vs releases, first, keep in mind that the current released version of the tool, 2.2.0, requires a core version <2.3.0, so there is no released version of the tool that has these problems. Whether a change is breaking or not can be difficult to determine until it breaks something, more so if the actual behavior is correct.

I agree that we should try to keep all models in the recipes. Even though that is sometimes not possible (for example, Manuel Schlund ran into a problem with a dataset that has meanwhile been retracted), just removing models because we can't deal with them anymore is certainly not the way to go.

@zklaus

zklaus commented Jul 20, 2021

I think all issues raised here are now being addressed in other issues. An overview can be found in ESMValGroup/ESMValTool#2218. Closing this. If you disagree, please feel free to reopen with a comment explaining the need.

@zklaus zklaus closed this as completed Jul 20, 2021