Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including recipe_era5.yml under the folder cmorizer is misleading the users. #1909

Closed
SarahAlidoost opened this issue Nov 25, 2020 · 28 comments · Fixed by #2182
Closed

Including recipe_era5.yml under the folder cmorizer is misleading the users. #1909

SarahAlidoost opened this issue Nov 25, 2020 · 28 comments · Fixed by #2182
Labels

Comments

@SarahAlidoost
Copy link
Contributor

ESMValTool provides (limited) support for data in their native format. In this case, the steps needed to reformat the data are executed as datasets fixes during the execution of an ESMValTool recipe. This has the advantage that the user does not need to store a duplicate (cmorized) copy of the data. Instead, the cmorization is performed ‘on the fly’ when running a recipe. ERA5 is the first dataset for which this ‘cmorization on the fly’ is supported. (see cmorization as a fix for more information.)

Currently, there is a recipe_era5.yml under the folder cmorizer. This misleads the users to run this recipe to cmorize era5 data (converting from native6 to OBS6) before running their own recipe.

This recipe includes three diagnostics sections: hourly, daily, monthly.
The diagnostics hourly and monthly are meant as examples to show users how to write a recipe working with era5 data. Therefore, the example folder is a better place for them.
The diagnostics daily is very useful to convert hourly data to daily data. Because era5 archive does not include daily data.
A recipe including only this part can be created in the recipes folder with a name like recipe_daily_era5.yml.

In addition, now recipe_check_obs.yml includes some checks for hourly and monthly era5 under the project OBS6. Those checks should be removed.

Please see also related issues here #1884 and #1889 .

@SarahAlidoost
Copy link
Contributor Author

@bouweandela , @Peter9192, @katjaweigel please let me know what you think.

@bouweandela
Copy link
Member

Hi @SarahAlidoost,

Good point, maybe we could move the example use for hourly and monthly data to recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?

@katjaweigel
Copy link
Contributor

I'm not sure. Without a hint from Birgit @hb326 I didn't manage to find them there, when I first wanted to use ERA5 data, so I agree that it is probably not the best the place. But I'm not sure your suggestions make it easier. Also based on what Birgit told me and my own workflow I'd expect that a lot of users will actually use them as cmorizors to use OBS6 in the actual recipes to make them faster.

@bouweandela
Copy link
Member

bouweandela commented Nov 25, 2020

Also based on what Birgit told me and my own workflow I'd expect that a lot of users will actually use them as cmorizors to use OBS6 in the actual recipes to make them faster.

What frequency do you mean @katjaweigel? For hourly or monthly data, I would be surprised if using cmorized OBS6 data was any faster than just the native6 data.

@katjaweigel
Copy link
Contributor

It is also confusing, that there is the diagnostic:
ESMValTool/esmvaltool/diag_scripts/cmorizers/era5.py
and the fix:
ESMValCore/esmvalcore/cmor/_fixes/native6/era5.py
with the same name. Maybe the diagnostic should be renamed?

@katjaweigel
Copy link
Contributor

katjaweigel commented Nov 25, 2020

Also based on what Birgit told me and my own workflow I'd expect that a lot of users will actually use them as cmorizors to use OBS6 in the actual recipes to make them faster.

What frequency do you mean @katjaweigel? For hourly or monthly data, I would be surprised if using cmorized data was any faster than just the native data.

Monthly data I think. I only tested short bits so far, but @hb326 mentioned, that the on the flight cmorizer slows things down,

@SarahAlidoost
Copy link
Contributor Author

Hi @SarahAlidoost,

Good point, maybe we could move the example use for hourly and monthly data to recipe_check_obs.yml, because that contains all examples of using this kind of data and

Thank you. I am still concerned about including monthly/hourly dataset with project native 6 in the recipe_check_obs.yml. This recipe is meant for checking cmorized data that are OBS/OBS6, am I right? Also, it would be difficult to find examples there. The monthly/hourly diagnostics sections are meant as an example. So it would be nice to have them in the example folder and notify users in the documentation, here .

move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?

This is fine.

@Peter9192
Copy link
Contributor

Peter9192 commented Nov 25, 2020

maybe we could move the example use for hourly and monthly data to recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?

I like this suggestion. Maybe we can do both? Do the check of all variables in recipe check obs, and make a separate example recipe to show in general how to use the native data?

@katjaweigel
Copy link
Contributor

maybe we could move the example use for hourly and monthly data to recipe_check_obs.yml, because that contains all examples of using this kind of data and move the daily data diagnostic to a recipe cmorizers/recipe_daily_era5.yml, because there we do want to pre-cmorize?

I like this suggestion. Maybe we can do both? Do the check of all variables in recipe check obs, and make a separate example recipe to show in general how to use the native data?

I don't really like it. I think it makes it really hard to find and forces the user to check the code to find available variables (with fixes) and just try it for all others. I also like that the recipe currently provides the translation between ERA5 and cmor variable name, this is often not easy to find (could be moved to the description though, like we discussed for ERA5-Land).

@Peter9192
Copy link
Contributor

also like that the recipe currently provides the translation between ERA5 and cmor variable name

Actually this has changed, and the era5_name and era5_frequency are no longer needed. The current DRS to find the right ERA5 data is:

  input_dir:
    default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
  input_file:
    default: '*.nc'

so it depends on the user putting the right variables inside the right folders. I agree that this is not pretty. Perhaps we should add a table with this mapping to the documentation instead?

@bouweandela
Copy link
Member

Perhaps we should add a table with this mapping to the documentation instead?

Yes, I agree, and a check that they did it right in the era5 fix.

@bouweandela
Copy link
Member

Monthly data I think. I only tested short bits so far, but @hb326 mentioned, that the on the flight cmorizer slows things down,

@katjaweigel @hb326 It would be great if you could share a recipe and some run times, if you happen to have this.

@fserva

This comment has been minimized.

@SarahAlidoost
Copy link
Contributor Author

A related issue is #1889

@hb326
Copy link
Contributor

hb326 commented May 6, 2021

Monthly data I think. I only tested short bits so far, but @hb326 mentioned, that the on the flight cmorizer slows things down,

@katjaweigel @hb326 It would be great if you could share a recipe and some run times, if you happen to have this.

I had cmorized the ERA5 data not on the fly, but before I used them in the recipe, since the diagnostics are still fickle and crash a lot. So I cmorized the data beforehand to not have to do this step every time I needed to run the diagnostics again after it crashed. I do not really know if, with a functioning diagnostic, the time to run the on-the-fly-cmorizer would slow things down. I have not timed it so far. Do you still want me to do it?
I used monthly data for the diagnostic.

@katjaweigel
Copy link
Contributor

I used both on the flight and pre cmorized ERA5 monthly data, but never for exactly the same data set (I had that test on my todo list for a while but I somehow never found time, sorry!) I would assume that if there is a time difference for monthly data it is not large, at least the time for preprocessing ERA5 data with the on-the-fly-cmorizer was not unusually long compared to CMIP data.

@SarahAlidoost

This comment has been minimized.

@bouweandela

This comment has been minimized.

@remi-kazeroni

This comment has been minimized.

@fserva

This comment has been minimized.

@fserva

This comment has been minimized.

@fserva

This comment has been minimized.

@zklaus

This comment has been minimized.

@fserva

This comment has been minimized.

@remi-kazeroni

This comment has been minimized.

@zklaus
Copy link

zklaus commented May 18, 2021

Spot on, @remi-kazeroni ! Is the original issue here solved? @SarahAlidoost, what needs to happen to close this issue?

@SarahAlidoost
Copy link
Contributor Author

Spot on, @remi-kazeroni ! Is the original issue here solved? @SarahAlidoost, what needs to happen to close this issue?

We need a pull request to fix the original issue (i.e. moving files around) including this comment . I can make the PR.

@zklaus
Copy link

zklaus commented May 18, 2021

@fserva, I have marked a bunch of comments here as off-topic, so that we can focus on the original issue here. Please don't mistake that for me dismissing your questions and input! Let's just discuss the open points in other issues, eg ESMValGroup/ESMValCore#1136 and #1889.

@SarahAlidoost, great if you can take care of this, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants