Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cmor_name as an extra tag for getting variable info #1099

Closed
wants to merge 4 commits into from

Conversation

jvegreg
Copy link
Contributor

@jvegreg jvegreg commented May 5, 2021

ESMValTool was assuming that the name of all variables in the CMOR tables was the same that their out_name, which represents the name in the files (both in the path and inside the file). This is not true for all, so we have to modify it to be able to deal with these kind of variables.

To do so, we are using short_name for the out_name and cmor_name for the name in the tables. This implies changing the name of some parameters from short_name to cmor_name in most of the CMOR related functions, including checking, fixing and derivation. The only API change that is not a pure replace is the derive function as it requires both now

To replace #391

Closes #333

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@jvegreg jvegreg requested a review from sloosvel May 5, 2021 13:37
@sloosvel
Copy link
Contributor

sloosvel commented May 5, 2021

Works fine, I am able to load the data by adding the cmor_name for the variable. Maybe some documentation to address these cases could be useful?

@jvegreg jvegreg requested a review from bouweandela May 5, 2021 14:29
@jvegreg jvegreg added bug Something isn't working cmor Related to the CMOR standard labels May 5, 2021
@jvegreg jvegreg added this to the v2.3.0 milestone May 5, 2021
@bouweandela
Copy link
Member

Looks nice, very small impact on existing recipes!

Sorry, I only now realize that renaming the short_name argument of the preprocessor functions to cmor_name is a backwards incompatible change to the public API. Is the situation any better if we would keep short_name for what is now called cmor_name and have an option to provide var_name in the recipe for out_name (i.e. the opposite of this?) Or is that even worse?

@jvegreg
Copy link
Contributor Author

jvegreg commented May 5, 2021

Or is that even worse?

We will need to modify the config-developer.yml at least while the api change in the preprocessor functions will only affect users when passing parameters by name.

I think tweaking the config-developer.yml it's potentially more disruptive, if only because the function error is really well reported by python itself.

@bouweandela
Copy link
Member

I think tweaking the config-developer.yml it's potentially more disruptive, if only because the function error is really well reported by python itself.

I agree, that would be much worse.

@bouweandela
Copy link
Member

@jvegasbsc To follow up on our discussion yesterday about making backwards-incompatible changes: Could you please add a thorough description of this pull request at the top, highlighting which changes are backwards incompatible and including a guide on how to upgrade that can be used in the changelog? And then tag the core development team and ask them if they think these changes are OK?

@jvegreg
Copy link
Contributor Author

jvegreg commented May 6, 2021

@ESMValGroup/esmvaltool-developmentteam, any issues with the breaking changes on this PR? Should only affect people calling the preprocessor functions for CMOR check and fixes and derivation in diagnostics, which I do not know if it is really a current use case

Co-authored-by: Bouwe Andela <[email protected]>
@bouweandela
Copy link
Member

Sorry, I've thought about this some more and I'm still not convinced this is the best solution. The variable name (out_name) can be read from the cmor table, so there ahould be no need to provide the extra cmor_name argument in the recipe. If we consider short_name to be the name in the table, we can add var_name (i.e. out_name) automatically from the table in the same place where we add standard_name etc and use that in the DRS. I believe that the filename is the only place where var_name is used, right? This would indeed changing all occurrences of short_name with var_name in config-developer.yaml, but that's a backwards compatible change: older versions of config-developer.yaml will keep working just fine (only not for the rare cases that are fixed by this pr, but those do not work now either, so that's not a problem).

@zklaus
Copy link

zklaus commented May 7, 2021

I am also not convinced that the users will know the cmor_name. It is, for example, not something that you can search for on ESGF. I wanted to understand better the extent and kind of the problem, so I looked through the cmip6 tables. What I found is in CMIP6_duplicate_variables.json.txt. It is a listing of all those names of variables that share the same out_name for all tables, listing for every table only those out_names that occur more than once.

It seems there are three classes of variables:

  • Atmospheric variables that live on pressure levels plev27 or plev7h (suffix -27 and -7h, respectively)
  • Ocean variables that maybe available as 3d (normal, no suffix) or only as 2d variables (suffix -2d)
  • Atmospheric forcings for certain gases that are in some simulations only forced by climatology (suffix -Clim)

@jvegreg
Copy link
Contributor Author

jvegreg commented May 10, 2021

I am also not convinced that the users will know the cmor_name.

Me neither, but what is your proposal to automatically distinguish which variable are the users expecting to get when the out_name is not enough?

@zklaus
Copy link

zklaus commented May 10, 2021

Enough for what? When there are two variables with the same out_name in one table, they represent two alternative representations of the same phenomenon. Every model can only store one of those. So the out_name is always enough to find the data. What the user gets is the best possible representation per model.

In other words, a user will ask for (Emon, hus) and should get hus or hus27 depending on what is available for every model. It is then up to the user to decide how to deal with the varying availability. For some analysis, users will want both kinds of model and perhaps integrate vertically, for other kinds of analysis the user might want to choose to request only those models that provide the full vertical resolution.

@zklaus
Copy link

zklaus commented Oct 11, 2021

It's too late now for 2.4.0, but this item should receive high priority for 2.5.0.

@zklaus zklaus modified the milestones: v2.4.0, v2.5.0 Oct 11, 2021
@schlunma
Copy link
Contributor

Since this has already been postponed once, it might be good to include this in v2.5.

@sloosvel sloosvel modified the milestones: v2.5.0, v2.6.0 Jan 25, 2022
@sloosvel
Copy link
Contributor

Finally, it will go to 2.6 . I will address the remaining comments.

@zklaus
Copy link

zklaus commented Jan 25, 2022

We should address this for 2.6.0, but note that the way to address this may well be to abandon it or to start fresh. The current approach is not quite convincing at the moment.

@sloosvel
Copy link
Contributor

but note that the way to address this may well be to abandon it or to start fresh. The current approach is not quite convincing at the moment.

Maybe it's better to close this then. I don't think I will have time to address it for 2.6.

@sloosvel sloosvel closed this May 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cmor Related to the CMOR standard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem loading zg for 6hrPlevPt
5 participants