Allow for a variable aliasing scheme (for use in model development) #1083

senesis · 2021-04-26T15:12:07Z

Hi, @jvegasbsc , @rswamina , @mattiarighi , @bouweandela , @bsolino

I am working with IPSL, in the context of IS-ENES3, on testing the feasibility of using ESMValTool in model development. I was impressed both by the clear design of the code and by the extensive documentation. Congratulations!

For the goal above, one needs more flexibility in the data_finder, to replace {short_name} with a variable name that is left to the user's or configurer's choice. This is useful when, e.g., the model outputs are quite consistent with some CMIP project table set, and the departures can be addressed by the fix_metadata and fix_data features.

This case is quite different from the 'variable_alt_names' scheme described in this other issue which, if I understood correctly, is devoted to the case where the same physical variable has different names in different 'standard' projects (e.g. 'si' in CMIP == 'siconc' in CMIP6). The difference here is that we would like to avoid creating a table set specific to the model, and instead use an existing table set.

Digging into the code, I found that, in function _find_input_files, variable['short_name'] is changed before calling _find_input_dirs and _get_filenames_glob, in order to use an alternate variable name in file naming. So I tried setting it with a short function that queries the project's config for a new entry named 'aliases' (see code below).

It works, and allows me to explore the overall goal further.

Could I proceed this way toward a PR?

And, by the way, where should I put commits that only improve the esmvalcore code docstrings and the text of logged messages?

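```python
# Intended for esmvalcore/_data_finder.py, which already imports this helper
from ._config import get_project_config


def _get_variable_alias(variable, short_name):
    """Provide an alternate value for short_name in the variable's project.

    The value is read from an optional 'aliases' mapping in the project's
    configuration; None is returned if no alias exists.
    """
    cfg = get_project_config(variable['project'])
    aliases = cfg.get('aliases', {})
    return aliases.get(short_name)
```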


valeriupredoi · 2021-04-26T15:58:00Z

@senesis cheers for the issue! I would encourage you to provide us with a concrete use case, since this issue has several facets, depending on whether a variable can be mapped/aliased to a CMOR variable or not.

First, a variable produced by some OBS model may be the exact equivalent of pr but be called precipitation-something, just because the model devs decided to name it that way rather than adhere to CMOR standards. It is still the perfect equivalent of CMOR's pr, and aliasing will work perfectly in this case.

Second, you may have a pr-like variable that needs a constant fudge factor; in this case a custom derived variable needs to be constructed.

Third, things really go bazooka when you have (like in the UM case) a STASH code representing a variable that has no real CMOR equivalent: neither mapping nor derivation will work well, and it's just a very custom variable that will need a custom table.
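As an illustration of the case of a pr-like variable that only needs a constant factor, suppose, hypothetically, that a model writes precipitation in mm/day while CMOR's pr is in kg m-2 s-1. Because 1 mm of liquid water spread over 1 m2 has a mass of 1 kg, the conversion is a single division by 86400. The helper name is made up for this sketch; it is not ESMValCore's derivation interface, which operates on iris cubes.

```python
SECONDS_PER_DAY = 86400.0


def derive_pr_from_mm_per_day(values):
    """Convert precipitation from mm/day to kg m-2 s-1 (hypothetical helper).

    1 mm of water depth over 1 m2 has a mass of 1 kg (density 1000 kg m-3),
    so mm/day equals kg m-2 day-1; dividing by 86400 gives kg m-2 s-1.
    """
    return [value / SECONDS_PER_DAY for value in values]


print(derive_pr_from_mm_per_day([0.0, 8.64, 86.4]))
```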

senesis · 2021-04-26T16:51:47Z

OK. The use case, for now, is the following: model IPSL-CM6 has two native output formats, both NetCDF-based. Format 'TS' consists of single-variable files, named e.g.:

CM61-LR-hist-03.1950_18500101_18591231_1M_t2m.nc

which contains a NetCDF variable 't2m' that is the exact equivalent of the 'tas' variable.

So it matches your first case:

a variable could be mapped/aliased to a CMOR variable

There may be some issues with CMOR conformance, but I intend to address them either in a model-specific fix (e.g. here, adding a height2m scalar coordinate) or through built-in CMOR fixes (e.g. renaming the variable).

I will certainly also have some variable derivation issues to address (and I am not sure how best to reconstruct a CMOR standard variable by combining non-standard variables), but that is another story.

sloosvel · 2021-04-28T10:47:07Z

If you use the config-developer projects to set the proper tags, and set the project for the data to native6, you should be able to find the files without having to modify the code. This is similar to what is done to read ERA5 or ORAS4 from the original files.

Or you can even create a new project in the config-developer.yml (which is what we do to work with our model data for monitoring purposes).

senesis · 2021-04-28T11:19:16Z

Thanks. I was able to create a new project quite successfully for 'my' model output.

But my goal goes beyond finding data by indicating, in the recipe, a project-specific variable attribute such as 'era5_name'. I want to be able to apply all existing recipes (which together form the real treasure of ESMValTool) to a mix of CMIP data and data whose filenames are formed using non-standard variable names.

In other words: for any recipe requesting 'tas', the _data_finder should, for a dataset of my newly defined project, translate 'tas' to 't2m' in order to find the file named "CM61-LR-hist-03.1950_18500101_18591231_1M_t2m.nc".

I see no other way than the code change described above.

senesis · 2021-04-28T14:14:30Z

And _find_input_files would only need a slight change:

```python
def _find_input_files(variable, rootpath, drs):
    # Modified version of the function in esmvalcore/_data_finder.py;
    # _find_input_dirs, _get_filenames_glob and find_files live in that module.
    short_name = variable['short_name']
    variable['short_name'] = variable['original_short_name']
    # Use the project's specific alias, if any, to build paths and file names
    alias = _get_variable_alias(variable, short_name)
    if alias is not None:
        variable['short_name'] = alias
    input_dirs = _find_input_dirs(variable, rootpath, drs)
    filenames_glob = _get_filenames_glob(variable, drs)
    files = find_files(input_dirs, filenames_glob)
    # Restore the CMOR short_name for the rest of the processing
    variable['short_name'] = short_name
    return (files, input_dirs, filenames_glob)
```

bouweandela · 2021-05-06T10:13:29Z

The idea we had on how to achieve this is described very briefly in #309: have a YAML file (with its path configurable per project in config-developer.yml) containing a mapping from CMIP6 variables to extra key-value pairs. Those extra key-value pairs should then be added to the dict containing the variable-dataset description, for example right after this line esmvalcore/_recipe.py#L1099.

These extra keys could then be used to find the data using the directory structure defined in config-developer.yml without any modifications needed to the functions for finding input data.
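A minimal sketch of that idea, assuming the per-project YAML file has already been loaded into a dict: the extra keys are merged into the variable-dataset description, and a filename template is then filled with plain `str.format`. The key name `raw_name`, the mapping contents and the template are all hypothetical, and letting keys already set in the recipe take precedence is a design assumption, not something #309 specifies.

```python
# Hypothetical content of a per-project mapping file, e.g.
#   tas:
#     raw_name: t2m
# already loaded into a dict (YAML parsing omitted to stay dependency-free).
EXTRA_KEYS = {
    'tas': {'raw_name': 't2m'},
    'pr': {'raw_name': 'precip'},
}

# Hypothetical input_file template for a model-specific project.
INPUT_FILE = '{exp}_{start}_{end}_{freq}_{raw_name}.nc'


def add_extra_keys(variable, mapping):
    """Merge project-specific keys for this variable into its description."""
    extra = mapping.get(variable['short_name'], {})
    for key, value in extra.items():
        # Keys already set in the recipe win over the mapping file.
        variable.setdefault(key, value)
    return variable


variable = {'short_name': 'tas', 'exp': 'CM61-LR-hist-03.1950',
            'start': '18500101', 'end': '18591231', 'freq': '1M'}
add_extra_keys(variable, EXTRA_KEYS)
print(INPUT_FILE.format(**variable))
# CM61-LR-hist-03.1950_18500101_18591231_1M_t2m.nc
```

Because the lookup is keyed on the CMOR short_name, recipes keep requesting 'tas' while only the project's templates refer to the extra key.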

bouweandela · 2021-05-06T10:17:08Z

On a related note: having a separate 'project' per supported model would probably be OK as these are not so many, but we also had the idea of making the DRS definition in config-developer.yml a bit more flexible #970 (comment), because we would not like to have a separate project for every supported observational/reanalysis dataset as that would just be too many.

senesis · 2021-05-06T13:07:34Z

The idea we had on how to achieve this is described very shortly in #309, i.e. have a yaml file (path configurable per project in config-developer.yml) containing a mapping from CMIP6 variables to extra key-value pairs.

That sounds great.

Can we safely assume that such keys can be either project-specific keys (such as 'label_for_variable_in_filename') or standard keys (such as 'dataset', which would drive the choice of a fix module)?

Also: the description there does not cover the specific 'recipe' mechanics that would allow applying Python code to derive variables. And I do not see how such code would be provided with the necessary input variables, whereas the 'standard' derived-variable scheme handles that nicely.

senesis · 2021-05-06T13:22:03Z

On a related note: having a separate 'project' per supported model would probably be OK as these are not so many, but we also had the idea of making the DRS definition in config-developer.yml a bit more flexible #970 (comment), because we would not like to have a separate project for every supported observational/reanalysis dataset as that would just be too many.

The #970 (comment) introduces the concept of 'center', which is new for ESMValTool, and I think it is worth thinking twice about what it would mean or drive.

It would thus drive the choice of:

- the DRS
- the CMOR fixes, maybe at the level of choosing the fixes directory (instead of using the project for that)
- and?

bsolino · 2021-05-10T06:54:27Z

The #970 (comment) introduces the concept of 'center' , which is new for ESMValTool. And I think it is worth thinking twice at what it would mean or drive.

Sorry for the confusion, I am not used to the ESMValTool vocabulary and it seems I chose the word poorly.

What I was calling "center" (for "datacenter") is not a new concept, as far as I know. I'm not sure what they are usually called, but they appear here under the name "key machines": https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/configure.html#developer-configuration-file

I will edit the comment to avoid further confusion.
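For readers unfamiliar with the "key machines": in config-developer.yml, a project's `input_dir` can be a mapping from site names (e.g. `default`, `BADC`, `ETHZ`) to directory templates, and the `drs` setting in config-user.yml picks which one applies to each project. The sketch below approximates that selection; the function name and the trimmed configuration are illustrative, not copied from ESMValCore.

```python
# Illustrative, trimmed project configuration: input_dir keyed by site.
PROJECT_CONFIG = {
    'CMIP6': {
        'input_dir': {
            'default': '/',
            'BADC': ('{activity}/{institute}/{dataset}/{exp}/{ensemble}/'
                     '{mip}/{short_name}/{grid}/{latestversion}'),
            'ETHZ': '{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/',
        },
    },
    'native6': {'input_dir': 'Tier{tier}/{dataset}/{latestversion}'},
}


def select_drs(input_type, drs, project):
    """Return the path template for a project, chosen by the user's drs."""
    input_path = PROJECT_CONFIG[project][input_type]
    if isinstance(input_path, str):
        # A single template shared by every site.
        return input_path
    structure = drs.get(project, 'default')
    if structure in input_path:
        return input_path[structure]
    raise KeyError(f'drs {structure} for {project} project not specified')


print(select_drs('input_dir', {'CMIP6': 'ETHZ'}, 'CMIP6'))
print(select_drs('input_dir', {}, 'CMIP6'))  # falls back to 'default': /
```

In this scheme the site only selects a directory template; it does not by itself affect which CMOR fixes are applied.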

senesis added the "enhancement" (New feature or request) label Apr 26, 2021

valeriupredoi assigned jvegreg and sloosvel Apr 26, 2021

This was referenced Apr 29, 2021:

slow dataload for input files with multiple variables ESMValGroup/ESMValTool#2141 (open)

Dates for the next (virtual) ESMValTool Workshop - May 4 to May 6 ESMValGroup/ESMValTool#2067 (closed)

senesis closed this as completed Jun 4, 2021
