Building my comparison part 1: datasets_setup.py and atlas subdirectories

The C-ESM-EP is a way to apply collections of diagnostics to a set of simulations.

It is intimately linked with the concept of comparison.

In the C-ESM-EP vocabulary, a comparison is a directory containing:

- subdirectories for the atlases (collections of diagnostics); each one contains a parameter file that controls the execution of the diagnostics in main_C-ESM-EP.py
- a python file datasets_setup.py: this is where the user specifies the datasets that will be taken as inputs of the C-ESM-EP

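In this page you will find:

- Foreword: keep only the atlases you need for your comparison
- Foreword 2: control the routine cache cleaning
- Section 1: Adding my datasets to datasets_setup.py
- Section 2: Using the common_keys
- Section 3: ts_period, clim_period and the period manager
- Section 4: The customname: control the name in the plot
- Section 5: Control the reference used to compute the differences
- Section 6: One word on colors to identify your datasets in the C-ESM-EP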

Foreword: keep only the atlases you need for your comparison

It is advised to keep in your comparison directory only the atlas subdirectories you need. The C-ESM-EP runs only the atlases available in the comparison directory. Likewise, the C-ESM-EP frontpage contains links only to the atlases available for the comparison. Do not hesitate to remove the subdirectories you don't need, to avoid unnecessary computation and storage of results. If you need them later, the atlas subdirectories (with their parameter files) remain available in standard_comparison (or a 'git pull' away) and in share/optional_atlas.

Foreword 2: control the routine cache cleaning

At the beginning of datasets_setup.py, you will find these two lines:

# -- Patterns to clean the cache at the end of the execution of the atlas
routine_cache_cleaning = [dict(age='+20')]

This controls a routine cleaning of the cache at the end of the execution of the atlas. The C-ESM-EP loops over the elements of this list and cleans the cache once per element. This way you can combine keywords to target specific files. Example:

routine_cache_cleaning = [dict(age='+20'), dict(pattern='oneVar', age='+10')]

This instruction will first clean the files that have not been used for more than 20 days, and then clean the files containing the pattern 'oneVar' that have not been used for more than 10 days.

Have a look at the CliMAF documentation on the crm function to know more about the keywords you can use for this routine cache cleaning.
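
To make the loop over routine_cache_cleaning concrete, here is a minimal sketch. It assumes each element of the list is simply unpacked as keyword arguments to the cleaning function; fake_crm is a hypothetical stand-in for CliMAF's crm and only records what it receives, so the snippet runs without CliMAF.

```python
# Hypothetical stand-in for CliMAF's crm(): it only records the keywords it receives.
calls = []

def fake_crm(**keywords):
    calls.append(keywords)

routine_cache_cleaning = [dict(age='+20'), dict(pattern='oneVar', age='+10')]

# One cleaning pass per element of the list, in list order (assumed behavior).
for rule in routine_cache_cleaning:
    fake_crm(**rule)

print(calls)
```

The first pass receives only age='+20'; the second receives both pattern='oneVar' and age='+10'.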

1. Adding my datasets to datasets_setup.py

In datasets_setup.py, there is a python list 'models' whose elements are python dictionaries describing the access to the datasets. They are basically the set of keywords/values that are provided to the CliMAF ds() function to access the data, without the keyword 'variable'. We will now see how to add your own datasets.

In CliMAF, the different data structures are described with CliMAF 'projects'. Each CliMAF project provides access to the datasets through keywords/values:

- simulation
- frequency
- period

and other keywords that are specific to the projects.

The most commonly used projects are 'CMIP5' (CMIP5 archive on Ciclad) and 'IGCM_OUT' (data tree produced by libIGCM = most of the model outputs produced at IPSL).

Here is an example of a CMIP5 dataset definition and an IPSLCM6 coupled model simulation in datasets_setup.py:

models = [
   dict(project = 'CMIP5',
        model = 'IPSL-CM5A-LR',
        experiment = 'historical',
        simulation = 'r1i1p1',
        frequency = 'monthly',
        period = '1980-2005'
       ),
   dict(project = 'IGCM_OUT',
        root = '/path_to_thredds',
        login = 'p86caub',
        model = 'IPSLCM6',
        simulation = 'CM605-LR-pdCtrl01',
        frequency = 'monthly',
        clim_period = 'last_20Y'
       )
]

Note: /path_to_thredds is the root path to the thredds, up to the login (well known on Ciclad and Curie)
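
Since a dataset dictionary holds every ds() keyword except 'variable', a diagnostic only has to add the variable before requesting the data. The sketch below illustrates that idea; ds_keywords is a hypothetical helper (not a C-ESM-EP function) and 'tas' is only an example variable name.

```python
def ds_keywords(dataset, variable):
    """Return a copy of the dataset dictionary with the 'variable' keyword added."""
    keywords = dict(dataset)          # copy, so the entry in models is left untouched
    keywords['variable'] = variable
    return keywords

cmip5_dataset = dict(project='CMIP5', model='IPSL-CM5A-LR', experiment='historical',
                     simulation='r1i1p1', frequency='monthly', period='1980-2005')

tas_keywords = ds_keywords(cmip5_dataset, 'tas')
print(tas_keywords)
# With CliMAF loaded, the request could then be written as ds(**tas_keywords)
```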

2. Using the common_keys

In datasets_setup.py, you will see a mechanism to specify common keys for the elements of models. It avoids repeating, in every dictionary of models, the keywords that are identical across a set of datasets. The mechanism in standard_comparison/datasets_setup.py adds to the IGCM_OUT dataset dictionaries the key/values of common_keys that are not already specified in models.

Example with a set of simulations for an ORCHIDEE meeting:

models = [
      # -- Coupled models
      dict(project='IGCM_OUT', login='p86fair', simulation='CM6014-pd-splith-01', color='green' ),
      dict(project='IGCM_OUT', login='p86maf', simulation='CM6014-pd-split-D-01', color='red'),
      dict(project='IGCM_OUT', login='p86maf', simulation='CM6014-pd-ttop-01', color='blue'),

      # -- LMDZOR
      dict(project='IGCM_OUT', login='p86ghatt', model='LMDZOR', status='PROD',
           experiment='ref4438', simulation='CL5.4438.L6010.ref'),
      dict(project='IGCM_OUT', login='p86ghatt', model='LMDZOR', status='PROD',
           experiment='ref4438', simulation='CL5.4438.L6010.alt1'),

      # -- ORCHIDEE offline
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG2.4783.v3'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG2.4783.v4'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG3.4783.v3'),
      dict(project='IGCM_OUT', login='p529bast', model='OL2', status='PROD',
           experiment='ref4783', simulation='FG3.4783.v4'),

]

# -- Provide a set of common keys to the elements of models
# ---------------------------------------------------------------------------- >
common_keys = dict(
           root='/path_to_thredds', login='*',
           model='IPSLCM6',
           frequency='monthly',
           clim_period='last_10Y',
           ts_period='full',
           )

for model in models:
  if model['project']=='IGCM_OUT':
    if '-pi' in model['simulation']:
        model.update(dict(experiment='piControl'))
    if '-pd' in model['simulation']:
        model.update(dict(experiment='pdControl'))
    for key in common_keys:
        if key not in model:
           model.update({key:common_keys[key]})
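
To see what this loop produces, here is the same mechanism applied to a reduced list of two datasets (one IGCM_OUT entry from the example above and one CMIP5 entry), written as a self-contained snippet.

```python
models = [
    dict(project='IGCM_OUT', login='p86maf', simulation='CM6014-pd-ttop-01', color='blue'),
    dict(project='CMIP5', model='IPSL-CM5A-LR', experiment='historical',
         simulation='r1i1p1', frequency='monthly', period='1980-2005'),
]

common_keys = dict(root='/path_to_thredds', login='*', model='IPSLCM6',
                   frequency='monthly', clim_period='last_10Y', ts_period='full')

for model in models:
    if model['project'] == 'IGCM_OUT':
        # The experiment is deduced from the simulation name
        if '-pi' in model['simulation']:
            model.update(dict(experiment='piControl'))
        if '-pd' in model['simulation']:
            model.update(dict(experiment='pdControl'))
        # Common keys only fill in what the user has not specified
        for key in common_keys:
            if key not in model:
                model.update({key: common_keys[key]})

for model in models:
    print(model)
```

The IGCM_OUT dictionary keeps its own login ('p86maf' rather than '*'), gets experiment='pdControl' from its name, and inherits root, model, frequency, clim_period and ts_period. The CMIP5 dictionary is left unchanged.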

3. ts_period, clim_period and the period manager

The C-ESM-EP contains diagnostics on climatological averages, and others on time series. The user can therefore specify one period for the climatologies, clim_period, and another for the time series, ts_period. Example:

   dict(project = 'IGCM_OUT',
        root = '/path_to_thredds',
        login = 'p86caub',
        model = 'IPSLCM6',
        simulation = 'CM605-LR-pdCtrl01',
        frequency = 'monthly',
        clim_period = 'last_20Y',
        ts_period   = 'full'
       )

clim_period and ts_period can take either real dates (e.g. 1980-2000, 2100_2169...), or 'instructions' like 'last_20Y', 'first_1Y' or 'full' (self-explanatory).

These instructions are user-friendly ways to work on the XX last or first years of a simulation, without having to search for those dates yourself.

This task is handled by the period manager. The period manager is a C-ESM-EP functionality. It works for IGCM_OUT (and other IGCM_OUT related projects), CMIP5 and will work for the upcoming CMIP6 project. If you want to add your project to the C-ESM-EP and use the period manager, contact jerome . servonnat at lsce . ipsl . fr

The period manager works for monthly (CMIP5 and IGCM_OUT) and seasonal (IGCM_OUT only) frequencies. For the latter, use instructions like 'last_SE' or 'first_SE'.
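
As an illustration of what such an instruction amounts to, the sketch below turns 'last_20Y', 'first_1Y' or 'full' into explicit years once the first and last available years of a simulation are known. This is not the period manager's code: resolve_period, its arguments, the output format and the clamping to the available years are all assumptions made for the example, and the seasonal 'last_SE'/'first_SE' instructions are not covered.

```python
import re

def resolve_period(instruction, first_year, last_year):
    """Translate a period instruction into 'YYYY-YYYY' (hypothetical helper)."""
    if instruction == 'full':
        return '%04d-%04d' % (first_year, last_year)
    match = re.fullmatch(r'(last|first)_(\d+)Y', instruction)
    if match is None:
        # Not an instruction: assume explicit dates and return them unchanged
        return instruction
    which, nyears = match.group(1), int(match.group(2))
    # Assumption: never ask for more years than the simulation provides
    nyears = min(nyears, last_year - first_year + 1)
    if which == 'last':
        return '%04d-%04d' % (last_year - nyears + 1, last_year)
    return '%04d-%04d' % (first_year, first_year + nyears - 1)

# A simulation assumed to cover the years 1850 to 2049
print(resolve_period('last_20Y', 1850, 2049))   # 2030-2049
print(resolve_period('first_1Y', 1850, 2049))   # 1850-1850
print(resolve_period('full', 1850, 2049))       # 1850-2049
print(resolve_period('1980-2000', 1850, 2049))  # 1980-2000
```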

4. The customname: control the name in the plot

By default, the C-ESM-EP will build a string based on the model name for CMIP5 datasets, and on the simulation name for the other projects (and the 'product' for the ref_ts and ref_climatos projects that give access to the reference products). If you want to provide a custom name to identify your simulation in the plots, you can use the keyword customname in the dictionary of the dataset:

  dict(project = 'IGCM_OUT',
       root = '/path_to_thredds',
       login = 'p86caub',
       model = 'IPSLCM6',
       simulation = 'CM605-LR-pdCtrl01',
       frequency = 'monthly',
       clim_period = 'last_20Y',
       ts_period   = 'full',
       customname  = 'My favorite simulation'
      )

By default, the C-ESM-EP automatically gives a name in the plots to the datasets. It works as follows:

- it takes the customname if provided by the user
- for CMIP5 datasets, it uses the model name
- for other datasets, it combines the model name and the simulation/realization

You can provide a keyword (or a combination of keywords) as customname rather than a fixed string, as in these examples:

        #customname  = '${model} ${simulation}'
        #customname  = '${simulation}'
        #customname  = '${realization}_${period}'

The C-ESM-EP will replace the keywords with the values it gets from the dataset, when they exist. For instance, if you want the realization alone as title, rather than model and realization, for CMIP members, set customname in the common_keys like this:

        customname  = '${realization}'
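
The ${keyword} syntax is the one used by Python's string.Template, so the substitution can be sketched with the standard library. This is only an illustration under that assumption: build_name is a hypothetical helper, and the way the C-ESM-EP renders a keyword missing from the dataset may differ from what safe_substitute does (it leaves the placeholder untouched).

```python
from string import Template

def build_name(customname, dataset):
    """Replace ${keyword} placeholders with the values found in the dataset."""
    # safe_substitute leaves placeholders with no matching key as they are
    return Template(customname).safe_substitute(dataset)

dataset = dict(project='CMIP5', model='IPSL-CM5A-LR',
               simulation='r1i1p1', period='1980-2005')

print(build_name('${model} ${simulation}', dataset))    # IPSL-CM5A-LR r1i1p1
print(build_name('${simulation}', dataset))             # r1i1p1
print(build_name('${realization}_${period}', dataset))  # ${realization}_1980-2005
print(build_name('My favorite simulation', dataset))    # My favorite simulation
```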

5. Control the reference used to compute the differences

The C-ESM-EP performs a lot of comparisons with a reference. The reference can be either a reference product (observations, reanalysis...) or a simulation.

The variable reference controls this in datasets_setup.py. By default, it is set to 'default'. This means that the C-ESM-EP will use a set of pre-defined reference products for the different variables.

reference = 'default'

If you want to use a simulation as reference, provide a dataset dictionary to reference:

reference = dict(project = 'CMIP5', model='CNRM-CM5', experiment='historical',
                 frequency='monthly', period='1980-2005',
                 customname='CMIP5 CNRM-CM5'
                 )

Important note: if you want to use your own simulations or obs/references as reference for your atlas, your CliMAF project must contain the attribute:

- product if these are observations/reanalyses
- model if you point to model simulations

This way the C-ESM-EP will be able to identify whether the reference is a model or an obs, and use the appropriate plot parameters ('model_model' if the reference is a model, 'bias' if the reference is an obs/reanalysis). If your CliMAF project does not include one of those keywords, you will encounter errors (the C-ESM-EP needs one or the other when searching for the name that defines the dataset).
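
The rule above can be pictured with the following sketch. It is a simplification, not the C-ESM-EP code: plot_params_key is a hypothetical function, it inspects the keywords of the reference dictionary rather than the CliMAF project definition, the 'product' reference shown is invented for the example, and the case reference = 'default' is not handled.

```python
def plot_params_key(reference):
    """Return which set of plot parameters applies to a reference dictionary."""
    if 'product' in reference:
        return 'bias'          # observations / reanalyses
    if 'model' in reference:
        return 'model_model'   # model simulation
    # Neither keyword: this is the situation where errors are expected
    raise KeyError("the reference needs a 'product' or a 'model' keyword")

simulation_ref = dict(project='CMIP5', model='CNRM-CM5', experiment='historical',
                      frequency='monthly', period='1980-2005',
                      customname='CMIP5 CNRM-CM5')
# Hypothetical observational reference, for illustration only
obs_ref = dict(project='my_obs_project', product='my_product')

print(plot_params_key(simulation_ref))  # model_model
print(plot_params_key(obs_ref))         # bias
```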

6. One word on colors to identify your datasets in the C-ESM-EP

The C-ESM-EP uses colors to identify your datasets for the time series (MainTimeSeries) and the metrics (ParallelCoordinates_Atmosphere, TuningMetrics, HotellingTest). If the user didn't specify a color for a dataset, the C-ESM-EP automatically assigns one. The colors are taken in order from this list:

cesmep_python_colors = ['royalblue', 'red', 'green', 'mediumturquoise', 'orange',
                        'navy', 'limegreen', 'steelblue', 'fuchsia',
                        'blue', 'goldenrod', 'yellowgreen', 'blueviolet', 'darkgoldenrod', 'darkgreen',
                        'mediumorchid', 'lightslategray', 'gold', 'chartreuse', 'saddlebrown', 'tan',
                        'tomato', 'mediumvioletred', 'mediumspringgreen', 'firebrick',
                        ]

The function that handles the colors in the C-ESM-EP is the colors_manager. It returns a list of colors (hashtag color names) that can be understood by both python and R scripts used in the C-ESM-EP.

It gives priority to the user-specified colors. Note that the user can assign the same color to multiple datasets.

Note: the colors_manager will return 'royalblue' if the user specifies 'blue'.
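
The priority rules can be summarized with the sketch below. It is not the colors_manager itself: assign_colors is a hypothetical function, it returns plain color names rather than the color codes mentioned above, and two points are assumptions (defaults are consumed in list order only by datasets without a user color, and the list is cycled if it runs out).

```python
from itertools import cycle

cesmep_python_colors = ['royalblue', 'red', 'green', 'mediumturquoise', 'orange',
                        'navy', 'limegreen', 'steelblue', 'fuchsia',
                        'blue', 'goldenrod', 'yellowgreen', 'blueviolet', 'darkgoldenrod', 'darkgreen',
                        'mediumorchid', 'lightslategray', 'gold', 'chartreuse', 'saddlebrown', 'tan',
                        'tomato', 'mediumvioletred', 'mediumspringgreen', 'firebrick',
                        ]

def assign_colors(datasets, palette=cesmep_python_colors):
    """Return one color per dataset, giving priority to user-specified colors."""
    defaults = cycle(palette)          # assumption: start over if the list is exhausted
    colors = []
    for dataset in datasets:
        color = dataset.get('color')
        if color is None:
            color = next(defaults)     # next unused default, in list order
        elif color == 'blue':
            color = 'royalblue'        # behavior stated in the note above
        colors.append(color)
    return colors

datasets = [dict(simulation='A', color='green'), dict(simulation='B'),
            dict(simulation='C', color='blue'), dict(simulation='D')]
print(assign_colors(datasets))  # ['green', 'royalblue', 'royalblue', 'red']
```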
