Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERA5 Native6 Fix cannot handle single monthly-averaged files #2511

Open
rbeucher opened this issue Aug 14, 2024 · 3 comments · May be fixed by #2512
Open

ERA5 Native6 Fix cannot handle single monthly-averaged files #2511

rbeucher opened this issue Aug 14, 2024 · 3 comments · May be fixed by #2512

Comments

@rbeucher
Copy link
Contributor

Describe the bug

The ERA5 replica collection at the National Computing Infrastructure down here in Australia stores monthly-averaged variable in multiple files, one file per month.

Here is an example for the tas (tp) variable:

/g/data/rt52/era5/single-levels/monthly-averaged/tp/1959
[rb5533@gadi-login-01 1959]$ ls -l
total 13516
-rw-rw----+ 1 zt77_era5 rt52 1190902 Jul 14  2022 tp_era5_moda_sfc_19590101-19590131.nc
-rw-rw----+ 1 zt77_era5 rt52 1175650 Jul 14  2022 tp_era5_moda_sfc_19590201-19590228.nc
-rw-rw----+ 1 zt77_era5 rt52 1159305 Jul 14  2022 tp_era5_moda_sfc_19590301-19590331.nc
-rw-rw----+ 1 zt77_era5 rt52 1160865 Jul 14  2022 tp_era5_moda_sfc_19590401-19590430.nc
-rw-rw----+ 1 zt77_era5 rt52 1154256 Jul 14  2022 tp_era5_moda_sfc_19590501-19590531.nc
-rw-rw----+ 1 zt77_era5 rt52 1101700 Jul 14  2022 tp_era5_moda_sfc_19590601-19590630.nc
-rw-rw----+ 1 zt77_era5 rt52 1065742 Jul 14  2022 tp_era5_moda_sfc_19590701-19590731.nc
-rw-rw----+ 1 zt77_era5 rt52 1102501 Jul 14  2022 tp_era5_moda_sfc_19590801-19590831.nc
-rw-rw----+ 1 zt77_era5 rt52 1149061 Jul 14  2022 tp_era5_moda_sfc_19590901-19590930.nc
-rw-rw----+ 1 zt77_era5 rt52 1167799 Jul 14  2022 tp_era5_moda_sfc_19591001-19591031.nc
-rw-rw----+ 1 zt77_era5 rt52 1201453 Jul 14  2022 tp_era5_moda_sfc_19591101-19591130.nc
-rw-rw----+ 1 zt77_era5 rt52 1179680 Jul 14  2022 tp_era5_moda_sfc_19591201-19591231.nc

Using ESMValCore and the following code:

from esmvalcore.dataset import Dataset
tas = Dataset(
    short_name='tas',
    project='native6',
    mip="Amon",
    dataset='ERA5',
    timerange="19590101/19600101"
)
cube = tas.load()

Produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[56], line 1
----> 1 cube = tas.load()

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/dataset.py:685](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/dataset.py#line=684), in Dataset.load(self)
    682     input_files.extend(supplementary_dataset.files)
    683 esgf.download(input_files, self.session['download_dir'])
--> 685 cube = self._load()
    686 supplementary_cubes = []
    687 for supplementary_dataset in self.supplementaries:

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/dataset.py:771](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/dataset.py#line=770), in Dataset._load(self)
    766 result = [
    767     file.local_file(self.session['download_dir']) if isinstance(
    768         file, esgf.ESGFFile) else file for file in self.files
    769 ]
    770 for step, kwargs in settings.items():
--> 771     result = preprocess(
    772         result,
    773         step,
    774         input_files=self.files,
    775         output_file=output_file,
    776         debug=self.session['save_intermediary_cubes'],
    777         **kwargs,
    778     )
    780 cube = result[0]
    781 return cube

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/preprocessor/__init__.py:397](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/preprocessor/__init__.py#line=396), in preprocess(items, step, input_files, output_file, debug, **settings)
    395 result = []
    396 if itype.endswith('s'):
--> 397     result.append(_run_preproc_function(function, items, settings,
    398                                         input_files=input_files))
    399 else:
    400     for item in items:

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/preprocessor/__init__.py:346](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/preprocessor/__init__.py#line=345), in _run_preproc_function(function, items, kwargs, input_files)
    341 logger.debug(
    342     "Running preprocessor function '%s' on the data\n%s%s\nwith function "
    343     "argument(s)\n%s", function.__name__, pformat(items), file_msg,
    344     kwargs_str)
    345 try:
--> 346     return function(items, **kwargs)
    347 except Exception:
    348     # To avoid very long error messages, we truncate the arguments and
    349     # input files here at a given threshold
    350     n_shown_args = 4

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/fix.py:195](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/fix.py#line=194), in fix_metadata(cubes, short_name, project, dataset, mip, frequency, check_level, session, **extra_facets)
    193 cube_list = CubeList(cube_list)
    194 for fix in fixes:
--> 195     cube_list = fix.fix_metadata(cube_list)
    197 # The final fix is always GenericFix, whose fix_metadata method always
    198 # returns a single cube
    199 cube = cube_list[0]

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py:424](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py#line=423), in AllVars.fix_metadata(self, cubes)
    421     cube.standard_name = self.vardef.standard_name
    422 cube.long_name = self.vardef.long_name
--> 424 cube = self._fix_coordinates(cube)
    425 self._fix_units(cube)
    427 cube.data = cube.core_data().astype('float32')

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py:389](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py#line=388), in AllVars._fix_coordinates(self, cube)
    385     if (not coord.has_bounds() and len(coord.core_points()) > 1
    386             and coord_def.must_have_bounds == "yes"):
    387         coord.guess_bounds()
--> 389 self._fix_monthly_time_coord(cube)
    391 return cube

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py:396](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py#line=395), in AllVars._fix_monthly_time_coord(cube)
    393 @staticmethod
    394 def _fix_monthly_time_coord(cube):
    395     """Set the monthly time coordinates to the middle of the month."""
--> 396     if get_frequency(cube) == 'monthly':
    397         coord = cube.coord(axis='T')
    398         end = []

File [/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py:27](https://are.nci.org.au/g/data/xp65/public/apps/med_conda/envs/esmvaltool-0.4/lib/python3.11/site-packages/esmvalcore/cmor/_fixes/native6/era5.py#line=26), in get_frequency(cube)
     25 if len(time.points) == 1:
     26     if cube.long_name != 'Geopotential':
---> 27         raise ValueError('Unable to infer frequency of cube '
     28                          f'with length 1 time dimension: {cube}')
     29     return 'fx'
     31 interval = time.points[1] - time.points[0]

ValueError: Unable to infer frequency of cube with length 1 time dimension: air_temperature [/](https://are.nci.org.au/) (K)               (time: 1; latitude: 721; longitude: 1440)
    Dimension coordinates:
        time                             x            -               -
        latitude                         -            x               -
        longitude                        -            -               x
    Scalar coordinates:
        height                      2.0 m
    Attributes:
        Conventions                 'CF-1.6'
        license                     'Licence to use Copernicus Products: https://apps.ecmwf.int/datasets/li ...'
        source_file                 '/g/data/rt52/era5/single-levels/monthly-averaged/2t/1959/2t_era5_moda_ ...'
        summary                     'ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global ...'
        title                       'ERA5 single-levels monthly-averaged 2m_temperature 19590101-19590131'

As per trace above, the issue arises in the following lines:

if len(time.points) == 1:
if cube.long_name != 'Geopotential':
raise ValueError('Unable to infer frequency of cube '
f'with length 1 time dimension: {cube}')
return 'fx'

At that time, each cube from each NetCDF file has a time dimension length of 1 which raises the issue.

An easy fix is to rewrite the get_frequency variable to make it a bit more flexible:

def get_frequency(cube):
    """Determine time frequency of input cube."""
    try:
        time = cube.coord(axis='T')
    except iris.exceptions.CoordinateNotFoundError:
        return 'fx'
    
    time.convert_units('days since 1850-1-1 00:00:00.0')
    
    if (len(time.points) == 1 and
        cube.long_name == 'Geopotential'):
        return 'fx'
    
    if len(time.points) > 1:
        interval = time.points[1] - time.points[0]
        if interval - 1 / 24 < 1e-4:
            return 'hourly'

    return 'monthly'

I will submit a PR to fix this.

@bouweandela
Copy link
Member

You might be able to use the frequency facet, which should be available in fixes, to figure out the frequency instead of looking at the data. That should avoid the problem that it's impossible compute an interval from a single time point.

@rbeucher
Copy link
Contributor Author

So you mean drop the get_frequency function entirely and rely on the frequency facet?

@bouweandela
Copy link
Member

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants