inconsistent bounds (lat_bnds etc) after subset operation #224

Closed
cehbrecht opened this issue Apr 22, 2022 · 7 comments

@cehbrecht
Collaborator

  • clisops version: 0.9.0
  • Python version: >3.7
  • Operating System:

Description

The CDS team reported issues with inconsistent bounds variables (lat_bnds, ...) in the output of the subset operation:

The cdo sinfo command shows warnings on the subset output netcdf file:

Warning (cdf_set_var): Inconsistent variable definition for time_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lat_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lon_bnds!

The CDS users reported that some tools, such as Panoply, have problems with these netcdf files.

What I Did

I have prepared a notebook to reproduce this issue:
https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/tests/test-c3s-cmip6-subset.ipynb

It runs the subset operation on a rook test instance with the latest clisops version 0.9.0.

It shows the cdo sinfo and ncdump -h outputs of the original cmip6 netcdf file, which looks fine.

The subset output of the same netcdf file has the following issues:

  • cdo sinfo
$ cdo sinfo cmip6_subset.nc
Warning (cdf_set_var): Inconsistent variable definition for time_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lat_bnds!
Warning (cdf_set_var): Inconsistent variable definition for lon_bnds!
  • ncdump -h
$ ncdump -h cmip6_subset.nc
[..]

# bounds have unnecessary coordinate "height" 
double time_bnds(time, bnds) ;
    time_bnds:coordinates = "height" ;
double lat_bnds(lat, bnds) ;
    lat_bnds:coordinates = "height" ;
double lon_bnds(lon, bnds) ;
    lon_bnds:coordinates = "height" ;

# unnecessary FillValue for height not removed
double height ;
    height:_FillValue = NaN ;
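
To check other output files for the same symptom, the ncdump -h header can be scanned for bounds variables that carry a coordinates attribute. This is only a rough sketch: the function name find_bounds_coordinates is made up for illustration, and it assumes that bounds variables follow the *_bnds naming seen above.

```python
import re

# Matches CDL attribute lines such as:  lat_bnds:coordinates = "height" ;
# Assumption: bounds variables are named <something>_bnds, as in the header above.
_BNDS_COORDS = re.compile(
    r'^\s*(\w+_bnds):coordinates\s*=\s*"([^"]*)"\s*;', re.MULTILINE
)


def find_bounds_coordinates(cdl_text):
    """Return {bounds_variable: coordinates_value} found in an ncdump -h header."""
    return {m.group(1): m.group(2) for m in _BNDS_COORDS.finditer(cdl_text)}


# Header excerpt copied from the subset output shown above
sample_header = '''
double time_bnds(time, bnds) ;
    time_bnds:coordinates = "height" ;
double lat_bnds(lat, bnds) ;
    lat_bnds:coordinates = "height" ;
double lon_bnds(lon, bnds) ;
  lon_bnds:coordinates = "height" ;
double height ;
    height:_FillValue = NaN ;
'''

found = find_bounds_coordinates(sample_header)
print(found)
```

An empty result would suggest that the bounds variables no longer carry the extra attribute; it does not check the _FillValue on height.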
@cehbrecht cehbrecht added the bug Something isn't working label Apr 22, 2022
@cehbrecht
Collaborator Author

cehbrecht commented Apr 22, 2022

This issue is also related to #198. The FillValue issue was already (partially) fixed in our 0.9.0 release by PR #204.

@cehbrecht
Collaborator Author

The unnecessary coordinates attribute on the bounds variables, for example:

double lat_bnds(lat, bnds) ;
    lat_bnds:coordinates = "height" ;

... was already reported to xarray by @ellesmith88 :
pydata/xarray#5510

A workaround to get rid of these coordinates is provided in xarray:
pydata/xarray#5514

For example like this:

ds.lat_bnds.encoding["coordinates"] = None

@cehbrecht
Collaborator Author

cehbrecht commented Apr 22, 2022

Workaround?

In the test notebook above I'm applying all mentioned workarounds on the xarray dataset:

ds.time.encoding["_FillValue"] = None
ds.lon.encoding["_FillValue"] = None
ds.lat.encoding["_FillValue"] = None
ds.height.encoding["_FillValue"] = None

ds.lat_bnds.encoding["_FillValue"] = None
ds.lat_bnds.encoding["coordinates"] = None

ds.lon_bnds.encoding["_FillValue"] = None
ds.lon_bnds.encoding["coordinates"] = None

ds.time_bnds.encoding["_FillValue"] = None
ds.time_bnds.encoding["coordinates"] = None

Then I write the dataset as a netcdf file:

ds.to_netcdf("/tmp/out.nc")

Both cdo sinfo and ncdump -h seem to be happy with the new netcdf file.
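
The per-variable assignments above could also be written as a loop. The following is a rough sketch, not code from clisops: the helper names clear_encoding and apply_workarounds are made up for illustration, and the variable names are the ones from this particular dataset. It only relies on ds[name].encoding being a dict, so a small stand-in object is used here instead of a real xarray dataset.

```python
from types import SimpleNamespace


def clear_encoding(ds, names, keys):
    """Set each encoding key to None for every named variable present in ds.

    Returns the names that were found and updated.
    """
    updated = []
    for name in names:
        if name not in ds:  # skip variables this dataset does not have
            continue
        for key in keys:
            ds[name].encoding[key] = None
        updated.append(name)
    return updated


def apply_workarounds(ds):
    """Apply the same assignments as the manual workaround above."""
    # coordinate variables: only the fill value is dropped
    coords = clear_encoding(ds, ["time", "lon", "lat", "height"], ["_FillValue"])
    # bounds variables: drop the fill value and the "coordinates" attribute
    bounds = clear_encoding(
        ds, ["time_bnds", "lat_bnds", "lon_bnds"], ["_FillValue", "coordinates"]
    )
    return coords + bounds


# Stand-in for an xarray.Dataset so the sketch runs without xarray installed
demo = {
    "lat": SimpleNamespace(encoding={}),
    "lat_bnds": SimpleNamespace(encoding={"coordinates": "height"}),
}
updated = apply_workarounds(demo)
print(updated)
```

With a real dataset the call would be apply_workarounds(ds) followed by ds.to_netcdf(...), as in the manual version.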

@cehbrecht cehbrecht self-assigned this Apr 22, 2022
@cehbrecht
Collaborator Author

@Zeitsperre @sol1105 @agstephens thoughts?

@cehbrecht
Collaborator Author

The workaround can be added like this (from PR #204):

def _remove_redundant_fill_values(self, ds):
    """
    Get coordinate variables and remove fill values added by xarray (CF conventions say that coordinate variables cannot have missing values).
    Get bounds variables and remove fill values added by xarray.
    """
    if isinstance(ds, xr.Dataset):
        main_var = get_main_variable(ds)
        for coord_id in ds[main_var].coords:
            # remove fill value from coordinate variables
            if ds.coords[coord_id].dims == (coord_id,):
                ds[coord_id].encoding["_FillValue"] = None
            # remove fill value from bounds variables if they exist
            try:
                bnd = ds.cf.get_bounds(coord_id).name
                ds[bnd].encoding["_FillValue"] = None
            except KeyError:
                continue
    return ds

@agstephens
Collaborator

@cehbrecht: It looks like using the example code above (from PR #204) is the best place to clean up the dataset. Hopefully, it will only involve adding a few extra lines of code.

@cehbrecht
Collaborator Author

Fixed in clisops by #225. It now also works in daops and rook.
