Decode CF bounds to coords #3689

Closed
rabernat opened this issue Jan 12, 2020 · 5 comments

@rabernat
Contributor

CF conventions define Cell Boundaries and specify how to encode the presence of cell boundary variables in dataset attributes.

To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries.
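A minimal sketch of this encoding, using a small made-up latitude axis rather than the file discussed in this issue. The edge values and the helper names `edges`, `centers`, and `cell_width` are assumptions for illustration only; the `bounds` attribute and the `(lat, nbnd)` layout follow the CF text quoted above.

```python
import numpy as np
import xarray as xr

# Hypothetical 4-cell latitude axis; the values are made up for illustration.
edges = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])
centers = 0.5 * (edges[:-1] + edges[1:])

ds = xr.Dataset(
    data_vars={
        # One row per cell, holding its lower and upper edge.
        "lat_bnds": (("lat", "nbnd"), np.stack([edges[:-1], edges[1:]], axis=1)),
    },
    # The coordinate names its boundary variable through the "bounds" attribute.
    coords={"lat": ("lat", centers, {"bounds": "lat_bnds"})},
)

# Follow the attribute from the coordinate to its boundary variable.
bounds_name = ds["lat"].attrs["bounds"]
cell_width = ds[bounds_name].diff("nbnd").squeeze("nbnd")
print(bounds_name, cell_width.values)
```

Here the link between `lat` and `lat_bnds` exists only as a string attribute, so any code that needs the cell edges has to look up the name itself.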

For example, consider this dataset: http://esgf-data.ucar.edu/thredds/dodsC/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r10i1p1f1/Amon/tas/gn/v20190313/tas_Amon_CESM2_historical_r10i1p1f1_gn_200001-201412.nc

import xarray as xr

# Open the remote CMIP6 file over OPeNDAP (requires network access)
url = 'http://esgf-data.ucar.edu/thredds/dodsC/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/historical/r10i1p1f1/Amon/tas/gn/v20190313/tas_Amon_CESM2_historical_r10i1p1f1_gn_200001-201412.nc'
ds = xr.open_dataset(url)
ds

gives

<xarray.Dataset>
Dimensions:    (lat: 192, lon: 288, nbnd: 2, time: 180)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time       (time) object 2000-01-15 12:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: nbnd
Data variables:
    time_bnds  (time, nbnd) object ...
    lat_bnds   (lat, nbnd) float64 ...
    lon_bnds   (lon, nbnd) float64 ...
    tas        (time, lat, lon) float32 ...

Despite the presence of the bounds attributes

>>> print(ds.time.bounds, ds.lat.bounds, ds.lon.bounds)
time_bnds lat_bnds lon_bnds

the variables time_bnds, lat_bnds, and lon_bnds are decoded as data variables, not as coordinates. I believe that this is not in accordance with CF conventions.

Instead, we should decode all bounds variables to coordinates.

I cannot think of a single use case where one would want to treat these variables as data variables rather than coordinates. It would be easy to implement, but it is a breaking change.
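Until decoding does this automatically, the move can be done by hand. The following is only a sketch of a manual workaround, not the change proposed for xarray itself: `promote_bounds_to_coords` is a hypothetical helper name, and the small dataset is made up so that the example runs without network access. It relies on the existing `Dataset.set_coords` method and assumes the `bounds` attribute is still present in each variable's `attrs`.

```python
import numpy as np
import xarray as xr


def promote_bounds_to_coords(ds):
    """Return a new dataset in which every variable named by a
    ``bounds`` attribute is a coordinate (hypothetical helper)."""
    names = []
    for var in ds.variables.values():
        name = var.attrs.get("bounds")
        # Skip variables without the attribute and dangling references.
        if name in ds.variables and name not in names:
            names.append(name)
    return ds.set_coords(names)


# Made-up stand-in for the CMIP6 file: two latitude cells and one data field.
ds = xr.Dataset(
    data_vars={
        "lat_bnds": (("lat", "nbnd"), np.array([[-90.0, 0.0], [0.0, 90.0]])),
        "tas": (("lat",), np.array([250.0, 260.0])),
    },
    coords={"lat": ("lat", np.array([-45.0, 45.0]), {"bounds": "lat_bnds"})},
)

fixed = promote_bounds_to_coords(ds)
print(sorted(fixed.coords), sorted(fixed.data_vars))
```

The helper returns a new dataset and leaves the original untouched, so it can be applied right after `xr.open_dataset` without side effects.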

Note that this is just a proposal to move bounds variables to the coords part of the dataset. It does not address the more difficult and complex question of how to actually use the bounds for indexing or plotting operations (see e.g. #1475, #1613), although it could be a first step in that direction.

Full ncdump of dataset

xarray.Dataset {
dimensions:
	lat = 192 ;
	lon = 288 ;
	nbnd = 2 ;
	time = 180 ;

variables:
	float64 lat(lat) ;
		lat:axis = Y ;
		lat:bounds = lat_bnds ;
		lat:standard_name = latitude ;
		lat:title = Latitude ;
		lat:type = double ;
		lat:units = degrees_north ;
		lat:valid_max = 90.0 ;
		lat:valid_min = -90.0 ;
		lat:_ChunkSizes = 192 ;
	float64 lon(lon) ;
		lon:axis = X ;
		lon:bounds = lon_bnds ;
		lon:standard_name = longitude ;
		lon:title = Longitude ;
		lon:type = double ;
		lon:units = degrees_east ;
		lon:valid_max = 360.0 ;
		lon:valid_min = 0.0 ;
		lon:_ChunkSizes = 288 ;
	object time(time) ;
		time:axis = T ;
		time:bounds = time_bnds ;
		time:standard_name = time ;
		time:title = time ;
		time:type = double ;
		time:_ChunkSizes = 512 ;
	object time_bnds(time, nbnd) ;
		time_bnds:_ChunkSizes = [1 2] ;
	float64 lat_bnds(lat, nbnd) ;
		lat_bnds:units = degrees_north ;
		lat_bnds:_ChunkSizes = [192   2] ;
	float64 lon_bnds(lon, nbnd) ;
		lon_bnds:units = degrees_east ;
		lon_bnds:_ChunkSizes = [288   2] ;
	float32 tas(time, lat, lon) ;
		tas:cell_measures = area: areacella ;
		tas:cell_methods = area: time: mean ;
		tas:comment = near-surface (usually, 2 meter) air temperature ;
		tas:description = near-surface (usually, 2 meter) air temperature ;
		tas:frequency = mon ;
		tas:id = tas ;
		tas:long_name = Near-Surface Air Temperature ;
		tas:mipTable = Amon ;
		tas:out_name = tas ;
		tas:prov = Amon ((isd.003)) ;
		tas:realm = atmos ;
		tas:standard_name = air_temperature ;
		tas:time = time ;
		tas:time_label = time-mean ;
		tas:time_title = Temporal mean ;
		tas:title = Near-Surface Air Temperature ;
		tas:type = real ;
		tas:units = K ;
		tas:variable_id = tas ;
		tas:_ChunkSizes = [  1 192 288] ;

// global attributes:
	:Conventions = CF-1.7 CMIP-6.2 ;
... [truncated]

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.14.0+19.gba48fbcd
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.7.1
iris: None
bottleneck: 1.2.1
dask: 2.4.0
distributed: 2.4.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.1.2
IPython: 7.8.0
sphinx: 1.6.5

@dcherian
Contributor

Yes, we should move forward with #2844

@rabernat
Contributor Author

Ack, I had no idea there was even an open PR for this! Sorry!

@dcherian
Contributor

dcherian commented Jan 13, 2020

@rabernat it would be helpful if you could comment there (#2844) on where we should store the coordinates and bounds attributes.

@DWesl
Contributor

DWesl commented Feb 18, 2020

bounds and grid_mapping?

@dcherian
Contributor

Closed by #2844
