.coarsen() method for the xarray.Dataset removes its attributes. #3376

Closed
jejjohnson opened this issue Oct 7, 2019 · 4 comments · Fixed by #3801
@jejjohnson

Hello,

I am not sure if this is a bug or a feature, but calling .coarsen() on an xarray.Dataset removes the dataset's attributes.

Dataset Example

import xarray as xr
import numpy as np

var1 = np.linspace(10, 15, 100)
var2 = np.linspace(5, 10, 100)
coords = np.linspace(1, 10, 100)

dat = xr.Dataset(
    data_vars={'var1': ('coord', var1), 'var2': ('coord', var2)}, 
    coords={'coord': coords}
)
dat.attrs['model_id'] = 'model1'

# coarsen dataset
dat = dat.coarsen(coord=5).mean()

# print dataset
dat

Actual Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899

Expected Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
Attributes:
    model_id:  model1

Problem Description

I believe the attributes should stay on the xarray.Dataset regardless of which operations are applied to it. For some operations an entry like model_id may no longer be accurate, because the result is no longer the original model output, but I believe that decision should be left to the user. Perhaps a warning in the docs would be sufficient. The behaviour is also inconsistent with .coarsen() on an xarray.DataArray, where the attributes are kept (see the example below).

DataArray Example

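import pandas as pd

data = np.random.rand(50, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=50)

foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
foo.attrs['data_id'] = 'data1'

# coarsen data array (windows of 5 give the 10 time steps shown below)
foo = foo.coarsen(time=5).mean()
foo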

Expected/Actual Output

<xarray.DataArray (time: 10, space: 3)>
array([[0.3537571 , 0.50698482, 0.35923528],
       [0.62127828, 0.41852822, 0.5617278 ],
       [0.38669858, 0.60446037, 0.45699182],
       [0.41538186, 0.81251298, 0.3919821 ],
       [0.67914214, 0.45866817, 0.58625095],
       [0.63560785, 0.53796635, 0.48231731],
       [0.60802724, 0.54003065, 0.38456255],
       [0.46492592, 0.78542293, 0.50788668],
       [0.53757801, 0.56765902, 0.52288412],
       [0.51085502, 0.51448292, 0.67426125]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-03 2000-01-08 ... 2000-02-17
  * space    (space) <U2 'IA' 'IL' 'IN'
Attributes:
    data_id:  data1

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.21
cfgrib: None
iris: None
bottleneck: None
dask: 2.4.0
distributed: None
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None

@dcherian
Contributor

dcherian commented Oct 7, 2019

Have you tried xr.set_options(keep_attrs=True)?

The behaviour for Datasets and DataArrays should be consistent in any case. Would you like to try and fix it?
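For reference, a minimal sketch of how the keep_attrs option can be scoped to a single block with xr.set_options used as a context manager. The dataset here is an illustrative stand-in, not the reporter's data, and whether coarsen honours the option depends on the installed xarray version, which is exactly what this issue is about.

```python
import numpy as np
import xarray as xr

# Illustrative dataset with one dataset-level attribute.
ds = xr.Dataset(
    {"var1": ("coord", np.linspace(10, 15, 100))},
    coords={"coord": np.linspace(1, 10, 100)},
    attrs={"model_id": "model1"},
)

# Scope the option to this block; the previous setting is restored on exit.
# Calling xr.set_options(keep_attrs=True) without "with" sets it globally.
with xr.set_options(keep_attrs=True):
    coarse_scoped = ds.coarsen(coord=5).mean()

# Inspect what survived; the result depends on the xarray version in use.
print(coarse_scoped.attrs)
```

The context-manager form avoids leaking the setting into unrelated code, which makes it easier to compare behaviour with and without the option.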

@jejjohnson
Author

jejjohnson commented Oct 7, 2019

Hello,

Yes, even with xr.set_options(keep_attrs=True), the attributes are still removed. The full code with the option set is below.

Inputs

import xarray as xr
import numpy as np

xr.set_options(keep_attrs=True)

var1 = np.linspace(10, 15, 100)
var2 = np.linspace(5, 10, 100)
coords = np.linspace(1, 10, 100)

dat = xr.Dataset(
    data_vars={'var1': ('coord', var1), 'var2': ('coord', var2)}, 
    coords={'coord': coords}
)
dat.attrs['model_id'] = 'model1'

# coarsen dataset
dat = dat.coarsen(coord=5, keep_attrs=True).mean()

# print dataset
dat

Actual Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899

Expected Output

<xarray.Dataset>
Dimensions:  (coord: 20)
Coordinates:
  * coord    (coord) float64 1.182 1.636 2.091 2.545 ... 8.455 8.909 9.364 9.818
Data variables:
    var1     (coord) float64 10.1 10.35 10.61 10.86 ... 14.14 14.39 14.65 14.9
    var2     (coord) float64 5.101 5.354 5.606 5.859 ... 9.141 9.394 9.646 9.899
Attributes:
    model_id:  model1

I'm not too familiar with the xarray package internals. But I will look to see if I can do my part and do a pull request. I'll get back to you after having a look at the internals.
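Until the behaviour is fixed, one possible workaround is to copy the attributes back onto the result by hand. This is only a sketch under the assumption that the original attributes are still meaningful after coarsening; it uses Dataset.assign_attrs, which returns a new object rather than modifying the result in place.

```python
import numpy as np
import xarray as xr

dat = xr.Dataset(
    data_vars={
        "var1": ("coord", np.linspace(10, 15, 100)),
        "var2": ("coord", np.linspace(5, 10, 100)),
    },
    coords={"coord": np.linspace(1, 10, 100)},
)
dat.attrs["model_id"] = "model1"

# Coarsen first, then re-attach the original dataset-level attributes.
# If the installed version already keeps them, this is a harmless no-op.
coarse = dat.coarsen(coord=5).mean()
coarse = coarse.assign_attrs(dat.attrs)

print(coarse.attrs)
```

Only dataset-level attributes are restored here; attributes on individual variables would need the same treatment per variable.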

@dcherian dcherian added the bug label Oct 7, 2019
@dcherian
Contributor

dcherian commented Oct 7, 2019

Thanks for the clear report. I'll mark this as a bug then.

@TomNicholas
Member

@jejjohnson that means coarsen must have been missed when applying the keep_attrs logic in #2482. If you have a look there it should hopefully be clear how to copy the same logic for coarsen.
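As a rough illustration of what keep_attrs logic of this kind usually amounts to, here is a pure-Python sketch of a precedence rule: an explicit keyword argument wins over the global option, which wins over the operation's default. The names OPTIONS and resolve_keep_attrs are hypothetical and are not taken from xarray's source or from #2482.

```python
from typing import Optional

# Hypothetical stand-in for a library-wide options registry.
# The string "default" means no global preference has been set.
OPTIONS = {"keep_attrs": "default"}


def resolve_keep_attrs(explicit: Optional[bool], default: bool = False) -> bool:
    """Decide whether an operation should keep attributes.

    Precedence: explicit argument, then global option, then the
    operation-specific default.
    """
    if explicit is not None:
        return explicit
    global_setting = OPTIONS["keep_attrs"]
    if global_setting == "default":
        return default
    return bool(global_setting)
```

An operation that never calls such a resolver, as apparently happened with coarsen, silently ignores both the keyword and the global option.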

amcnicho added a commit to amcnicho/xarray that referenced this issue Feb 27, 2020
@amcnicho amcnicho mentioned this issue Feb 27, 2020
max-sixty pushed a commit that referenced this issue Mar 2, 2020
* Add test of DataWithCoords.coarsen() for #3376

* Add test of Variable.coarsen() for #3376

* Add keep_attrs kwarg to DataWithCoords.coarsen() for #3376

* Style and spelling fixes (#3376)

* Fix test_coarsen_keep_attrs by removing self from input

* Pass keep_attrs through to _coarsen_cls and _rolling_cls returns (#3376)

* Move keyword from coarsen to mean in test_coarsen_keep_attrs

* Start handling keep_attrs in rolling class constructors (#3376)

* Update Coarsen constructor and DatasetCoarsen class method (GH3376)

Assign keep_attrs keyword value to Coarsen objects in constructor
Add conditional inside _reduce_method.wrapped_func branching on self.keep_attrs and pass back to returned Dataset

* Incorporate code review from @max-sixty

* Fix Dataset.coarsen and Variable.coarsen for GH3376

Handle global keep_attrs setting inside Variable._coarsen_reshape

Pass attrs through consistently inside DatasetCoarsen._reduce_method

Don't pass Variable.coarsen a keyword argument it doesn't expect inside DataArrayCoarsen._reduce_method

* Update tests for GH3376

* Incorporate review changes to test_dataset for GH3376

Remove commented-out test from test_coarsen_keep_attrs

Add test_rolling_keep_attrs

* Change Rolling._dataset_implementation for GH3376

Return a Dataset object that results in test_rolling_keep_attrs Passing

* style fixes

* Remove duplicate variable assignment and document change (GH3776)
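
The commit notes above describe storing keep_attrs on the Coarsen object in its constructor and branching on it when the reduced Dataset is built. A toy, pure-Python sketch of that pattern follows; ToyDataset and ToyCoarsen are hypothetical classes written for illustration and do not reproduce xarray's implementation.

```python
from statistics import mean


class ToyDataset:
    """Minimal container: named 1-D variables plus dataset-level attrs."""

    def __init__(self, data_vars, attrs=None):
        self.data_vars = dict(data_vars)
        self.attrs = dict(attrs or {})


class ToyCoarsen:
    """Block-wise reducer that remembers whether to keep attrs."""

    def __init__(self, obj, window, keep_attrs=False):
        self.obj = obj
        self.window = window
        # Stored once here so every reduction method can consult it.
        self.keep_attrs = keep_attrs

    def mean(self):
        reduced = {}
        for name, values in self.obj.data_vars.items():
            # Average consecutive blocks of `window` values. A trailing
            # partial block is averaged as-is in this toy version.
            reduced[name] = [
                mean(values[i:i + self.window])
                for i in range(0, len(values), self.window)
            ]
        # Branch on the stored flag when building the result.
        attrs = self.obj.attrs if self.keep_attrs else {}
        return ToyDataset(reduced, attrs)


ds = ToyDataset({"var1": [1.0, 2.0, 3.0, 4.0]}, attrs={"model_id": "model1"})
kept = ToyCoarsen(ds, window=2, keep_attrs=True).mean()
dropped = ToyCoarsen(ds, window=2).mean()
print(kept.attrs, dropped.attrs)
```

Keeping the flag on the object rather than passing it to each reduction means every reduction method behaves the same way without repeating the argument.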