Keep attributes across operations #2582

Closed
MBlaschek opened this issue Nov 29, 2018 · 6 comments

@MBlaschek
Contributor

The Problem

When I combine two DataArrays with a standard arithmetic operation (+, -, *, /), the attributes vanish. I think that should not be the case. It happens even when using set_options(keep_attrs=True), as suggested:

import numpy as np
import xarray as xr
a = xr.DataArray(np.random.randn(3,3), dims=('x','y'), name='temp', attrs={'units':'K'})
b = xr.DataArray(np.random.randn(3,3), dims=('x','y'), name='temp', attrs={'units':'K'})
print(a)
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.207407, -1.9429  ,  3.168454],
       [-0.773912, -0.121835, -0.139538],
       [ 1.823002,  0.185846,  0.53569 ]])
Dimensions without coordinates: x, y
Attributes:
    units:    K
print(a-b)
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.280892, -1.097781,  2.150318],
       [-0.208202, -0.03856 ,  0.805856],
       [ 2.192506,  1.049181,  2.277078]])
Dimensions without coordinates: x, y

with xr.set_options(keep_attrs=True):
    print(a-b)

<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.280892, -1.097781,  2.150318],
       [-0.208202, -0.03856 ,  0.805856],
       [ 2.192506,  1.049181,  2.277078]])
Dimensions without coordinates: x, y

Problem description

Attributes vanish when a normal operation is applied!
From docs of set_options:
keep_attrs: rule for whether to keep attributes on xarray
Datasets/dataarrays after operations. Either True to always keep
attrs, False to always discard them, or 'default' to use original
logic that attrs should only be kept in unambiguous circumstances.
Default: 'default'.

Expected Output

The attributes should remain. Maybe keep only the attributes from the left array?
Please adjust this or advise me.
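To make the question of which attributes should survive more concrete, here is a small sketch that contrasts two possible policies on plain dictionaries. The function names `keep_left` and `keep_matching` are made up for illustration; neither is xarray's API, and the sketch makes no claim about which policy xarray applies.

```python
def keep_left(left_attrs, right_attrs):
    """Policy 1: the result inherits a copy of the left operand's attributes."""
    return dict(left_attrs)


def keep_matching(left_attrs, right_attrs):
    """Policy 2: keep only keys that both operands share with equal values."""
    return {
        key: value
        for key, value in left_attrs.items()
        if key in right_attrs and right_attrs[key] == value
    }


# Hypothetical attributes for two arrays that agree on units but not on source.
a_attrs = {"units": "K", "source": "station A"}
b_attrs = {"units": "K", "source": "station B"}

left_result = keep_left(a_attrs, b_attrs)
matching_result = keep_matching(a_attrs, b_attrs)

print(left_result)
print(matching_result)
```

Keeping the left operand's attributes is simple and predictable, but it can carry over metadata that no longer describes the result. Keeping only matching attributes is more conservative.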

Output of xr.show_versions()

``
xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.20.2
distributed: 1.24.2
matplotlib: 3.0.1
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.6.2
pip: 18.1
conda: 4.5.11
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2
``

shoyer added the bug label Nov 29, 2018
@shoyer
Member

shoyer commented Nov 29, 2018

Thanks for the report! It looks like we definitely overlooked this in arithmetic operations. I agree that keep_attrs=True should mean that attributes are maintained in arithmetic.

Any interest in putting together a PR?

@MBlaschek
Contributor Author

Thanks for the quick reply.
Not sure what a PR is. (Sorry, I'm not that advanced in coding.)
Judging from code you use in other places, I figure something like this:

@staticmethod
def _binary_op(f, reflexive=False, **ignored_kwargs):
    @functools.wraps(f)
    def func(self, other):
        if isinstance(other, (xr.DataArray, xr.Dataset)):
            return NotImplemented
        self_data, other_data, dims = _broadcast_compat_data(self, other)
        # Proposed change: keep the left operand's attrs when keep_attrs is enabled
        keep_attrs = _get_keep_attrs(default=False)
        attrs = self._attrs if keep_attrs else None
        
        with np.errstate(all='ignore'):
            new_data = (f(self_data, other_data)
                        if not reflexive
                        else f(other_data, self_data))
        result = Variable(dims, new_data, attrs=attrs)
        return result
    return func

should do the trick, right?
I cloned the latest version and tried out the new code. It works! :)
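For readers who want to try the pattern without an xarray checkout, here is a minimal standard-library sketch of the same idea. `ToyVariable`, `OPTIONS`, `set_options` and `get_keep_attrs` are hypothetical stand-ins written for this illustration; they mimic the shape of the snippet above but are not xarray's real internals.

```python
import contextlib
import functools
import operator

# Hypothetical global option store (a stand-in, not xarray's real options).
OPTIONS = {"keep_attrs": "default"}


@contextlib.contextmanager
def set_options(**kwargs):
    """Temporarily override options, restoring the old values on exit."""
    old = dict(OPTIONS)
    OPTIONS.update(kwargs)
    try:
        yield
    finally:
        OPTIONS.clear()
        OPTIONS.update(old)


def get_keep_attrs(default):
    """Resolve the 'default' setting to the caller-supplied fallback."""
    value = OPTIONS["keep_attrs"]
    return default if value == "default" else bool(value)


class ToyVariable:
    """A tiny 1-D container with data and attrs (illustration only)."""

    def __init__(self, data, attrs=None):
        self.data = list(data)
        self.attrs = dict(attrs or {})

    @staticmethod
    def _binary_op(f, reflexive=False):
        @functools.wraps(f)
        def func(self, other):
            if isinstance(other, ToyVariable):
                other_data = other.data
            else:
                # Broadcast a scalar to the length of self.
                other_data = [other] * len(self.data)
            # Consult the global option; drop attrs unless it is enabled.
            keep_attrs = get_keep_attrs(default=False)
            attrs = self.attrs if keep_attrs else None
            if reflexive:
                new_data = [f(y, x) for x, y in zip(self.data, other_data)]
            else:
                new_data = [f(x, y) for x, y in zip(self.data, other_data)]
            return ToyVariable(new_data, attrs=attrs)

        return func


ToyVariable.__sub__ = ToyVariable._binary_op(operator.sub)
ToyVariable.__rsub__ = ToyVariable._binary_op(operator.sub, reflexive=True)

a = ToyVariable([3.0, 2.0], attrs={"units": "K"})
b = ToyVariable([1.0, 0.5], attrs={"units": "K"})

plain = a - b  # option not set: attrs are dropped
with set_options(keep_attrs=True):
    kept = a - b  # option set: attrs of the left operand are kept
```

The attributes always come from `self`, so in this sketch the left operand wins for `a - b`. For a reflexive call such as `10 - a`, they come from the `ToyVariable` on the right.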

xr.show_versions()

INSTALLED VERSIONS

commit: 0d6056e
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.11.0+10.g0d6056e8.dirty
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: None
cfgrib: installed
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.20.2
distributed: 1.24.2
matplotlib: 3.0.1
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.6.2
pip: 18.1
conda: 4.5.11
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2

When the option is not set, the behavior is the same as before:

print(a-b)                                                           
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 0.133102, -1.275794,  1.331784],
       [ 0.995555, -0.509624,  0.188597],
       [ 1.922048, -0.053253, -0.293245]])
Dimensions without coordinates: x, y

With the option set:

with xr.set_options(keep_attrs=True): 
     print(a-b) 
                                                                     
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 0.133102, -1.275794,  1.331784],
       [ 0.995555, -0.509624,  0.188597],
       [ 1.922048, -0.053253, -0.293245]])
Dimensions without coordinates: x, y
Attributes:
    units:    K

It works. I hope that helps.

@max-sixty
Collaborator

> Not sure what a PR is. (Sorry I'm not that advanced in coding)

A PR is a pull request! If you can open a PR with your code, we can merge it into the repo. It would be greatly appreciated, and you'd become an xarray contributor. Let us know if we can help guide you through the mechanics.

@dcherian
Contributor

@MBlaschek This might help: https://help.github.com/articles/proposing-changes-to-your-work-with-pull-requests/. You'd start by creating a fork, then create a branch with your changes, push it to GitHub, and finally open a pull request.

@dcherian
Contributor

dcherian commented Dec 3, 2018

Hi @MBlaschek, almost there! You'll need to open your pull request in this repository :).

You'll also need to add some tests to make sure your changes keep working as the code is updated in the future. E.g.

def test_reduce_keep_attrs(self):

@MBlaschek
Contributor Author

Hi.
OK, sorry, I had no idea what I was doing. I hope I have now fixed it the way you wanted. I added a test routine, test_binary_ops_keep_attrs.
I created a new pull request, as I could not reopen the old one.
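
As a rough illustration of what such a test could check, here is a minimal sketch. It reuses the name test_binary_ops_keep_attrs, but the body is an assumption written for this thread, so the test in the actual pull request may look different. It assumes an xarray version in which arithmetic respects keep_attrs=True.

```python
import numpy as np
import xarray as xr


def test_binary_ops_keep_attrs():
    # Hypothetical test body; the test in the actual pull request may differ.
    attrs = {"units": "K"}
    a = xr.DataArray(np.ones((3, 3)), dims=("x", "y"), name="temp", attrs=attrs)
    b = xr.DataArray(np.ones((3, 3)), dims=("x", "y"), name="temp", attrs=attrs)

    with xr.set_options(keep_attrs=True):
        result = a - b

    # The values are computed as usual ...
    np.testing.assert_allclose(result.values, np.zeros((3, 3)))
    # ... and the attributes survive the operation.
    assert result.attrs == attrs


test_binary_ops_keep_attrs()
```

Using deterministic inputs (arrays of ones) instead of random data keeps the expected values easy to state.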
