Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drop and other shouldn't be mutually exclusive in DataWithCoords.where #6466

Closed
delgadom opened this issue Apr 10, 2022 · 0 comments · Fixed by #6467
Closed

drop and other shouldn't be mutually exclusive in DataWithCoords.where #6466

delgadom opened this issue Apr 10, 2022 · 0 comments · Fixed by #6467

Comments

@delgadom
Copy link
Contributor

Is your feature request related to a problem?

xr.Dataset.where and xr.DataArray.where currently do not allow providing both other and drop=True, stating that they are mutually exclusive. Conceptually, this doesn't strike me as true. Drop does not drop all points where cond is False, only those where an entire coordinate label evaluates to False. This is most important when trying to avoid type promotion, such as in the below example, where an integer FillValue might be preferred over dtypes.NA.

In [2]: da = xr.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])

In [3]: da
Out[3]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Dimensions without coordinates: x, y

If other is not provided, the array is promoted to float64 to accommodate the default,dtypes.NA:

In [4]: da.where(da > 6, drop=True)
Out[4]:
<xarray.DataArray (x: 3, y: 4)>
array([[nan, nan, nan,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])
Dimensions without coordinates: x, y

However, the combination of other and drop=True is not allowed:

In [5]: da.where(da > 6, -1, drop=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <module>
----> 1 da.where(da > 6, -1, drop=True)

File ~/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/xarray/core/common.py:1268, in DataWithCoords.where(self, cond, other, drop)
   1266 if drop:
   1267     if other is not dtypes.NA:
-> 1268         raise ValueError("cannot set `other` if drop=True")
   1270     if not isinstance(cond, (Dataset, DataArray)):
   1271         raise TypeError(
   1272             f"cond argument is {cond!r} but must be a {Dataset!r} or {DataArray!r}"
   1273         )

ValueError: cannot set `other` if drop=True

Describe the solution you'd like

Current implementation

The current behavior is enforced within the block handling the drop argument (currently https://github.com/pydata/xarray/blob/main/xarray/core/common.py#L1266-L1268):

        if drop:
            if other is not dtypes.NA:
                raise ValueError("cannot set `other` if drop=True")

Proposed fix

I just removed the above if statement on a fork, and the example now works!

>>> import xarray as xr, numpy as np
>>> da = xr.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
>>> da
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Dimensions without coordinates: x, y
>>> da.where(da > 6, drop=True)
<xarray.DataArray (x: 3, y: 4)>
array([[nan, nan, nan,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])
Dimensions without coordinates: x, y
>>> da.where(da > 6, -1, drop=True)
<xarray.DataArray (x: 3, y: 4)>
array([[-1, -1, -1,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Dimensions without coordinates: x, y

Making this change fails a single test in https://github.com/pydata/xarray/blob/main/xarray/tests/test_dataset.py#L4548-L4549, which explicitly checks for this behavior:

        with pytest.raises(ValueError, match=r"cannot set"):
            ds.where(ds > 1, other=0, drop=True)

This could easily be reworked to check for valid handling of both arguments.

Describe alternatives you've considered

No response

Additional context

I haven't yet investigated what would happen with chunked, sparse, or other complex arrays, or if it's compatible with trees and other things on the roadmap. It's possible this breaks things I'm not imagining. Currently, where(cond, other) and where(cond, drop=True) are well-tested, flexible operations, and I don't see why allowing their union would break anything, but I'll wait to hear from the experts on that front!

I'm definitely open to creating a pull request (and have the simple implementation I've outlined here ready to go).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant