[Bug]: Dataset.where(x, drop=True) behaves inconsistent #6227

Closed
headtr1ck opened this issue Feb 1, 2022 · 0 comments · Fixed by #6690

What happened?

I tried to reduce the size of some dimensions using where (sel did not work in this case), passing "drop=True" to shorten them.
This works fine for DataArrays and for Datasets with only a single dimension, but fails as soon as the Dataset has two dimensions on different variables.
In that case the dimensions are left untouched and the data contains NaNs, just as if "drop=False" had been used (see example).

I am actually not sure what the expected behavior is. Maybe I am wrong and this is correct due to some broadcasting rules?
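The broadcasting question can be probed with public API calls alone. The sketch below assumes (this is not confirmed in the issue) that the per-variable conditions get combined into a single mask by stacking them with `to_array()` and reducing over the resulting "variable" dimension. Under that assumption, the condition on "a" is broadcast along y and the condition on "b" along x, so every position along both dimensions is marked as worth keeping.

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [2, 3, 4])})
cond = ds > 2

# Assumption: the conditions are merged across variables. Stacking them
# broadcasts "a" (dim x) against "b" (dim y), giving a mask over x and y.
combined = cond.to_array().any("variable")

# A position along x (or y) survives if any entry of the mask is True there.
keep_x = combined.any("y")
keep_y = combined.any("x")

print(keep_x.values)
print(keep_y.values)
```

If this is what happens internally, nothing along x or y would be dropped, which would match the output reported in the example.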

What did you expect to happen?

I expected the relevant dimensions to be shortened.
If ds.where with "drop=False" produces NaNs in all variables along a dimension, then with "drop=True" I expect that dimension to be shortened and the NaNs removed.

Minimal Complete Verifiable Example

import xarray as xr

# this works
ds = xr.Dataset({"a": ("x", [1, 2, 3])})
ds.where(ds > 2, drop=True)

# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1)
# Dimensions without coordinates: x
# Data variables:
#     a        (x) float64 3.0

# this doesn't
ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [2, 3, 4])})
ds.where(ds > 2, drop=True)

# returns:
# <xarray.Dataset>
# Dimensions:  (x: 3, y: 3)
# Dimensions without coordinates: x, y
# Data variables:
#     a        (x) float64 nan nan 3.0
#     b        (y) float64 nan 3.0 4.0
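Since the report states that "drop=True" works on DataArrays, one possible workaround is to apply where to each variable separately and rebuild the Dataset. This is only a sketch of that idea for the example above, not the change made in the linked fix. It assumes the variables do not share dimensions, so each one can be shortened independently.

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [2, 3, 4])})

# Apply where(..., drop=True) per DataArray, then reassemble the Dataset.
# Each variable is shortened along its own dimension only.
dropped = xr.Dataset(
    {name: da.where(da > 2, drop=True) for name, da in ds.data_vars.items()}
)

print(dropped)
```

Here "a" keeps one value along x and "b" keeps two values along y. If variables shared a dimension, the shortened pieces would no longer line up and would need to be aligned before combining.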

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.1 (default, Jan 13 2021, 15:21:08)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 49.2.1
pip: 22.0.2
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: None

headtr1ck added the "bug" and "needs triage" labels on Feb 1, 2022
dcherian removed the "needs triage" label on Apr 9, 2022