Coordinate promotion workaround broken #6607

Closed
4 tasks done
aulemahal opened this issue May 13, 2022 · 4 comments · Fixed by #6999

@aulemahal
Contributor

aulemahal commented May 13, 2022

What happened?

Ok so this one is a bit weird. I'm not sure this is a bug, but code that worked before doesn't anymore, so it is some sort of regression.

I have a dataset with one dimension and one coordinate along it, but they have different names. I want to transform this so that the coordinate's name becomes the dimension's name, turning it into a proper dimension coordinate (I don't know what else to call it). After renaming the dimension to the coordinate's name, everything looks good in the repr, but the coordinate is still missing an index for that dimension (crd.indexes is empty, see MCVE). There used to be a workaround through reset_coords, but it doesn't work anymore.

Instead, the workaround step of the MCVE now downgrades the variable: the resulting lon no longer has any coordinates.

What did you expect to happen?

In the MCVE below, I show what the old "workaround" was. I expected lon.indexes to contain an index for lon at the end of the procedure.

Minimal Complete Verifiable Example

import xarray as xr

# A dataset with a 1d variable along a dimension
ds = xr.Dataset({'lon': xr.DataArray([1, 2, 3], dims=('x',))})

# Promote to coord. This still is not a proper crd-dim (different name)
ds = ds.set_coords(['lon'])

# Rename dim:
ds = ds.rename(x='lon')

# Do we now have a proper dimension coordinate? Not yet, because:
ds.indexes # is empty

# Workaround that was used up to the last release
lon = ds.lon.reset_coords(drop=True)

# Because of the missing index, the next line fails on master
lon - lon.diff('lon')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

My guess is that this line is causing reset_coords to drop the lon coordinate from the lon DataArray itself:

names = set(self.coords) - set(self._indexes)
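To illustrate that guess, here is a toy model of the set difference using plain Python sets. The function name and the dict stand-ins for `coords` and `_indexes` are hypothetical, not xarray internals; the point is only that a coordinate with no entry in `_indexes` ends up in the set of names that `reset_coords` resets (or drops with `drop=True`).

```python
def coords_to_reset(coords, indexes):
    """Hypothetical stand-in for the quoted line (not xarray's code).

    coords and indexes are plain dicts keyed by variable name; only the
    keys matter. Every coordinate without an index entry is treated as a
    non-index coordinate and is therefore reset (or dropped).
    """
    return set(coords) - set(indexes)


# After ds.rename(x='lon'): 'lon' is a coordinate but has no index entry.
coords_after_rename = {"lon": [1, 2, 3]}
indexes_after_rename = {}

# After ds.swap_dims({'x': 'lon'}): 'lon' also has an index entry
# (the string value is just a placeholder for an index object).
indexes_after_swap = {"lon": "PandasIndex"}

print(coords_to_reset(coords_after_rename, indexes_after_rename))  # {'lon'}
print(coords_to_reset(coords_after_rename, indexes_after_swap))    # set()
```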

It would be nice if the renaming was sufficient for the indexes to appear.

My example is weird, I know. The real use case is a script where we receive a 2D coordinate in which all rows are identical, so we take the first row and promote it to a proper dimension coordinate. But the current code fails on master at the lon - lon.diff('lon') step that comes afterwards.
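A rough sketch of that real use case, using the swap_dims approach suggested by keewis in this thread instead of rename. The data, the `tas` variable and the `y`/`x` dimension names are made up for illustration; the sketch checks that all rows match, keeps the first row as a 1D `lon` along `x`, and lets `swap_dims` turn it into an indexed dimension coordinate.

```python
import numpy as np
import xarray as xr

# Hypothetical input: a 2D longitude coordinate whose rows are all identical.
lon2d = np.tile([10.0, 20.0, 30.0], (4, 1))  # shape (y=4, x=3)
ds = xr.Dataset(
    {"tas": (("y", "x"), np.zeros((4, 3)))},
    coords={"lon": (("y", "x"), lon2d)},
)

# Only collapse the coordinate if every row really is the same.
if not (ds["lon"].values == ds["lon"].values[0]).all():
    raise ValueError("rows of the 2D lon coordinate differ")

# Keep the first row as a 1D coordinate along "x", then promote it with
# swap_dims (not rename) so that "lon" gets an index.
ds = ds.assign_coords(lon=("x", ds["lon"].values[0])).swap_dims({"x": "lon"})

# The step that failed on master: subtract the differences along lon.
lon = ds["lon"].reset_coords(drop=True)
result = lon - lon.diff("lon")
```

Because `lon` is indexed, the subtraction aligns both operands on `lon`, so `result` only covers the labels kept by `diff`.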

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.13.19-2-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.UTF-8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.3.1.dev104+gc34ef8a6
pandas: 1.4.2
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.0.3
conda: None
pytest: 7.0.1
IPython: 8.3.0
sphinx: None

@aulemahal added the bug and needs triage labels May 13, 2022
@keewis
Collaborator

keewis commented May 13, 2022

This is a known issue, and one that we'd like to clean up (see #4825 for discussion). The short answer is that you should use swap_dims instead of rename:

ds.swap_dims({"x": "lon"})
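For reference, a sketch of the MCVE with that suggestion applied. It assumes an xarray version where `swap_dims` takes a dict mapping the old dimension name to the new one, which has long been the case; unlike `rename`, `swap_dims` builds an index for `lon`, so the final subtraction aligns on it.

```python
import xarray as xr

# Same starting point as the MCVE.
ds = xr.Dataset({"lon": xr.DataArray([1, 2, 3], dims=("x",))})
ds = ds.set_coords(["lon"])

# swap_dims (instead of rename) makes "lon" the dimension and indexes it.
ds = ds.swap_dims({"x": "lon"})
print("lon" in ds.indexes)  # True

# The rest of the original script: lon keeps its own index coordinate.
lon = ds["lon"].reset_coords(drop=True)
result = lon - lon.diff("lon")
print(result)
```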

@keewis removed the needs triage label May 13, 2022
@dcherian
Contributor

@shoyer This was the regression I ran into. We could raise an error asking the user to switch to swap_dims.

x is unindexed while lon is a coordinate variable. Then

ds = ds.rename(x='lon')

makes lon a dimension coordinate (though there is no entry in ._indexes)


@shoyer
Member

shoyer commented May 14, 2022

We could raise an error asking the user to switch to swap_dims.

This seems like a good idea

In the long term, we would like to decouple indexes from coordinates and make something like the following work:

ds.set_coords(['lon']).rename(x='lon').set_index('lon')

@benbovy
Member

benbovy commented May 14, 2022

We could raise an error asking the user to switch to swap_dims.

Shouldn't we raise a warning instead? There may be relevant use cases like the example above (at least in the long term) where an index is not really needed?
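To make the error-versus-warning question concrete, here is a pure-Python sketch of the kind of check being discussed. The function `check_rename_to_coord`, its parameters and the message text are all hypothetical and are not xarray's actual implementation; `strict=True` models raising an error, the default models emitting a warning.

```python
import warnings


def check_rename_to_coord(coord_dims, indexed, old_dim, new_name, strict=False):
    """Hypothetical check for renaming a dimension (not xarray's code).

    coord_dims maps coordinate names to their dimension tuples and indexed
    is the set of names that currently have an index. Renaming old_dim to
    new_name is flagged when new_name is an existing 1D coordinate along
    old_dim without an index, since the result would look like a dimension
    coordinate without actually being indexed.
    """
    if coord_dims.get(new_name) == (old_dim,) and new_name not in indexed:
        msg = (
            f"rename {old_dim!r} to {new_name!r} does not create an index; "
            f"use swap_dims({{{old_dim!r}: {new_name!r}}}) instead"
        )
        if strict:
            raise ValueError(msg)  # the "raise an error" option
        warnings.warn(msg, UserWarning, stacklevel=2)  # the "warning" option
        return False
    return True
```

With the MCVE's state, `check_rename_to_coord({"lon": ("x",)}, set(), "x", "lon")` emits a `UserWarning` suggesting `swap_dims({'x': 'lon'})` and returns `False`, while `strict=True` raises `ValueError`. The warning variant leaves room for the unindexed use cases mentioned above.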
