Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Currently no way to create a Coordinates object without indexes for 1D variables #8704

Closed
5 tasks done
TomNicholas opened this issue Feb 4, 2024 · 4 comments · Fixed by #8711
Closed
5 tasks done

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Feb 4, 2024

What happened?

The workaround described in #8107 (comment) does not seem to work on main, meaning that I think there is currently no way to create an xr.Coordinates object without 1D variables being coerced to indexes. This means there is no way to create a Dataset object without 1D variables becoming IndexVariables being coerced to indexes.

What did you expect to happen?

I expected to at least be able to use the workaround described in #8107 (comment), i.e.

xr.Coordinates({'x': ('x', uarr)}, indexes={})

where uarr is an un-indexable array-like.

Minimal Complete Verifiable Example

class UnindexableArrayAPI:
    ...


class UnindexableArray:
    """
    Presents like an N-dimensional array but doesn't support changes of any kind, 
    nor can it be coerced into a np.ndarray or pd.Index.
    """
    
    _shape: tuple[int, ...]
    _dtype: np.dtype
    
    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI

    @property
    def dtype(self) -> np.dtype:
        return self._dtype
    
    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape
    
    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def size(self) -> int:
        return np.prod(self.shape)

    @property
    def T(self) -> Self:
        raise NotImplementedError()

    def __repr__(self) -> str:
        return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"

    def _repr_inline_(self, max_width):
        """
        Format to a single line with at most max_width characters. Used by xarray.
        """
        return self.__repr__()

    def __getitem__(self, key, /) -> Self:
        """
        Only supports extremely limited indexing.
        
        I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
        """
        from xarray.core.indexing import BasicIndexer
        
        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()

    def __array__(self) -> np.ndarray:
        raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))

xr.Variable(data=uarr, dims=['x'])  # works fine

xr.Coordinates({'x': ('x', uarr)}, indexes={})  # works in xarray v2023.08.0

but in versions after that it triggers the NotImplementedError in __array__:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})

File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
    299 variables = {}
    300 for name, data in coords.items():
--> 301     var = as_variable(data, name=name)
    302     if var.dims == (name,) and indexes is None:
    303         index, index_vars = create_default_index_implicit(var, list(coords))

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
    152     raise TypeError(
    153         f"Variable {name!r}: unable to convert object into a variable without an "
    154         f"explicit list of dimensions: {obj!r}"
    155     )
    157 if name is not None and name in obj.dims and obj.ndim == 1:
    158     # automatically convert the Variable into an Index
--> 159     obj = obj.to_index_variable()
    161 return obj

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
    570 def to_index_variable(self) -> IndexVariable:
    571     """Return this variable as an xarray.IndexVariable"""
--> 572     return IndexVariable(
    573         self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
    574     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2640 # Unlike in Variable, always eagerly load values into memory
   2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642     self._data = PandasIndexingAdapter(self._data)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
   1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
   1479     from xarray.core.indexes import safe_cast_to_index
-> 1481     self.array = safe_cast_to_index(array)
   1483     if dtype is None:
   1484         self._dtype = get_valid_numpy_dtype(array)

File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
    459             emit_user_level_warning(
    460                 (
    461                     "`pandas.Index` does not support the `float16` dtype."
   (...)
    465                 category=DeprecationWarning,
    466             )
    467             kwargs["dtype"] = "float64"
--> 469     index = pd.Index(np.asarray(array), **kwargs)
    471 return _maybe_cast_to_cftimeindex(index)

Cell In[55], line 63, in UnindexableArray.__array__(self)
     62 def __array__(self) -> np.ndarray:
---> 63     raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")

NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Context is #8699

Environment

Versions described above

@TomNicholas TomNicholas added bug needs triage Issue that has not been reviewed by xarray team member topic-indexing and removed needs triage Issue that has not been reviewed by xarray team member labels Feb 4, 2024
@benbovy
Copy link
Member

benbovy commented Feb 5, 2024

Thanks @TomNicholas. There is a possible fix for this implemented in #8124. I've hit some general issues with the approach taken in that PR (described in the top comment) but I'm wondering if we couldn't cherry pick some of the commits there: 26fa16a and 7741b2a. I'm not sure this will just work, though.

As an (ugly) workaround, xarray Variable objects can be constructed manually and then passed to xarray.Coordinates._construct_direct().

@TomNicholas
Copy link
Member Author

Thanks @benbovy!

xarray Variable objects can be constructed manually and then passed to xarray.Coordinates._construct_direct().

That's useful, thanks.

I'm wondering if we couldn't cherry pick some of the commits there: 26fa16a and 7741b2a.

That would be great.

I'm wondering if the change inside xr.merge on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook. (I can make a separate issue to track that error if that would be helpful)

@TomNicholas
Copy link
Member Author

TomNicholas commented Feb 5, 2024

I'm wondering if we couldn't cherry pick some of the commits there: 26fa16a and 7741b2a.

I'm not sure this will just work, though.

I tried this in #8711 but it did not just work (yet) - see comment there.

@benbovy
Copy link
Member

benbovy commented Feb 6, 2024

I'm wondering if the change inside xr.merge on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook.

Yes I think it will also fix this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants