decide on how to handle empty_like #7940

Open
hmaarrfk opened this issue Jun 25, 2023 · 8 comments

@hmaarrfk
Contributor

Is your feature request related to a problem?

Calling np.empty_like on a DataArray seems to instantiate the whole underlying array.

from xarray.tests import InaccessibleArray
import xarray as xr
import numpy as np

array = InaccessibleArray(np.zeros((3, 3), dtype="uint8"))
da = xr.DataArray(array, dims=["x", "y"])

np.empty_like(da)
Traceback (most recent call last):
  File "/home/mark/t.py", line 8, in <module>
    np.empty_like(da)
  File "/home/mark/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/common.py", line 165, in __array__
    return np.asarray(self.values, dtype=dtype)
  File "/home/mark/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/dataarray.py", line 732, in values
    return self.variable.values
  File "/home/mark/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/variable.py", line 614, in values
    return _as_array_or_item(self._data)
  File "/home/mark/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/variable.py", line 314, in _as_array_or_item
    data = np.asarray(data)
  File "/home/mark/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/tests/__init__.py", line 151, in __array__
    raise UnexpectedDataAccess("Tried accessing data")
xarray.tests.UnexpectedDataAccess: Tried accessing data

Describe the solution you'd like

I'm not too sure. This is why I raised this as a "feature" and not a bug.

On one hand, it is pretty hard to "get" the underlying class.

Is it:

  • a numpy array?
  • a lazy thing that looks like a numpy array?
  • a dask array, when dask is involved?

I think that there are also some nuances between:

  1. Loading a netCDF (.nc) file from disk (where things might be handled by dask even though you don't want them to be)
  2. Creating your xarray object from in-memory data.

Describe alternatives you've considered

For now, I'm trying to avoid empty_like and zeros_like.
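
One way to sidestep the coercion is to allocate from metadata alone. The snippet below is only a sketch: empty_from_metadata and ShapeOnly are hypothetical names invented for illustration, and the helper assumes the object exposes .shape and .dtype without loading anything.

```python
import numpy as np


def empty_from_metadata(obj, dtype=None):
    """Hypothetical helper: allocate an uninitialized array using only
    obj.shape and obj.dtype, so np.asarray(obj) is never called."""
    return np.empty(obj.shape, dtype=obj.dtype if dtype is None else dtype)


class ShapeOnly:
    """Stand-in for a lazy array: exposes metadata but refuses coercion."""

    shape = (3, 3)
    dtype = np.dtype("uint8")

    def __array__(self, dtype=None, copy=None):
        raise RuntimeError("Tried accessing data")


out = empty_from_metadata(ShapeOnly())
print(out.shape, out.dtype)
```

The result is a plain numpy array, so for a DataArray it would still need to be wrapped again with the original dims.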

In general, we haven't seen much benefit from dask, and CUDA still needs careful memory management.

Additional context

No response

@headtr1ck
Collaborator

headtr1ck commented Jun 25, 2023

Edit: this comment was nonsense

@hmaarrfk
Contributor Author

A little more context:

For some "slow to load" datasets, this can accidentally load the whole thing when one isn't ready for it.

@keewis
Collaborator

keewis commented Jun 25, 2023

I might be wrong, but I think those are not ufuncs, and since xarray doesn't implement __array_function__, np.empty_like(da) results in something similar to np.empty_like(np.asarray(da)).

However, xr.zeros_like shows the same behavior, so there might be something else going on.

Edit: actually, does InaccessibleArray implement __array_function__ for np.zeros_like? If not, xr.zeros_like dispatches to np.zeros_like(inaccessible), which again results in something similar to np.zeros_like(np.asarray(inaccessible)).
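
To make the coercion visible, here is a minimal numpy-only sketch. LazyArray is a hypothetical stand-in (not xarray's InaccessibleArray) that counts how often it is materialized; it has __array__ but no __array_function__, so np.empty_like has to convert it first.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for a lazily loaded array; counts materializations."""

    def __init__(self, data):
        self._data = data
        self.shape = data.shape
        self.dtype = data.dtype
        self.loads = 0

    def __array__(self, dtype=None, copy=None):
        # Without __array_function__, NumPy falls back to this method
        # whenever it needs a real ndarray.
        self.loads += 1
        return np.asarray(self._data, dtype=dtype)


lazy = LazyArray(np.zeros((3, 3), dtype="uint8"))
result = np.empty_like(lazy)

# The prototype was converted to an ndarray just to read shape and dtype.
print(lazy.loads > 0, result.shape, result.dtype)
```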

@hmaarrfk
Contributor Author

@keewis
Collaborator

keewis commented Jun 25, 2023

zeros_like is essentially empty_like

Note that there's a difference between xr.zeros_like and np.zeros_like.

Anyways, there might be something wrong with the way InaccessibleArray is implemented, at least for this use case: np.zeros_like(array) doesn't work (or really, any np.*_like), so np.zeros_like(da) doesn't work, either.

@hmaarrfk
Contributor Author

Note that there's a difference between xr.zeros_like and np.zeros_like.

Ahhhh! thanks

@keewis
Collaborator

keewis commented Jul 5, 2023

It seems that to get this to work we would need to:

  • replace .data with ._data (or at least the ones in the is_chunked_array and np.full_like calls) in

    xarray/xarray/core/common.py

    Lines 1660 to 1691 in 0de7761

    from xarray.core.variable import Variable

    if fill_value is dtypes.NA:
        fill_value = dtypes.get_fill_value(dtype if dtype is not None else other.dtype)

    if (
        is_chunked_array(other.data)
        or chunked_array_type is not None
        or chunks is not None
    ):
        if chunked_array_type is None:
            chunkmanager = get_chunked_array_type(other.data)
        else:
            chunkmanager = guess_chunkmanager(chunked_array_type)

        if dtype is None:
            dtype = other.dtype

        if from_array_kwargs is None:
            from_array_kwargs = {}

        data = chunkmanager.array_api.full(
            other.shape,
            fill_value,
            dtype=dtype,
            chunks=chunks if chunks else other.data.chunks,
            **from_array_kwargs,
        )
    else:
        data = np.full_like(other.data, fill_value, dtype=dtype)

    return Variable(dims=other.dims, data=data, attrs=other.attrs)
  • use a test class that overrides np.full_like (obviously the implementation is only sufficient for this test):
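import numpy as np
from xarray.tests import InaccessibleArray

def full_like(array, fill_value, *, dtype=None):
    # Build the result from shape and dtype only; the wrapped data is never read.
    if dtype is None:
        dtype = array.dtype
    return np.full(shape=array.shape, dtype=dtype, fill_value=fill_value)

# NumPy functions this test class overrides (only np.full_like is needed here).
functions = {np.full_like: full_like}

class CustomArray(InaccessibleArray):
    def __array_function__(self, func, types, args, kwargs):
        # Any function missing from the table raises KeyError, which is fine for this test.
        return functions[func](*args, **kwargs)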

With that, a modified version of the original code succeeds without accessing the actual data, so UnexpectedDataAccess is never raised:

import xarray as xr
import numpy as np

array = CustomArray(np.zeros((3, 3), dtype="uint8"))
da = xr.DataArray(array, dims=["x", "y"])

xr.zeros_like(da)
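
The same idea can be checked with numpy alone. This is a sketch under assumptions: Guarded is a hypothetical stand-in for InaccessibleArray, and HANDLED is an illustrative dispatch table that also returns NotImplemented for functions it does not cover, so unsupported calls fail with a TypeError instead of silently loading data.

```python
import numpy as np


class Guarded:
    """Hypothetical stand-in for InaccessibleArray: any coercion raises."""

    def __init__(self, data):
        self._data = data
        self.shape = data.shape
        self.dtype = data.dtype

    def __array__(self, dtype=None, copy=None):
        raise RuntimeError("Tried accessing data")

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented  # NumPy turns this into a TypeError
        return HANDLED[func](*args, **kwargs)


def _full_like(array, fill_value, dtype=None, **_ignored):
    # Uses metadata only; order/subok/shape keywords are ignored in this sketch.
    return np.full(array.shape, fill_value, dtype=array.dtype if dtype is None else dtype)


def _zeros_like(array, dtype=None, **_ignored):
    return _full_like(array, 0, dtype=dtype)


def _empty_like(array, dtype=None, **_ignored):
    return np.empty(array.shape, dtype=array.dtype if dtype is None else dtype)


HANDLED = {
    np.full_like: _full_like,
    np.zeros_like: _zeros_like,
    np.empty_like: _empty_like,
}

guarded = Guarded(np.zeros((3, 3), dtype="uint8"))
filled = np.full_like(guarded, 7)
zeros = np.zeros_like(guarded)
empty = np.empty_like(guarded)

try:
    np.sum(guarded)  # not in HANDLED
    unsupported_raises = False
except TypeError:
    unsupported_raises = True
```

None of the three *_like calls touches __array__, which is the behaviour the test class above relies on.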

@keewis
Collaborator

keewis commented Jul 5, 2023

I didn't check whether modifying _full_like_variable as proposed breaks anything else, though.
