Lazy indexing arrays as a stand-alone package #5081
Comments
I'm going to say, the
As a follow-up question, is LazilyIndexedArray part of the 'public API'? That is, when you do decide to refactor, will you try to warn us users that choose to
Personally I would not want to guarantee external stability/availability for this API in its current state.
It is recommended to use it for lazy backends though: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#how-to-support-lazy-loading
FYI there is interesting work going on in dask-land on "dask-expressions" https://github.com/dask-contrib/dask-expr, which is an experiment in doing high-level "query optimization". Computations get represented as a tree of expressions, which will be optimised in various ways before execution. It doesn't yet support arrays, only dataframes, but a similar effort for arrays would potentially be a generalization of the lazy-array package idea. cc @phofl who I understand is working on dask expressions for dataframes, with whom I chatted about this briefly at SciPy.
we are discussing something related over in zarrland: zarr-developers/zarr-python#1603
There is another discussion about a lazy arrays package as an implementation of the array API standard in data-apis/array-api#777
From @rabernat on Twitter:
The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.
Desired features:
xarray.encoding) and
A common feature of these operations is they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.
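The fusion of elementwise operations with indexing described above can be sketched roughly as follows. This is only an illustration, not xarray's implementation: `CountingSource` and `LazyArray` are hypothetical names, the array is 1-D, and only slices and integer keys are handled. The source counts element reads, so the O(N) behaviour is visible.

```python
class CountingSource:
    """Hypothetical stand-in for an on-disk array; counts element reads."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]


class LazyArray:
    """Hypothetical 1-D lazy view: a deferred selection plus deferred elementwise functions."""

    def __init__(self, source, indices=None, funcs=()):
        self._source = source
        # A range composes under slicing without touching the source.
        self._indices = range(len(source)) if indices is None else indices
        self._funcs = tuple(funcs)

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Indexing only narrows the selection; nothing is read yet.
            return LazyArray(self._source, self._indices[key], self._funcs)
        # An integer key materializes exactly one element.
        return self._apply(self._source[self._indices[key]])

    def map(self, func):
        # Record the elementwise operation instead of applying it.
        return LazyArray(self._source, self._indices, self._funcs + (func,))

    def _apply(self, value):
        for func in self._funcs:
            value = func(value)
        return value

    def compute(self):
        # Only the selected elements are read and transformed.
        return [self._apply(self._source[i]) for i in self._indices]


source = CountingSource(range(1_000_000))
decoded = LazyArray(source).map(lambda x: x * 0.5).map(lambda x: x + 10.0)
subset = decoded[100:110][::2]   # composed selection, still no reads
values = subset.compute()
print(values)        # [60.0, 61.0, 62.0, 63.0, 64.0]
print(source.reads)  # 5
```

Both `map` calls are applied to a million-element array on paper, but only the five selected elements are ever read or computed.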
Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, mean() probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.
This is valuable functionality for Xarray for two reasons:
Related issues: