map_blocks should dispatch to ChunkManager #8545
Comments
I think at present,
To go through the |
That makes a lot more sense, thanks @dcherian. I see now how this is more similar to xarray-beam's model.
I suspect we could imagine an alternative implementation of
What's the advantage to this over a
FWIW I don't have particularly strong opinions on anything above, I'm just trying to give some food for thought 🙂 |
One, it requires that the function return arrays. It's nice to get away from the array abstraction for a lot of what map_blocks is used for. We are at present ~8 lines away from
dask isn't wrapping anything (it would if you were to implement as an object array of tuples of Datasets). It's executing tasks that create Xarray objects and passing them to the UDF, then taking the return value. The final step is when we extract dask arrays from the return value and construct the single Dataset output.
Accepting Datasets (or DataArrays) is the whole point of |
Fair point! That is much nicer from a user perspective.
Right yeah sure, I should have said executing instead of wrapping. My point is just that dask is being used in
Yeah it feels like this is missing from our API. But are there operations that could be expressed using |
Is there a PR for this? Rewriting |
Is your feature request related to a problem?
#7019 generalized most of xarray's internals to be able to use any chunked array type that we can create a `ChunkManagerEntrypoint` for. Most functions now go through this (e.g. `apply_ufunc`), but I did not redirect `xarray.map_blocks` to go through `ChunkManagerEntrypoint`.
This redirection works by dispatching to high-level dask.array primitives such as `dask.array.apply_gufunc`, `dask.array.blockwise`, and `dask.array.map_blocks`. However, the current implementation of `xarray.map_blocks` is much lower-level, building a custom HLG, so it was not obvious how to swap it out.
Describe the solution you'd like
I would like to either:
1. Replace the current internals of `xarray.map_blocks` with a simple call to `ChunkManagerEntrypoint.map_blocks`. This would be the cleanest separation of concerns we could do here. Presumably there is some obvious reason why this cannot or should not be done, but I have yet to understand what that reason is. (either @dcherian or @tomwhite can you enlighten me perhaps? 🙏)
2. (More likely) refactor so that the existing guts of `xarray.map_blocks` are only called from the `ChunkManagerEntrypoint`, and a non-dask chunked array (i.e. cubed, but in theory other types too) would be able to specify how it wants to perform the map_blocks.
Describe alternatives you've considered
Leaving it as the status quo breaks the nice abstraction and separation of concerns that #7019 introduced.
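To make the separation of concerns concrete, here is a rough, pure-Python sketch of the dispatch pattern the second option describes: the top-level `map_blocks` holds no backend-specific logic and hands off to whichever chunk manager recognises the array. Every name in it (`ToyChunkManager`, `ListOfBlocks`, `get_chunk_manager`, the registry) is hypothetical and chosen for illustration; it is not xarray's actual `ChunkManagerEntrypoint` API, and the real `xarray.map_blocks` operates on Datasets and DataArrays rather than lists.

```python
from abc import ABC, abstractmethod


class ChunkManager(ABC):
    """Hypothetical stand-in for a chunk manager (not xarray's real class)."""

    @abstractmethod
    def is_chunked_array(self, data):
        """Return True if this manager knows how to handle `data`."""

    @abstractmethod
    def map_blocks(self, func, data):
        """Apply `func` to every block of `data`, backend-specifically."""


class ListOfBlocks:
    """Toy chunked array: a list of blocks, each block a list of numbers."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]


class ToyChunkManager(ChunkManager):
    """The backend decides *how* blocks are mapped (here: a plain loop)."""

    def is_chunked_array(self, data):
        return isinstance(data, ListOfBlocks)

    def map_blocks(self, func, data):
        return ListOfBlocks([func(block) for block in data.blocks])


# Hypothetical registry; a real system might discover managers via entry points.
_REGISTRY = [ToyChunkManager()]


def get_chunk_manager(data):
    """Pick the first registered manager that recognises `data`."""
    for manager in _REGISTRY:
        if manager.is_chunked_array(data):
            return manager
    raise TypeError(f"no chunk manager registered for {type(data).__name__}")


def map_blocks(func, data):
    """Top-level entry point: only dispatches, no backend-specific logic."""
    return get_chunk_manager(data).map_blocks(func, data)


result = map_blocks(lambda block: [2 * x for x in block], ListOfBlocks([[1, 2], [3]]))
print(result.blocks)  # [[2, 4], [6]]
```

Under this layout, a second backend would only need to register its own manager with its own `map_blocks`; the top-level function would not change.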
Additional context
Split off from #8414