Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add map_overlap as a new core op #199

Closed
TomNicholas opened this issue Jun 2, 2023 · 4 comments · Fixed by #462
Closed

Add map_overlap as a new core op #199

TomNicholas opened this issue Jun 2, 2023 · 4 comments · Fixed by #462
Labels
enhancement New feature or request

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Jun 2, 2023

It would be nice to add map_overlap alongside map_blocks, blockwise, rechunk, and apply_gufunc.

It's currently not directly used within xarray (even within xarray.map_blocks, which builds a HLG), but maybe it could / should be used there (cc @dcherian)

Regardless I think it should be on the wishlist as it is used in some other packages. For example xgcm.apply_as_grid_ufunc uses a pattern where dask.array.map_overlap is called from within the function supplied to xarray.apply_ufunc(), (it wraps the actual numpy function, so the kwarg is dask='allowed'). This is a trick that allows parallelism along all dimensions (both core dims and broadcast dims) for a large class of array algorithms of interest (e.g. differential functions).

dask.map_overlap is mostly implemented using map_blocks.

@TomNicholas TomNicholas added the enhancement New feature or request label Jun 2, 2023
@tomwhite
Copy link
Member

I think map_overlap could be implemented using Cubed's map_direct, which allows you to read arbitrary parts of Zarr arrays (it's used for indexing and concatenation already for example).

@tomwhite
Copy link
Member

I've started an implementation of map_overlap here: cd1a15c, and it seems to be fairly straightforward. It only supports constant boundary values, but it should be possible to implement some of the other cases too fairly easily.

The nice thing about using map_overlap is that the Cubed implementation is very efficient - essentially a single blockwise with no intermediate Zarr arrays. So for problems like pangeo-data/distributed-array-examples#1, which could use map_overlap to implement a derivative, this is very attractive. This is probably a better approach than using a combination of xp.diff and xp.pad (see #193) since Cubed would use several intermediate Zarr arrays, which would be very difficult to optimize.

@dcherian
Copy link

a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays

This statement is confusing to me. Wouldn't diff just use map_overlap?

@tomwhite
Copy link
Member

a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays

This statement is confusing to me. Wouldn't diff just use map_overlap?

That's certainly one way of implementing it. I assumed that Xarray did this, but it looks like it uses indexing instead; see #193 (comment).

But my point was that it would be harder to have Cubed optimize a combination of diff and pad atomic operations, compared to the more efficient implementation of map_overlap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants