Add map_overlap as a new core op #199

TomNicholas · 2023-06-02T13:06:26Z

It would be nice to add map_overlap alongside map_blocks, blockwise, rechunk, and apply_gufunc.

It's currently not directly used within xarray (even within xarray.map_blocks, which builds an HLG), but maybe it could / should be used there (cc @dcherian).

Regardless, I think it should be on the wishlist, as it is used in some other packages. For example, xgcm.apply_as_grid_ufunc uses a pattern where dask.array.map_overlap is called from within the function supplied to xarray.apply_ufunc() (it wraps the actual numpy function, so the kwarg is dask='allowed'). This trick allows parallelism along all dimensions (both core dims and broadcast dims) for a large class of array algorithms of interest (e.g. differential functions).
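A minimal sketch of that pattern, assuming a periodic centered difference as the wrapped numpy function. The names `centered_diff`, `diff_with_overlap`, and `derivative` are hypothetical and chosen for illustration; this is not xgcm's actual code.

```python
import dask.array as da
import numpy as np
import xarray as xr


def centered_diff(block):
    # Plain numpy kernel applied to one block including its halo.
    # np.roll wraps around at the block edges, but those edge values
    # fall in the halo, which map_overlap trims away afterwards.
    return (np.roll(block, -1, axis=-1) - np.roll(block, 1, axis=-1)) / 2


def diff_with_overlap(arr):
    # arr is a dask array here because apply_ufunc is called with dask="allowed".
    # Exchange one halo cell along the last axis only.
    return da.map_overlap(
        centered_diff,
        arr,
        depth={arr.ndim - 1: 1},
        boundary="periodic",
        dtype=arr.dtype,
    )


def derivative(obj, dim):
    # apply_ufunc moves the core dim to the last axis before calling the function.
    return xr.apply_ufunc(
        diff_with_overlap,
        obj,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        dask="allowed",
    )
```

Because the halo exchange happens inside the wrapped function, the array can stay chunked along the core dimension as well as the broadcast dimensions.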

dask.array.map_overlap is mostly implemented using map_blocks.


tomwhite · 2023-09-12T10:33:06Z

I think map_overlap could be implemented using Cubed's map_direct, which allows you to read arbitrary parts of Zarr arrays (it's used for indexing and concatenation already for example).

tomwhite · 2023-09-26T15:58:09Z

I've started an implementation of map_overlap here: cd1a15c, and it seems to be fairly straightforward. It only supports constant boundary values, but it should be possible to implement some of the other cases too fairly easily.

The nice thing about using map_overlap is that the Cubed implementation is very efficient: essentially a single blockwise with no intermediate Zarr arrays. So for problems like pangeo-data/distributed-array-examples#1, which could use map_overlap to implement a derivative, this is very attractive. This is probably a better approach than using a combination of xp.diff and xp.pad (see #193), since with that combination Cubed would use several intermediate Zarr arrays, which would be very difficult to optimize.
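To illustrate the idea in plain NumPy (this is not Cubed's implementation), here is a rough 1-D sketch in which each output chunk is computed in one pass by reading an overlapping region of the source directly, filling out-of-range cells with a constant boundary value. The helper name `map_overlap_1d` and its parameters are hypothetical.

```python
import numpy as np


def map_overlap_1d(func, x, chunk, depth, boundary_value=0.0):
    """Apply func to each chunk of x extended by `depth` cells on both sides."""
    n = x.shape[0]
    out = np.empty(n, dtype=float)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        lo, hi = start - depth, stop + depth
        # Start from a block filled with the constant boundary value ...
        block = np.full(hi - lo, boundary_value, dtype=float)
        # ... then copy in whatever part of the source actually exists.
        src_lo, src_hi = max(lo, 0), min(hi, n)
        block[src_lo - lo : src_hi - lo] = x[src_lo:src_hi]
        result = func(block)
        # Trim the halo so only the chunk's own cells are written.
        out[start:stop] = result[depth : depth + (stop - start)]
    return out


# Usage: a centered difference needs one halo cell on each side.
x = np.arange(10.0) ** 2
dx = map_overlap_1d(lambda b: (np.roll(b, -1) - np.roll(b, 1)) / 2, x, chunk=4, depth=1)
```

Each chunk reads its halo straight from the source, so no padded or shifted copy of the whole array is materialized between steps.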

dcherian · 2023-09-26T16:28:49Z

> a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays

This statement is confusing to me. Wouldn't diff just use map_overlap?

tomwhite · 2023-09-26T16:51:43Z

> > a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays
>
> This statement is confusing to me. Wouldn't diff just use map_overlap?

That's certainly one way of implementing it. I assumed that Xarray did this, but it looks like it uses indexing instead; see #193 (comment).

But my point was that it would be harder to have Cubed optimize a combination of diff and pad atomic operations, compared to the more efficient implementation of map_overlap.

TomNicholas added the enhancement New feature or request label Jun 2, 2023

tomwhite mentioned this issue May 16, 2024

Limited implementation of map_overlap #462

Merged

tomwhite closed this as completed in #462 May 18, 2024
