map_blocks: Allow passing dask-backed objects in args #3818
Conversation
Force-pushed from 17b9936 to b962053.
I've started testing this out and have run into one problem. Here's a simple example that uses `template` and Dask-backed DataArrays:

```python
ds = xr.tutorial.load_dataset('air_temperature')

def func(X, y):
    '''a simple reduction (assume the output can't be inferred automatically)'''
    return X.sum('time') + y.min('time')

ds = ds.chunk({'lat': 10, 'lon': 10})
X = ds['air']
y = ds['air'] ** 2

template = X.sum('time')
expected = func(X, y)
actual = xr.map_blocks(func, X, args=[y], template=template)
xr.testing.assert_identical(actual, expected)
```

This raises:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-5491345c8970> in <module>
      5
      6 expected = func(X, y)
----> 7 actual = xr.map_blocks(func, X, args=[y], template=template)
      8 xr.testing.assert_identical(actual, expected)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/parallel.py in map_blocks(func, obj, args, kwargs, template)
    424     # even if length of dimension is changed by the applied function
    425     expected["shapes"] = {
--> 426         k: output_chunks[k][v] for k, v in input_chunk_index.items()
    427     }
    428     expected["data_vars"] = set(template.data_vars.keys())  # type: ignore

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/parallel.py in <dictcomp>(.0)
    424     # even if length of dimension is changed by the applied function
    425     expected["shapes"] = {
--> 426         k: output_chunks[k][v] for k, v in input_chunk_index.items()
    427     }
    428     expected["data_vars"] = set(template.data_vars.keys())  # type: ignore

KeyError: 'time'
```
Force-pushed from b962053 to e99033e.
I fixed that on the `template` branch. After a rebase, this example works.
Force-pushed from e99033e to a6838f8.
```python
aligned = align(*npargs[is_xarray], join="left")
# assigning to object arrays works better when RHS is object array
# https://stackoverflow.com/questions/43645135/boolean-indexing-assignment-of-a-numpy-array-to-a-numpy-array
npargs[is_xarray] = to_object_array(aligned)
```
Is there a better way to do this assignment? `np.array(args)` ends up computing things.
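A minimal sketch of the issue, using plain NumPy arrays in place of xarray objects (the `to_object_array` helper here is hypothetical, mirroring the one used in the PR): `np.array` eagerly stacks the inputs into a single numeric array (for dask-backed objects, that coercion triggers a compute), whereas filling a pre-allocated object array keeps each object intact.

```python
import numpy as np

# Two array payloads standing in for xarray objects. np.array([a, b])
# stacks them into one (2, 4) numeric array instead of keeping two objects.
a = np.arange(4)
b = np.arange(4) * 2
stacked = np.array([a, b])
assert stacked.shape == (2, 4)

# Pre-allocating an object array and filling it element-wise keeps each
# object intact (hypothetical helper mirroring the PR's to_object_array).
def to_object_array(iterable):
    out = np.empty(len(iterable), dtype=object)
    for i, item in enumerate(iterable):
        out[i] = item
    return out

npargs = to_object_array([a, b])
assert npargs.shape == (2,)
assert npargs[0] is a

# Boolean-indexed assignment then works cleanly when the RHS is itself
# an object array of matching length.
is_xarray = np.array([True, False])
npargs[is_xarray] = to_object_array([b])
assert npargs[0] is b
```

This is why the PR pre-converts to an object array rather than calling `np.array(args)` directly.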
@dcherian - can you resolve conflicts here?
Switch to use IndexVariables instead of Indexes so that attrs are preserved.
need a solution to preserve index attrs
Force-pushed from a11bf48 to 04ffa6c.
```python
else:
    dataset = obj
    input_is_array = False

npargs = to_object_array([obj] + list(args))
```
Converting to an object array so that we can use boolean indexing to pull out the xarray objects.
`indexes` should contain only the indexes for the output variables. When `template` was provided, I was initializing `indexes` to contain all input indexes; it should just have the indexes from `template`. Otherwise, indexes for any indexed dimensions removed by `func` would still be propagated.
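A minimal sketch of that fix, using plain dicts and hypothetical index values (not xarray's actual data structures): when `func` drops a dimension, seeding the output indexes from the inputs keeps a stale entry, while seeding them from the template does not.

```python
# Suppose func drops the "time" dimension, so the user-supplied template
# carries only the surviving indexes.
input_indexes = {"time": [0, 1, 2], "lat": [10, 20], "lon": [30, 40]}
template_indexes = {"lat": [10, 20], "lon": [30, 40]}

# Before the fix: output indexes initialized from all input indexes,
# so the removed "time" index is still propagated.
before_fix = dict(input_indexes)

# After the fix: take only what the template declares.
after_fix = dict(template_indexes)

assert "time" in before_fix
assert "time" not in after_fix
```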
This is ready for review. I've minimized the diff. Once this is merged, I'll do some refactoring.
Thanks @dcherian, this is quite close. Would love to get either @shoyer or @TomAugspurger to look this over but everything seems good to me. Just a series of small comments.
Things seem to have died down here. I suggest we merge this as is. As a reminder, the `map_blocks` function is still marked as an experimental feature, so I'm not too concerned about breaking things in the wild. Better to get some early feedback and iterate on the design.

+1
- `isort -rc . && black . && mypy . && flake8`
- `whats-new.rst` for all changes and `api.rst` for new API

It parses `args` and breaks any xarray objects into appropriate blocks before passing them on to the user function. e.g.
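The per-block dispatch described above can be sketched conceptually with plain NumPy (this is an illustration only, not xarray's implementation; `map_blocks_sketch` is a hypothetical name): the primary object and each entry of `args` are split into corresponding blocks, the user function is applied to each matching set of blocks, and the results are reassembled.

```python
import numpy as np

def map_blocks_sketch(func, obj, args, n_blocks):
    # Split the primary input and each extra argument into matching blocks.
    obj_blocks = np.array_split(obj, n_blocks)
    arg_blocks = [np.array_split(a, n_blocks) for a in args]
    # Apply func to each set of corresponding blocks, then reassemble.
    pieces = [
        func(ob, *[ab[i] for ab in arg_blocks])
        for i, ob in enumerate(obj_blocks)
    ]
    return np.concatenate(pieces)

x = np.arange(8.0)
y = x ** 2
result = map_blocks_sketch(lambda a, b: a + b, x, [y], n_blocks=4)
assert np.array_equal(result, x + y)
```

In the real `xr.map_blocks`, the blocks are dask chunks and the reassembly is a lazy dask graph rather than an eager `concatenate`.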