nan length coordinates #5169

Open
chrisroat opened this issue Apr 16, 2021 · 2 comments

Comments

@chrisroat
Contributor

Is your feature request related to a problem? Please describe.

When using arrays with a nan shape, I'd like to provide a coordinate from a delayed object. It would be my responsibility to make sure it has the right chunks and length.

Describe the solution you'd like

Below are three examples that I think should work, where I give a coordinate via a dask array, a dask series, and a dask index. Currently, all three examples error out: xarray computes the coordinate length (triggering an unwanted dask computation!) and then reports that it differs from the array's corresponding length, which is nan.

import dask
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd
import xarray as xr

def foo():
    return np.arange(4)

arr = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

idx = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

ddf = dd.from_pandas(pd.DataFrame({'y': np.arange(4)}), npartitions=1)

arr0 = xr.DataArray(arr, coords=[('z', idx)])
arr1 = xr.DataArray(arr, coords=[('z', ddf['y'])])
arr2 = xr.DataArray(arr, coords=[('z', ddf.index)])

Error:

ValueError: conflicting sizes for dimension 'z': length nan on the data but length 4 on coordinate 'z'

Describe alternatives you've considered

Add the missing coordinate after the computations complete. This requires carrying the delayed index around with the delayed array.
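A minimal sketch of this workaround, reusing `foo` from the example above. It assumes it is acceptable to compute the data and the index together before building the `DataArray`; the names `arr_values`, `idx_values`, and `result` are illustrative only.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr


def foo():
    return np.arange(4)


# Lazy data and lazy index, both with unknown (nan) length.
arr = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)
idx = da.from_delayed(dask.delayed(foo)(), shape=(np.nan,), dtype=int)

# Carry both objects along and compute them in one call.
arr_values, idx_values = dask.compute(arr, idx)

# Only now, with concrete lengths, attach the coordinate.
result = xr.DataArray(arr_values, coords=[("z", idx_values)])
```

The drawback is the one noted above: the index has to travel alongside the array until everything is computed, and the coordinate is unavailable for any lazy, label-based operation before that point.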

@mathause
Collaborator

dask or lazy coordinates are currently not allowed in xarray's data model, so unfortunately this cannot work. Even a standard dask.array.array is turned into a numpy array:

xr.DataArray([1, 2, 3], coords=[('z', da.array([1, 2, 3]))]).z.data
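To check the conversion described here, one can inspect the type of the coordinate's data. This is a small sketch of the behavior as reported in this thread; the variable names are illustrative, and the result may differ in other xarray versions.

```python
import dask.array as da
import numpy as np
import xarray as xr

# A dask-backed coordinate with a known length of 3.
lazy_coord = da.array([1, 2, 3])

data_array = xr.DataArray([1, 2, 3], coords=[("z", lazy_coord)])

# The dimension coordinate is loaded eagerly, so its data is no longer a dask array.
coord_is_numpy = isinstance(data_array.z.data, np.ndarray)
coord_is_dask = isinstance(data_array.z.data, da.Array)
print(type(data_array.z.data), coord_is_numpy, coord_is_dask)
```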

@dcherian
Contributor

dcherian commented Apr 24, 2021

Thanks for the nice examples, @chrisroat. Closing as duplicate of #2801

EDIT: Oops meant to close the other one.

@dcherian dcherian reopened this Apr 24, 2021

3 participants