Users come to Zarr with a variety of array-like objects: NumPy arrays, Dask arrays, xarray DataArrays, Zarr v2 arrays, Zarr v3 arrays, etc. Imagine a Venn diagram of the attributes and methods of these objects: `shape`, `__getitem__`, and `dtype` would be in the shared middle, and `chunks`, `chunksize`, `attrs`, `dims`, `codecs`, and `filters` in the disjoint periphery. How can we conveniently model an arbitrary array-like object as a Zarr array? In particular, how can we ensure that you can create a complete Zarr array from an existing array-like object (which might already be a Zarr array) with a single function call?
If we agree on that objective, then here is a rough outline of what that function could look like:
We should have a top-level `from_array` function that creates a Zarr array from an existing array-like object:
```python
# numpy
np_arr = np.zeros(10)
zarr.from_array(np_arr)  # memorystore-backed zarr v3 array with shape 10 and dtype float64, and default parameters for everything else
zarr.from_array(np_arr, zarr_format=2, compressor=Gzip(), attributes={'foo': 10})  # same as above, but v2, with gzip, and attributes

# dask
da_arr = da.zeros((10,), chunks=(1,))
zarr.from_array(da_arr)  # inherits the `chunks` attribute from the array
zarr.from_array(da_arr, chunking_bikeshed=(2,))  # overrides the chunks attribute, kwarg name tbd 🙃

# xarray
xr_arr = xarray.DataArray(np.zeros(10), attrs={'foo': 10}, dims=('dim_0',))
zarr.from_array(xr_arr)  # zarr v3 array with dimension names inherited from xr_arr.dims, attrs from xr_arr.attrs

# zarr
zarr.from_array(zarr.zeros(10))  # makes a copy of the array
```
Some open questions:
- Should we copy data? Over in pydantic-zarr I implemented a `from_array` function that only creates array metadata, because users might not want to eagerly move 10 TB of data at array definition time. Perhaps this could be controlled with a keyword argument.
- Should we support creating v2 arrays through this API, or use a `v2.from_array` function for that? I'm fine either way.
- How much work is required to implicitly model the different array-like libraries enough for the above functionality to be useful?
- There is a similar question about Zarr groups, but the set of "zarr-group-like objects" is a bit narrower than array-likes.
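On the data-copying question, one possible shape for the keyword argument is sketched below. The names `iter_chunk_slices`, `copy_into`, and `write_data` are hypothetical, and the sketch assumes only that source and target support indexing with a tuple of slices. It copies one chunk-shaped region at a time, so the full array is never held in memory, and it skips the copy entirely when `write_data=False`.

```python
import itertools
from typing import Any, Iterator, Sequence


def iter_chunk_slices(shape: Sequence[int], chunks: Sequence[int]) -> Iterator[tuple]:
    """Yield one tuple of slices per chunk-shaped region of the array."""
    starts_per_dim = [range(0, s, c) for s, c in zip(shape, chunks)]
    for starts in itertools.product(*starts_per_dim):
        # clip the last region in each dimension to the array bounds
        yield tuple(
            slice(start, min(start + c, s))
            for start, c, s in zip(starts, chunks, shape)
        )


def copy_into(source: Any, target: Any, chunks: Sequence[int], write_data: bool = True) -> int:
    """Copy source into target region by region; return the number of regions written."""
    if not write_data:
        # metadata-only mode: the target exists but no data is moved
        return 0
    written = 0
    for region in iter_chunk_slices(source.shape, chunks):
        target[region] = source[region]
        written += 1
    return written


regions = list(iter_chunk_slices((10,), (4,)))
```

Here `regions` covers the array in three pieces, the last one shorter than the chunk size. A real implementation would probably also want to parallelize the loop and align regions with the target's chunk grid.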
Thoughts?
Big supporter here @d-v-b. Actually the main reason I use zarr-python more heavily than tensorstore is having to specify a spec in tensorstore 😂 🦥. Convenience functions with sane defaults are very important!