Is your feature request related to a problem?
Often with xarray I find myself having to create a template Dataset or DataArray containing dummy data just to specify the dimensions, sizes, coordinates and variable names that are required in some situation.
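To illustrate the boilerplate, here is a sketch of the current workaround: building a template whose data is never actually used, just so downstream code can read its dims/sizes/coords.

```python
import numpy as np
import xarray as xr

# A template whose data is throwaway -- it exists only to carry
# the dims, sizes and coords that downstream code needs.
template = xr.DataArray(
    np.zeros((2, 3)),  # dummy data, never read
    dims=("time", "space"),
    coords={"time": [0, 1], "space": ["a", "b", "c"]},
)

# Functions that only depend on the shape still have to be handed
# the full template with its fake data:
filled = xr.ones_like(template)
```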
Describe the solution you'd like
It would be very useful to have a datatype that represents a shape specification (dimensions, sizes and coordinates) independently of the data so that we can do things like:
Implement xarray equivalents of functions like np.ones, np.zeros and np.random.normal(size=...), which would be given a shape specification that the return value should conform to. (I have some more sophisticated / less trivial examples of this too: functions that currently need to be given templates for the return value but depend only on the shape of the template.)
Test if two DataArrays / Datasets have the same shape
Memoize or cache things based on shape (this implies the shape spec would need to be hashable)
Make it easier to use xarray with libraries like tree / PyTree that can be used to flatten and unflatten a Dataset into its underlying arrays together with some specification of the shape of the data structure that can be used to unflatten it back again. (Right now I have to implement my own shape specification objects to do this)
Manipulate shape specifications, e.g. by adding or removing dimensions, without having to manipulate dummy template data in slightly arbitrary ways (e.g. template.isel(dim_to_be_dropped=0, drop=True)) in order to do this.
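The use cases above could be served by something like the following minimal sketch. The names `ShapeSpec`, `drop` and `ones` are hypothetical, not an existing xarray API; the point is that a frozen, data-free spec is hashable, comparable, and can drive np.ones-style constructors directly.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ShapeSpec:
    """Hypothetical shape specification: dims and sizes, no data attached."""

    dims: tuple[str, ...]
    sizes: tuple[int, ...]

    def drop(self, dim: str) -> "ShapeSpec":
        # Remove a dimension without touching any dummy template data.
        keep = [i for i, d in enumerate(self.dims) if d != dim]
        return ShapeSpec(
            tuple(self.dims[i] for i in keep),
            tuple(self.sizes[i] for i in keep),
        )


def ones(spec: ShapeSpec) -> np.ndarray:
    # np.ones-style constructor driven by a spec rather than a template.
    return np.ones(spec.sizes)


spec = ShapeSpec(dims=("time", "space"), sizes=(2, 3))
```

Because the dataclass is frozen with tuple fields, equality and hashing come for free, which covers the shape-comparison and memoization use cases directly.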
Describe alternatives you've considered
I realise that using lazy dask arrays largely removes the performance overhead of manipulating fake data, but (A) it still feels kinda ugly and adds boilerplate to construct the fake data, and (B) not everyone wants to depend on dask.
Additional context
No response
Thanks, that looks interesting, although it sounds like it's addressing a slightly different problem; I'm not so much interested in validating external inputs as in having some basic datatypes that can be used to specify dims/shape/coords as templates for DataArrays / Datasets internally within my codebase.
Some things I'm looking for but don't appear to be supported:
Support for specifying coords
__hash__, __eq__ etc for the Schema objects
Convenient APIs to alter and combine these Schemas in similar ways to what can be done with DataArrays / Datasets themselves, e.g. adding/removing dimensions, broadcasting against each other, etc. -- perhaps mirroring APIs like expand_dims that can be used on DataArray / Dataset themselves, to the extent this makes sense.
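For illustration, here is a hypothetical frozen schema covering those three points: coords are stored as hashable tuples, `__hash__`/`__eq__` come from the frozen dataclass, and an `expand_dims`-style method mirrors the DataArray API. None of these names are existing xarray or xarray-schema API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataArraySchema:
    """Hypothetical: dims, sizes and coord values, all hashable."""

    dims: tuple[str, ...]
    sizes: tuple[int, ...]
    coords: tuple[tuple[str, tuple], ...] = ()  # (name, values) pairs

    def expand_dims(self, dim: str, size: int) -> "DataArraySchema":
        # Mirror of DataArray.expand_dims, acting on the schema alone.
        return DataArraySchema((dim,) + self.dims, (size,) + self.sizes, self.coords)


a = DataArraySchema(("x",), (3,), coords=(("x", (10, 20, 30)),))
b = a.expand_dims("batch", 8)
cache = {a: "cached result"}  # usable as a dict key, i.e. for memoization
```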
Feels to me that it would make sense to have these basic datatypes inside xarray, perhaps with something like xarray-schema providing extra validation helpers etc on top of them? But just my 2 cents :)