Enriching datacube STAC items with more array metadata #18
Comments
Thanks. Having all the information needed to construct a Dataset from a STAC item / list of items would be great. Some comments on the proposed fields:
Maybe I was trying to avoid a possible ambiguity regarding which |
Hello, I am interested in expanding the datacube STAC extension to support more multidimensional array metadata for assets, particularly array metadata found in NetCDF, HDF5, and GRIB2 files. I think I'm caught up on the great discussions of the past:
And the STAC catalog items that I've been working with are all hosted on Microsoft's Planetary Computer platform, specifically:
For context, my goal is to one day be able to do something like this with Xarray:
In that example, assume that the STAC items returned by the search contain assets, which are the files themselves. I don't want to actually read the asset; I want the STAC item to contain enough information to create a manipulable dataset that Xarray understands. Reading comes after searching, merging, filtering, and projecting away the variables I'm not interested in.

This proposal is heavily based on Zarr V3, though I believe any multidimensional array handling system will want to know the same information.
I propose the following additional properties on cube:variables only:

- data_type (string): numpy-parseable datatype
- chunk_shape ([number])
- fill_value (number|string|null)
- dimensions ([string]): the cube:dimensions that index this variable. If not set, all dimensions index this variable. This may happen with single GRIB2 files that contain multiple datacubes.
- codecs ([object])

A new property that applies to either cube:variables or cube:dimensions:

- attrs (object)
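To make the proposed fields concrete, here is a minimal sketch of what an item fragment might look like and how a client could use it without opening any asset. Everything in it is an assumption for illustration: the variable names, the values, the Zarr V3 style codec entry, and the helpers variable_dimensions and chunk_grid are hypothetical and not part of any extension. Dimension sizes are taken from the length of each dimension's values list.

```python
import math

# Hypothetical item fragment using the proposed fields (illustrative values only).
item = {
    "cube:dimensions": {
        "time": {"type": "temporal", "values": ["2020-01-01T00:00:00Z",
                                                "2020-01-02T00:00:00Z",
                                                "2020-01-03T00:00:00Z",
                                                "2020-01-04T00:00:00Z"]},
        "lat": {"type": "spatial", "axis": "y", "values": [10.0, 20.0, 30.0]},
        "lon": {"type": "spatial", "axis": "x", "values": [0.0, 90.0, 180.0, 270.0]},
    },
    "cube:variables": {
        "t2m": {
            "type": "data",
            "data_type": "float32",
            "chunk_shape": [2, 3, 4],
            "fill_value": "NaN",
            "dimensions": ["time", "lat", "lon"],
            # Zarr V3 style codec metadata, shown only as an example payload.
            "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
            "attrs": {"units": "K"},
        },
        "mask": {
            "type": "data",
            "data_type": "uint8",
            "chunk_shape": [3, 4],
            "fill_value": 0,
            "dimensions": ["lat", "lon"],
        },
        # No "dimensions" key: under the proposal, all dimensions index it.
        "precip": {"type": "data", "data_type": "float32",
                   "chunk_shape": [4, 3, 4], "fill_value": None},
    },
}


def variable_dimensions(item, name):
    """Return the dimension names indexing a variable (all dimensions if unset)."""
    var = item["cube:variables"][name]
    if "dimensions" in var:
        return list(var["dimensions"])
    return list(item["cube:dimensions"])


def chunk_grid(item, name):
    """Return the number of chunks along each dimension of a variable."""
    var = item["cube:variables"][name]
    dims = variable_dimensions(item, name)
    sizes = [len(item["cube:dimensions"][d]["values"]) for d in dims]
    chunks = var["chunk_shape"]
    if len(chunks) != len(dims):
        raise ValueError(f"chunk_shape of {name!r} does not match its dimensions")
    return {d: math.ceil(s / c) for d, s, c in zip(dims, sizes, chunks)}
```

With this fragment, chunk_grid(item, "t2m") reports two chunks along time and one each along lat and lon, all derived from metadata alone.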
In the previous discussion on this topic (#8), a suggestion was made to use the files extension to store chunk metadata, but I don't think that extension is appropriate for this purpose. Similarly, I don't think the Bands RFC (radiantearth/stac-spec#1254) addresses this problem; it is solving something entirely different.
CC @TomAugspurger: we can handle chunk manifests later; they are ultimately just assets. Similarly, coordinate transforms are a separate concern, and it is probably better to wait for GeoZarr to standardize them.
I'd like to know your thoughts on this proposal, or whether this is perhaps something worth putting into a hypothetical Zarr extension instead. IMO, the only thing that is very Zarr-specific is the codecs property; everything else maps readily onto the underlying source files (and even then, the files themselves define codecs too, though they may not call them that).