implement Zarr v3 spec support #6475
Does Zarr v3 have a notion of a "root" group? That feels like a more sensible default to me, both for Xarray and Zarr-Python
This sounds fine for now, but I am concerned that it will slow the adoption of Zarr v3. Eventually, we would presumably want to change the default to version 3, but this is difficult to do if it entirely breaks backwards compatibility. My preference would be for the default behavior to try opening Zarr v2, and fall back to opening in v3 mode, even if this requires attempting to open a file from the store. This is similar to how Xarray handles other Zarr versioning issues (e.g., for consolidated metadata). Perhaps Zarr-Python could raise an informative error that we could catch if the Zarr version is incorrect, or even handle this behavior itself?
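A rough sketch of the "try v2, then fall back to v3" behavior proposed in this comment. Everything here is a stand-in for illustration: `WrongZarrVersionError`, `open_group`, and `open_group_with_fallback` are hypothetical names, not Zarr-Python or Xarray APIs, and the store is a plain dict. The metadata keys used to detect each version (`.zgroup` for v2, `zarr.json` for v3) are also assumptions made for the sake of the example.

```python
class WrongZarrVersionError(Exception):
    """Hypothetical error raised when a store lacks metadata for the requested version."""


def open_group(store, zarr_version):
    """Stand-in opener: succeeds only if the version-specific metadata key exists."""
    # Assumed marker keys: ".zgroup" for v2, "zarr.json" for v3.
    marker = ".zgroup" if zarr_version == 2 else "zarr.json"
    if marker not in store:
        raise WrongZarrVersionError(
            f"no Zarr v{zarr_version} metadata ({marker!r}) found in store"
        )
    return {"store": store, "zarr_version": zarr_version}


def open_group_with_fallback(store):
    """Try v2 first for backwards compatibility, then fall back to v3."""
    try:
        return open_group(store, zarr_version=2)
    except WrongZarrVersionError:
        # The v2 attempt failed with an informative error, so retry as v3.
        return open_group(store, zarr_version=3)


v2_store = {".zgroup": b"{}"}
v3_store = {"zarr.json": b"{}"}
print(open_group_with_fallback(v2_store)["zarr_version"])
print(open_group_with_fallback(v3_store)["zarr_version"])
```

The cost of this design is one extra metadata lookup on v3 stores, which is the trade-off the comment accepts ("even if this requires attempting to open a file from the store").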
I think we likely need to introduce a separate One issue with relying only on
We did define
Yeah, something like this seems feasible on the Zarr side for convenience routines like |
Is there an issue on the Zarr side where this is currently being discussed?
I opened up zarr-developers/zarr-python#1039
In this case where create_zarr_target returns a string, we must specify zarr_version=3 when opening/writing a store to make sure a version 3 store will be created rather than the default of a version 2 store.
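To illustrate why a string path needs an explicit `zarr_version=3`, here is a minimal sketch of the version-resolution rule from the PR description: `zarr_version=None` infers the version from a store object when possible and otherwise defaults to 2. The classes below are stand-ins rather than Zarr's own, and both the `_store_version` attribute name and the `resolve_zarr_version` helper are assumptions for illustration.

```python
class BaseStore:
    """Stand-in for a Zarr v2 store class (illustrative only)."""
    _store_version = 2  # assumed attribute name


class StoreV3(BaseStore):
    """Stand-in for a Zarr v3 store class (illustrative only)."""
    _store_version = 3


def resolve_zarr_version(store, zarr_version=None):
    """Hypothetical helper: pick the Zarr spec version for a store or path."""
    if zarr_version is None:
        # Store objects carry their version; a plain string path does not,
        # so it falls back to 2 for backwards compatibility.
        zarr_version = getattr(store, "_store_version", 2)
    if zarr_version not in (2, 3):
        raise ValueError(f"zarr_version must be 2 or 3, got {zarr_version!r}")
    return zarr_version


print(resolve_zarr_version("data.zarr"))                  # string path, no hint
print(resolve_zarr_version("data.zarr", zarr_version=3))  # explicit request
print(resolve_zarr_version(StoreV3()))                    # inferred from the store
```

Under this rule, a string path can only produce a v3 store when the caller asks for it explicitly, which is the situation the commit message above describes.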
remove path='xarray' default for zarr v3
path=None should work as of Zarr v2.13
Sorry about the long delay here. This has been updated for the V3 store paths used in Zarr >v2.12 and to remove the need for specifying
To do:
A separate issue is that consolidated metadata isn't in the core Zarr v3 spec, so we will need to have a Zarr Enhancement Proposal to formally define how the metadata should be stored. In the experimental API, it behaves as for v2 and is stored at |
Done. And should be out on conda-forge later today.
I think it would be fine to disallow consolidated metadata for v3 until there is a spec in place. This is going to be experimental for some time so I don't see the harm in raising an error when |
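One way the suggested guard might look. The comment is cut off in this copy, so the exact trigger is an assumption: the sketch raises only when consolidated metadata is explicitly requested for a v3 store, and quietly disables it when the caller left it unset. `resolve_consolidated` is a hypothetical helper, not an Xarray function.

```python
def resolve_consolidated(zarr_version, consolidated=None):
    """Hypothetical guard: disallow consolidated metadata for Zarr v3."""
    if zarr_version == 3:
        if consolidated:
            # Assumed condition: an explicit request is an error until a spec exists.
            raise ValueError(
                "consolidated metadata has not been specified for Zarr v3; "
                "pass consolidated=False"
            )
        # Unset or False: simply do not use consolidated metadata.
        return False
    # v2 behavior is left untouched.
    return consolidated


print(resolve_consolidated(2, consolidated=True))
print(resolve_consolidated(3))
```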
@grlee77 - I'm curious if you are planning to return to this PR or if it would be helpful if someone brought it to completion?
I am happy for someone to take over if possible. Thank you.
@grlee77, @rabernat, @joshmoore, and others - I think this is ready to review and/or merge. The Zarr-V3 tests are active in the |
Awesome, @jhamman. 👍
RTD failure is real.
Otherwise, is this ready to merge?
This is ready to merge once #7300 is in.
* upstream/main: (39 commits)
  Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981)
  Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253)
  Fix mypy failures (pydata#7343)
  Docs: add example of writing and reading groups to netcdf (pydata#7338)
  Reset file pointer to 0 when reading file stream (pydata#7304)
  Enable mypy warn unused ignores (pydata#7335)
  Optimize some copying (pydata#7209)
  Add parse_dims func (pydata#7051)
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229)
  Remove code used to support h5py<2.10.0 (pydata#7334)
  [pre-commit.ci] pre-commit autoupdate (pydata#7330)
  Fix PR number in what’s new (pydata#7331)
  Enable `origin` and `offset` arguments in `resample` (pydata#7284)
  fix doctests: supress urllib3 warning (pydata#7326)
  fix flake8 config (pydata#7321)
  implement Zarr v3 spec support (pydata#6475)
  Fix polyval overloads (pydata#7315)
  deprecate pynio backend (pydata#7301)
  mypy - Remove some ignored packages and modules (pydata#7319)
  Switch to T_DataArray in .coords (pydata#7285)
  ...
This is a WIP PR that is intended for use only with a development branch of Zarr (specifically zarr-developers/zarr-python#1006). I am using it to test the Zarr v3 spec support that is currently being added to `zarr-python`.

The primary changes needed were:
- [...] `open_group` or `open_consolidated`. This PR currently just sets a default group name of `'xarray'` if one is not specified via the `group` kwarg to `ZarrStore.open_group`. I think that is convenient, but one could instead be stricter and raise an error in this case.
- [...] `store`, then it is not possible to infer which version of the zarr spec is desired. In this case, the user must specify `zarr_version` to choose the zarr protocol version. The default of `zarr_version=None` will infer the version from a zarr `BaseStore` subclass when possible, otherwise defaulting to `zarr_version=2` for backwards compatibility.

The good news is that these changes are quite small overall. Most changed lines in the tests involve optionally passing `zarr_version` around so that we could test v3 support both with an explicit `DirectoryStoreV3` store as well as with string-based paths.

Other points that need consideration in regards to the spec
- `dtype=str` is used in some tests. Currently zarr-python uses a numcodecs filter `VLenUTF8` in this case. The core zarr v3 spec no longer has a `'filter'` entry as part of the metadata. A zarr v3 protocol extension needs to be defined to specify how this should be implemented. We do support this filter even for zarr v3 arrays currently, but it is done in a hacky way that needs to be standardized. This is the cause of the `TODO` comment around the call to `attributes.pop('filters', None)`.

cc @joshmoore, @rabernat, @MSanKeys963
whats-new.rst
api.rst