Zarr Python 3 tracking issue #9515
Comments
Testing this out
There are multiple active branches right now, but you can get a usable xarray + zarr-python 3.x combination with these two branches:
You can install these with:
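The exact install commands were not captured above. As a hypothetical sketch (the branch and version specifiers are assumptions; `zarr-v3` is the xarray branch mentioned later in this thread, and zarr-python 3 was available as a pre-release), installing straight from GitHub/PyPI would look something like:

```shell
# Hypothetical install commands; the branch name and version specifier
# are assumptions, not captured from the original comment.
pip install "git+https://github.com/pydata/xarray@zarr-v3"
pip install --pre "zarr>=3.0.0b0"
```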
Work Items
We require some changes to both zarr-python and xarray. I'm pushing the zarr ones to zarr-python
xarray
Most of these are in my PR at #9552
Fixed issues
Things to investigate:
|
@TomAugspurger are you able to open a WIP PR with the in-progress work? It'd be nice to see what's needed |
Sure, #9552 has that. |
Question for the group: does anyone object to xarray continuing to write Zarr V2 datasets by default? I hesitate to have xarray's default differ from zarr-python's, but it would relieve some pressure to address #5475 quickly, since v2 datasets should be round-trippable. |
I think that support for reading Zarr V2 datasets with zarr-python v3 is close to being ready. I updated #9515 (comment) with some instructions on how to install the two branches if anyone is able to test that out:

In [4]: xr.open_dataset("abfs://daymet-zarr/annual/hi.zarr", engine="zarr", storage_options={"account_name": "daymeteuwest"})
Out[4]:
<xarray.Dataset> Size: 137MB
Dimensions: (y: 584, x: 284, time: 41, nv: 2)
Coordinates:
lat (y, x) float32 663kB ...
lon (y, x) float32 663kB ...
* time (time) datetime64[ns] 328B 1980-07-01T12:00:00 ....
* x (x) float32 1kB -5.802e+06 ... -5.519e+06
* y (y) float32 2kB -3.9e+04 -4e+04 ... -6.22e+05
...
start_year: 1980
In [5]: xr.open_dataset("s3://cmip6-pds/CMIP6/ScenarioMIP/AS-RCEC/TaiESM1/ssp126/r1i1p1f1/Amon/clt/gn/v20201124", engine="zarr", storage_options={"anon": True})
Out[5]:
<xarray.Dataset> Size: 228MB
Dimensions: (time: 1032, lat: 192, lon: 288, bnds: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
...
variant_label: r1i1p1f1 |
Thank you for all your work on Zarr V3 in Xarray! Super excited about the progress here 🎉 I've been using the https://github.com/pydata/xarray/tree/zarr-v3 branch to test out icechunk, and thought I'd share some odd behavior when writing data to S3, in case it's not a known issue. I ran this code multiple times - the first time only the attributes were written, and the second the

testv3_store = "s3://nasa-veda-scratch/test-weight-caching/scratch/test-v3.zarr"
s = np.ones(10, dtype='float64')
c = np.ones(10, dtype='int32')
r = np.ones(10, dtype='int32')
ds = xr.Dataset(
data_vars=dict(
S = (["n_s"], s),
col = (["n_s"], c),
row = (["n_s"], r)
),
attrs=dict(n_in = 100, n_out = 200)
)
print(f"Original data vars: {ds.data_vars}")
ds.to_zarr(testv3_store, zarr_format=3, mode="w")
ds2 = xr.open_zarr(testv3_store, zarr_format=3).load()
print(f"Round tripped data vars: {ds2.data_vars}")

Original data vars: Data variables:
S (n_s) float64 80B 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
col (n_s) int32 40B 1 1 1 1 1 1 1 1 1 1
row (n_s) int32 40B 1 1 1 1 1 1 1 1 1 1
Round tripped data vars: Data variables:
col (n_s) int32 40B 1 1 1 1 1 1 1 1 1 1
row (n_s) int32 40B 1 1 1 1 1 1 1 1 1 1

p.s., this issue didn't seem to happen when I was writing/reading from an icechunk store |
Thanks for testing it out and reporting that issue. I'll see if I can reproduce it. |
This sounds like an issue with the |
I'm able to reproduce with the moto test server used in the zarr-python tests. It seems like all the files were written:
I think the array metadata is correct:
Ah, the consolidated metadata is missing though?
Some print statements show that

So right now it looks like an issue with Group.members using a remote store. I'll keep looking. |
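For readers following along, here is a toy sketch of what consolidated metadata is (the `consolidate` helper below is hypothetical, not zarr-python's API): all per-node metadata documents get bundled into one key, so a reader can discover the whole hierarchy with a single request instead of one listing per group.

```python
import json

def consolidate(store):
    """Hypothetical helper: gather every metadata document in a dict-backed
    store into one consolidated document under ".zmetadata", keyed the way
    Zarr v2 consolidated metadata is."""
    meta_suffixes = (".zgroup", ".zarray", ".zattrs")
    consolidated = {
        key: json.loads(value)
        for key, value in store.items()
        if key.endswith(meta_suffixes)
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": consolidated}
    )

# Toy store mirroring the dataset above: one root group, arrays S/col/row.
store = {
    ".zgroup": '{"zarr_format": 2}',
    "S/.zarray": '{"shape": [10], "dtype": "<f8"}',
    "col/.zarray": '{"shape": [10], "dtype": "<i4"}',
    "row/.zarray": '{"shape": [10], "dtype": "<i4"}',
}
consolidate(store)
```

If the `.zmetadata` document is missing or stale, a reader that trusts it will silently miss arrays, which matches the dropped-variable symptom reported above.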
@maxrjones when you get a chance, can you try with

Assuming that is the issue, the problem is fsspec's dircache, which maintains a cache of directory listings. We must have previously listed the

Edit: It looks like s3fs does invalidate its dircache at https://github.com/fsspec/s3fs/blob/dd75a1a076d176049ff1f3d9f616f1a724f794df/s3fs/core.py#L1173. zarr's

Reported at fsspec/s3fs#903. So for now you'll need the |
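The dircache failure mode described above can be sketched in plain Python (the `CachedListingFS` class is hypothetical, not fsspec's real API): a filesystem wrapper that caches directory listings returns stale results after a write unless the writer invalidates the cached listing.

```python
class CachedListingFS:
    """Hypothetical stand-in for a filesystem with a directory-listing cache."""

    def __init__(self):
        self._files = {}     # path -> bytes: the "real" object store
        self._dircache = {}  # prefix -> cached listing

    def ls(self, prefix):
        # Serve listings from the cache once a prefix has been listed.
        if prefix not in self._dircache:
            self._dircache[prefix] = sorted(
                path for path in self._files if path.startswith(prefix + "/")
            )
        return self._dircache[prefix]

    def pipe_file(self, path, data, invalidate=True):
        self._files[path] = data
        if invalidate:
            # What a well-behaved writer does; skipping it reproduces the bug.
            self._dircache.pop(path.rsplit("/", 1)[0], None)


fs = CachedListingFS()
fs.pipe_file("root/a", b"1")
fs.ls("root")                # warms the cache with ["root/a"]
fs.pipe_file("root/b", b"2", invalidate=False)
assert fs.ls("root") == ["root/a"]   # stale: "root/b" is missing
fs.pipe_file("root/c", b"3")         # invalidates the cached listing
assert fs.ls("root") == ["root/a", "root/b", "root/c"]
```

This is why a freshly written array can be invisible to a subsequent `ls`-based hierarchy walk while the objects themselves are all present in the bucket.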
Thanks for digging into this @TomAugspurger - @rabernat and I hit something similar a week ago. See zarr-developers/zarr-python#2348 for a zarr-side attempt at working around the fsspec cache issues. |
Yes, adding those storage options to the |
What is your issue?
Zarr-Python 3.0 is getting close to a full release. This issue tracks the integration of the 3.0 release with Xarray.
Here's a running list of issues we're solving upstream related to integration with Xarray:
Special shout out to @TomAugspurger, who has been front-running a lot of this 🙌.