Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr v3 #404

Open
wants to merge 38 commits into
base: master
Choose a base branch
from
Open

Zarr v3 #404

wants to merge 38 commits into from

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Nov 7, 2024

This updates ome-zarr-py to use zarr-python v3 but doesn't include support for writing Zarr v3 (OME-Zarr v0.5).
However, it does add support for reading OME-Zarr v0.5 (e.g. for use by napari-ome-zarr).

There were 4 tests that I have disabled as I couldn't understand exactly why they were failing.

 = 4 failed, 384 passed, 3 skipped, 6 xfailed, 1832 warnings in 86.38s (0:01:26) =

4 failing tests are:

  • TestWriter.test_writer[3D-scale-True-from_array-V01]
  • TestWriter.test_writer[3D-scale-True-from_array-V02]
  • TestWriter.test_writer[3D-scale-True-from_array-V03]
  • TestWriter.test_writer[3D-scale-True-from_array-V04]
          # check memory is contiguous, if so flatten
          if arr.flags.c_contiguous or arr.flags.f_contiguous:
              if flatten:
                  # can flatten without copy
                  arr = arr.reshape(-1, order="A")
          else:
  >           raise ValueError("an array with contiguous memory is required")
  E           ValueError: an array with contiguous memory is required
  
  .tox/py311/lib/python3.11/site-packages/numcodecs/compat.py:113: ValueError

These tests were passing before
* 109f71f6 (HEAD) Refactors v2 codec handling (#2425) (zarr-developers/zarr-python#2425)

In several places I've hard-coded zarr_version=2. This works because you can do zarr.open_group(store=self.__store, path="/", zarr_version=2) in parse_url() to create a group to write into, or find a group to read from without specifying whether you expect the group to already exist.

Without the zarr_version=2, zarr will create zarr.json if the mode of the store is w.

NB: I have another PR (in progress) to add support for writing OME-Zarr v0.5 at #413

Fixes #404

@will-moore
Copy link
Member Author

Zarr v3 is not supported on python 3.9 or 3.10. I'll remove them from the build...

Build on python 3.12 is failing tests with:

  = 106 failed, 282 passed, 3 skipped, 6 xfailed, 2270 warnings in 75.86s (0:01:15) =

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-zarr-py-development-status/104671/4

@joshmoore
Copy link
Member

@will-moore: what's the next step here?

@will-moore
Copy link
Member Author

I have 4 failing tests which I don't really understand:

FAILED tests/test_writer.py::TestWriter::test_writer[3D-scale-True-from_array-V01]
  FAILED tests/test_writer.py::TestWriter::test_writer[3D-scale-True-from_array-V02]
  FAILED tests/test_writer.py::TestWriter::test_writer[3D-scale-True-from_array-V03]
  FAILED tests/test_writer.py::TestWriter::test_writer[3D-scale-True-from_array-V04]
  = 4 failed, 384 passed, 3 skipped, 6 xfailed, 1832 warnings in 86.38s (0:01:26) =

All are from .tox/py311/lib/python3.11/site-packages/numcodecs/compat.py:151: in ensure_contiguous_ndarray ensure_contiguous_ndarray_like(buf, max_buffer_size=max_buffer_size, flatten=flatten)

raise ValueError("an array with contiguous memory is required")

But this PR was really just an investigation into what were the API changes of zarr-python v3 and what is needed for us to consume it.
With this PR, we can now read and write zarr v2 data, but this is hard-coded in a couple of places so we might want a different approach.
It may even "fix" the issue at #343 where we enforce store uses "/" dimension separator for v0.2-v0.4 data.

This PR has no support for OME-Zarr v0.5 and I'm not sure what that would look like yet. It may be that there's no value in this PR on it's own, without v0.5 support.
I could(should) continue to look at what would be needed for OME-Zarr v0.5 read/write?

Any and all reviews and feedback on this PR as it stands would be great to hear, especially an idea about the underlying cause of those failing tests.

We also have to wait for zarr-python v3 release!

@will-moore
Copy link
Member Author

will-moore commented Dec 9, 2024

With that previous commit (even without a FormatV05) reading works, including via napari-ome-zarr:

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr

Screenshot 2024-12-09 at 12 35 43

@will-moore
Copy link
Member Author

Plates also looking good:

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0010/76-45.ome.zarr

Screenshot 2024-12-09 at 12 46 02

cc @joshmoore

Copy link

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 86.36364% with 6 lines in your changes missing coverage. Please review.

Project coverage is 85.14%. Comparing base (c99f0b3) to head (e021c13).

Files with missing lines Patch % Lines
ome_zarr/io.py 79.31% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #404      +/-   ##
==========================================
- Coverage   85.45%   85.14%   -0.32%     
==========================================
  Files          13       13              
  Lines        1540     1548       +8     
==========================================
+ Hits         1316     1318       +2     
- Misses        224      230       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@will-moore will-moore requested a review from joshmoore December 17, 2024 14:19
@will-moore
Copy link
Member Author

@joshmoore can you have a look at this now? See updated description. Thx.

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few general comments and one or two more pointed ones (especially around dimension separator). In general, though, I think this is looking good. Thanks, @will-moore.

@@ -60,7 +60,7 @@ def matches(self, metadata: dict) -> bool: # pragma: no cover
raise NotImplementedError()

@abstractmethod
def init_store(self, path: str, mode: str = "r") -> FSStore:
def init_store(self, path: str, mode: str = "r") -> RemoteStore:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be | LocalStore as well?

self.__store: RemoteStore = (
path
if isinstance(path, RemoteStore)
else loader.init_store(self.__path, mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would also checking for LocalStore here work?

ome_zarr/io.py Outdated
@@ -104,7 +132,7 @@ def path(self) -> str:
return self.__path

@property
def store(self) -> FSStore:
def store(self) -> RemoteStore:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure why this isn't also | LocalStore

@@ -67,16 +70,41 @@ def __init_metadata(self) -> None:
"""
Load the Zarr metadata files for the given location.
"""
self.zarray: JSONDict = self.get_json(".zarray")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can get_json() now be deprecated?

ome_zarr/writer.py Outdated Show resolved Hide resolved
@@ -13,6 +13,7 @@
"id": "blosc",
"shuffle": 1
},
"dimension_separator": "/",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This surprises me. We really should be able to pass with and without this metadata. (Two tests?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does zarr know to use the / dimension_separator if it's missing? (how was this working before)?

and storage_options_list
and array_constructor == da.array
):
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pytest's skip functionality here so this is clearer in the output

Otherwise, I don't immediately have an idea what's going on.

@joshmoore
Copy link
Member

joshmoore commented Dec 18, 2024

Also just tried out the branch to see if it fixes #231 and 👍

(Previously pasted error was from an older version of dask)

@joshmoore
Copy link
Member

Updated with #346.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/zarr-python-3-is-here/107024/5

@will-moore
Copy link
Member Author

I have tests passing locally with changes similar to dask/dask#11660 and dask/dask#11668

@will-moore
Copy link
Member Author

The first test failure tests/test_cli.py def test_astronaut_download()
is currently failing with:

    store = zarr.storage.FsspecStore.from_url(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/zarrv3_py312/lib/python3.12/site-packages/zarr/storage/_fsspec.py", line 182, in from_url
    return cls(fs=fs, path=path, read_only=read_only, allowed_exceptions=allowed_exceptions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/zarrv3_py312/lib/python3.12/site-packages/zarr/storage/_fsspec.py", line 96, in __init__
    raise TypeError("Filesystem needs to support async operations.")

This can be replicated with

from skimage import data
import dask.array as da
da.to_zarr(da.from_array(data.coins(), chunks=(64, 64)), "test_dask_to_zarr.zarr", compute=True, storage_options={"chunks": (64, 64)})

Or, digging a bit deeper, this gives the same error:

import zarr
store = zarr.storage.FsspecStore.from_url("test_dask_to_zarr.zarr", read_only=True, storage_options={"chunks": (64, 64)})

And we can test fsspec directly with:

from fsspec.core import url_to_fs
fs, path = url_to_fs("test_dask_to_zarr.zarr")
# False! Why?
print("fs.async_impl", fs.async_impl)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants