Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix create_dataset with data kwarg #2638

Merged
merged 3 commits into from
Jan 4, 2025

Conversation

jni
Copy link
Contributor

@jni jni commented Jan 4, 2025

Closes #2631

create_dataset is a legacy method for zarr v2 compatibility to help libraries
migrate to zarr v3. #2463 was a little overenthusiastic about the capabilities
of the new create_array method 😂 and missed one of the kwargs of
create_dataset. This PR adds special handling of the data kwarg before
passing the remaining arguments to create_array.

I also added special handling of the dtype kwarg because it is optional in 2.x
if data is passed. (For some reason, shape is not optional, and I thought
about fixing it but it seems pointless because the function is going to be
removed anyway, so we don't want to add new conveniences to it.)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@jhamman jhamman enabled auto-merge (squash) January 4, 2025 19:50
@jhamman jhamman merged commit 3194534 into zarr-developers:main Jan 4, 2025
28 checks passed
@will-moore
Copy link
Contributor

Just to clarify... Is it intended that this change was only added to the deprecated group.create_dataset() and not to group.create_array()?

So we shouldn't expect this to be around too long and should migrate to doing this:

zarr_array = group.create_array(str(path), dtype=data.dtype, shape=data.shape, **options)
zarr_array[...] = data

@normanrz
Copy link
Member

So we shouldn't expect this to be around too long and should migrate to doing this:

zarr_array = group.create_array(str(path), dtype=data.dtype, shape=data.shape, **options)
zarr_array[...] = data

Yes, I would recommend that.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

we should definitely have a group-level method that lets people create + fill an array from an existing array.

@normanrz
Copy link
Member

Maybe group.from_array(data, name, ..., write=True)? See #2622

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

that would be a perfect fit, but X.from_y() normally makes an instance of X. I wonder if there's a name re-arrangement that could make this not confusing

@d-v-b
Copy link
Contributor

d-v-b commented Jan 13, 2025

as_array I think is better than from_array in this context, thoughts? this would also affect #2622

@jni jni deleted the fix-create-dataset branch January 13, 2025 23:03
@jni
Copy link
Contributor Author

jni commented Jan 13, 2025

Is it intended that this change was only added to the deprecated group.create_dataset() and not to group.create_array()?

Not from my end. This was just what I needed to get the napari tests passing.

as_array I think is better than from_array in this context, thoughts?

I would find this confusing. create_array_from? create_from_array?

@jni
Copy link
Contributor Author

jni commented Jan 14, 2025

So we shouldn't expect this to be around too long and should migrate to doing this:

The issue is that create_array doesn't exist in 2.x, so it's impossible to support both 2.x and 3.x with the create_array API. This is why I advocated for a much longer deprecation cycle for create_dataset. So far there hasn't been much discussion around the length of the deprecation though.

@normanrz
Copy link
Member

What length of deprecation cycle would you recommend?

@jni
Copy link
Contributor Author

jni commented Jan 14, 2025

minimum 6mo but preferably 24mo. (The latter being in line with SPEC0.)

@normanrz
Copy link
Member

Thanks. I could get behind that but I think we need to have a discussion. I could also see different deprecation time scales for different APIs within the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Group.create_dataset does not match 2.x API
5 participants