top-level functions for reading, creating data #2463

Merged on Jan 2, 2025 (111 commits)
Changes from 1 commit
Commits
fd6ecd1
add functions for easy read-only data access
d-v-b Nov 4, 2024
fa343f5
sync funcs
d-v-b Nov 4, 2024
d95eba8
make read-only funcs top-level exports
d-v-b Nov 4, 2024
f6765bc
Merge branch 'main' into feat/read-funcs
d-v-b Nov 5, 2024
5d8445b
add create_array, create_group, and tests
d-v-b Nov 5, 2024
9526571
add top-level imports
d-v-b Nov 5, 2024
90bf421
Merge branch 'feat/read-funcs' of github.com:d-v-b/zarr-python into f…
d-v-b Nov 5, 2024
de280a7
add test for top-level exports
d-v-b Nov 5, 2024
d9878cf
add test for read
d-v-b Nov 5, 2024
e5217ce
add asserts
d-v-b Nov 5, 2024
40cc7af
Apply suggestions from code review
d-v-b Nov 5, 2024
d7ce58b
Merge branch 'main' into feat/read-funcs
d-v-b Nov 5, 2024
a0dfe18
Merge branch 'main' into feat/read-funcs
d-v-b Nov 5, 2024
98bc328
Merge branch 'main' into feat/read-funcs
d-v-b Nov 12, 2024
16f5cc2
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Nov 29, 2024
4b45ebf
handle sharding in create_array
d-v-b Dec 10, 2024
750a439
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Dec 10, 2024
7a5cbe7
tweak
d-v-b Dec 10, 2024
215ff96
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Dec 18, 2024
489e2a2
make logic of _auto_partition better for shard shape
d-v-b Dec 18, 2024
05dd0d8
add dtype parsing, and tweak auto_partitioning func
d-v-b Dec 18, 2024
3fbfc21
sketch of docstring; remove auto chunks / shard shape
d-v-b Dec 19, 2024
b348737
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Dec 19, 2024
5025ad6
tweak docstring
d-v-b Dec 19, 2024
e204a32
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Dec 20, 2024
68465db
docstrings
d-v-b Dec 20, 2024
d7bb121
ensure tests pass
d-v-b Dec 20, 2024
99cc8f5
tuple -> list
d-v-b Dec 20, 2024
a39457f
allow data in create_array
d-v-b Dec 20, 2024
3f0a3e0
docstring
d-v-b Dec 20, 2024
26ced00
remove auto_partition
d-v-b Dec 20, 2024
af55ac4
make shape shapelike
d-v-b Dec 20, 2024
07f07ea
use create_array everywhere in group class
d-v-b Dec 20, 2024
bc552ce
remove readers
d-v-b Dec 20, 2024
74f731a
fix dodgy imports
d-v-b Dec 20, 2024
43877c0
compressors -> compression, auto chunking, auto sharding, auto compre…
d-v-b Dec 21, 2024
c693fb4
use sane shard shape when there are too few chunks
d-v-b Dec 21, 2024
4c18aaa
Merge branch 'main' into feat/read-funcs
jhamman Dec 22, 2024
dba2594
fix: allow user-specified filters and compression
d-v-b Dec 22, 2024
669ad72
np.dtype[np.generic] -> np.dtype[Any]
d-v-b Dec 22, 2024
ae1832d
handle singleton compressor / filters input
d-v-b Dec 22, 2024
5cb6dd8
default codec config now uses the full config dict
normanrz Dec 22, 2024
5dcd80b
test for auto sharding
d-v-b Dec 23, 2024
810ff9b
Merge branch 'feat/default-codecs' into feat/read-funcs
normanrz Dec 25, 2024
eab46a2
test
normanrz Dec 25, 2024
bcdc4cc
adds a shards property
normanrz Dec 25, 2024
4e978f9
add (typed) functions for resolving codecs
d-v-b Dec 26, 2024
a9850bf
better codec parsing
d-v-b Dec 26, 2024
2747d69
add warning if auto sharding is used
d-v-b Dec 26, 2024
023c16b
remove read_array
d-v-b Dec 26, 2024
de2c36e
rename compression to compressors, and make the docstring for create_…
d-v-b Dec 26, 2024
74d31ef
compression -> compressors, shard_shape -> shards, chunk_shape -> chunks
d-v-b Dec 26, 2024
470b60f
use typerror instead of valuerror; docstring
d-v-b Dec 27, 2024
e8b1ad1
default order is None
d-v-b Dec 27, 2024
b919483
Merge branch 'feat/chunks-shards' into feat/read-funcs
normanrz Dec 27, 2024
6fcd976
fix circular dep
normanrz Dec 27, 2024
d9c30a3
format
normanrz Dec 27, 2024
0bf4dd0
fix some tests
normanrz Dec 27, 2024
ea3ed0e
use filters=auto and compressors=auto in Group.create_array
normanrz Dec 27, 2024
54fd920
compression -> compressors
normanrz Dec 27, 2024
a4ba7db
Update src/zarr/core/group.py
d-v-b Dec 28, 2024
fb286a7
fix mypy
normanrz Dec 28, 2024
df35d13
narrow type of filters param and compression param
d-v-b Dec 28, 2024
80b5a10
Merge branch 'feat/read-funcs' of github.com:d-v-b/zarr-python into f…
d-v-b Dec 28, 2024
77f40a5
remove data kwarg to create_array
d-v-b Dec 28, 2024
235e246
mypy fixes
normanrz Dec 28, 2024
95348d6
ensure that we accept dict form of compressor in _parse_chunk_encodin…
d-v-b Dec 28, 2024
91a7916
Merge branch 'feat/read-funcs' of github.com:d-v-b/zarr-python into f…
d-v-b Dec 28, 2024
665037e
fix properties test
normanrz Dec 28, 2024
ae76bb3
Merge branch 'feat/read-funcs' of github.com:d-v-b/zarr-python into f…
normanrz Dec 28, 2024
0a983e6
add tests for compressors and filters kwargs to create_array
d-v-b Dec 28, 2024
2182793
add tests for codec inference
d-v-b Dec 28, 2024
c04d7cf
add test for illegal shards kwarg for v2 arrays
d-v-b Dec 28, 2024
144b2b7
remove redundant test function
d-v-b Dec 28, 2024
d407e5d
tests and types
normanrz Dec 29, 2024
1301c5f
rm print
normanrz Dec 29, 2024
31b3ad4
types
normanrz Dec 29, 2024
a0c1c95
merge
normanrz Dec 29, 2024
43b6774
resolve cyclic import
normanrz Dec 29, 2024
e55023a
add create_array to async and sync API
normanrz Dec 30, 2024
e24bdeb
docs for create_array
normanrz Dec 30, 2024
b564ae6
rename (Async)Array.create to _create
normanrz Dec 31, 2024
75b2197
adds array_bytes_codec kwarg
normanrz Dec 31, 2024
2f6f8a0
tests
normanrz Dec 31, 2024
c4330ef
tests for no filters+compressors
normanrz Dec 31, 2024
f926a5a
Merge branch 'main' into feat/read-funcs
d-v-b Jan 1, 2025
95ffadd
widen type of FiltersParam to include single numcodecs codec instances
d-v-b Jan 1, 2025
bbe3a94
don't alias None to default codecs in _create_v2
d-v-b Jan 1, 2025
856b40f
allow single codec instances for filters, and None for filters / comp…
d-v-b Jan 1, 2025
2aa3acc
add docstring for None
normanrz Jan 2, 2025
9fb8a33
single-item tuple for compressors in v2
normanrz Jan 2, 2025
99faa8e
Update src/zarr/core/array.py
normanrz Jan 2, 2025
108daa0
Merge branch 'main' into feat/read-funcs
normanrz Jan 2, 2025
305fdb7
merge
normanrz Jan 2, 2025
947f20e
tweaks
normanrz Jan 2, 2025
2d2af8f
Merge branch 'main' into feat/no-array-create
normanrz Jan 2, 2025
14c45cd
pr feedback 1
normanrz Jan 2, 2025
ff5e5cb
Merge branch 'feat/no-array-create' of github.com:zarr-developers/zar…
normanrz Jan 2, 2025
2afe940
tests
normanrz Jan 2, 2025
e3f1f33
mypy
normanrz Jan 2, 2025
1643983
rename array_bytes_codec to serializer
normanrz Jan 2, 2025
aad8e9d
Update src/zarr/api/asynchronous.py
d-v-b Jan 2, 2025
f29b2d9
docstrings
d-v-b Jan 2, 2025
4654cbd
*params -> *like
d-v-b Jan 2, 2025
5cdb515
*params -> *like, in tests
d-v-b Jan 2, 2025
0dc7dc6
merge
normanrz Jan 2, 2025
c5c761e
Merge remote-tracking branch 'origin/main' into feat/read-funcs
normanrz Jan 2, 2025
be60d73
adds deprecated compressor arg to Group.create_array
normanrz Jan 2, 2025
ae1aa2a
Merge remote-tracking branch 'origin/main' into feat/read-funcs
normanrz Jan 2, 2025
0a8b91c
docs
normanrz Jan 2, 2025
315ba88
Merge branch 'main' into feat/read-funcs
jhamman Jan 2, 2025
add create_array, create_group, and tests
d-v-b committed Nov 5, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
commit 5d8445bd824eaf3fcc291e8a12397d741aa582cd
171 changes: 171 additions & 0 deletions src/zarr/api/asynchronous.py
@@ -644,6 +644,54 @@ async def group(
     )


+async def create_group(
+    *,
+    store: StoreLike,
+    path: str | None = None,
+    overwrite: bool = False,
+    zarr_format: ZarrFormat | None = None,
+    attributes: dict[str, Any] | None = None,
+    storage_options: dict[str, Any] | None = None,
+) -> AsyncGroup:
+    """Create a group.
+
+    Parameters
+    ----------
+    store : Store or str
+        Store or path to directory in file system.
+    path : str, optional
+        Group path within store.
+    overwrite : bool, optional
+        If True, pre-existing data at ``path`` will be deleted before
+        creating the group.
+    zarr_format : {2, 3, None}, optional
+        The zarr format to use when saving.
+    storage_options : dict
+        If using an fsspec URL to create the store, these will be passed to
+        the backend implementation. Ignored otherwise.
+
+    Returns
+    -------
+    g : group
+        The new group.
+    """
+
+    if zarr_format is None:
+        zarr_format = _default_zarr_version()
+
+    # TODO: fix this when modes make sense. It should be `w` for overwriting, `w-` otherwise
d-v-b marked this conversation as resolved.
+    mode: Literal["a"] = "a"

Member:
What isn't working here?

Contributor Author:
I removed the TODO as it's from an era before some store mode refactoring.

Member:
@d-v-b Did you push your change?

+
+    store_path = await make_store_path(store, path=path, mode=mode, storage_options=storage_options)
+
+    return await AsyncGroup.from_store(
+        store=store_path,
+        zarr_format=zarr_format,
+        exists_ok=overwrite,
+        attributes=attributes,
+    )
+
+
 async def open_group(
     store: StoreLike | None = None,
     *,  # Note: this is a change from v2
@@ -752,6 +800,7 @@ async def open_group(

 async def read_group(
     store: StoreLike,
+    *,
     path: str | None = None,
     zarr_format: ZarrFormat | None = None,
     storage_options: dict[str, Any] | None = None,
@@ -810,6 +859,127 @@ async def read_group(
     )


+async def create_array(
normanrz marked this conversation as resolved.
+    store: str | StoreLike,
+    *,
+    shape: ChunkCoords,
+    chunks: ChunkCoords | None = None,  # TODO: v2 allowed chunks=True
+    dtype: npt.DTypeLike | None = None,
+    compressor: dict[str, JSON] | None = None,  # TODO: default and type change
+    fill_value: Any | None = 0,  # TODO: need type
+    order: MemoryOrder | None = None,
+    overwrite: bool = False,
+    path: PathLike | None = None,
+    filters: list[dict[str, JSON]] | None = None,  # TODO: type has changed
+    dimension_separator: Literal[".", "/"] | None = None,
+    zarr_format: ZarrFormat | None = None,
+    attributes: dict[str, JSON] | None = None,
+    # v3 only
+    chunk_shape: ChunkCoords | None = None,
+    chunk_key_encoding: (
+        ChunkKeyEncoding
+        | tuple[Literal["default"], Literal[".", "/"]]
+        | tuple[Literal["v2"], Literal[".", "/"]]
+        | None
+    ) = None,
+    codecs: Iterable[Codec | dict[str, JSON]] | None = None,
+    dimension_names: Iterable[str] | None = None,
+    storage_options: dict[str, Any] | None = None,
+    **kwargs: Any,
+) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
+    """Create an array.
+
+    Parameters
+    ----------
+    shape : int or tuple of ints
+        Array shape.
+    chunks : int or tuple of ints, optional
+        Chunk shape. If True, will be guessed from `shape` and `dtype`. If
+        False, will be set to `shape`, i.e., single chunk for the whole array.
+        If an int, the chunk size in each dimension will be given by the value
+        of `chunks`. Default is True.
+    dtype : str or dtype, optional
+        NumPy dtype.
+    compressor : Codec, optional
+        Primary compressor.
+    fill_value : object
+        Default value to use for uninitialized portions of the array.
+    order : {'C', 'F'}, optional
+        Memory layout to be used within each chunk.
+        Default is set in Zarr's config (`array.order`).
+    store : Store or str
+        Store or path to directory in file system or name of zip file.
+    overwrite : bool, optional
+        If True, delete all pre-existing data in `store` at `path` before
+        creating the array.
+    path : str, optional
+        Path under which array is stored.
+    filters : sequence of Codecs, optional
+        Sequence of filters to use to encode chunk data prior to compression.
+    dimension_separator : {'.', '/'}, optional
+        Separator placed between the dimensions of a chunk.
+    zarr_format : {2, 3, None}, optional
+        The zarr format to use when saving.
+    storage_options : dict
+        If using an fsspec URL to create the store, these will be passed to
+        the backend implementation. Ignored otherwise.
+
+    Returns
+    -------
+    z : array
+        The array.
+    """
+
+    if zarr_format is None:
+        zarr_format = _default_zarr_version()
+
+    if zarr_format == 2 and chunks is None:
+        chunks = shape
+    elif zarr_format == 3 and chunk_shape is None:
+        if chunks is not None:
+            chunk_shape = chunks
+            chunks = None
+        else:
+            chunk_shape = shape
+
+    if dimension_separator is not None:
+        if zarr_format == 3:
+            raise ValueError(
+                "dimension_separator is not supported for zarr format 3, use chunk_key_encoding instead"
+            )
+        else:
+            warnings.warn(
+                "dimension_separator is not yet implemented",
+                RuntimeWarning,
+                stacklevel=2,
+            )
+
+    # TODO: fix this when modes make sense. It should be `w` for overwriting, `w-` otherwise
+    mode: Literal["a"] = "a"
+
+    store_path = await make_store_path(store, path=path, mode=mode, storage_options=storage_options)
+
+    return await AsyncArray.create(
+        store_path,
+        shape=shape,
+        chunks=chunks,
+        dtype=dtype,
+        compressor=compressor,
+        fill_value=fill_value,
+        exists_ok=overwrite,
+        filters=filters,
+        dimension_separator=dimension_separator,
+        zarr_format=zarr_format,
+        chunk_shape=chunk_shape,
+        chunk_key_encoding=chunk_key_encoding,
+        codecs=codecs,
+        dimension_names=dimension_names,
+        attributes=attributes,
+        order=order,
+        **kwargs,
+    )
+
+
 async def create(
     shape: ChunkCoords,
     *,  # Note: this is a change from v2
@@ -996,6 +1166,7 @@ async def create(

 async def read_array(
normanrz marked this conversation as resolved.
     store: StoreLike,
+    *,
     path: str | None = None,
     zarr_format: ZarrFormat | None = None,
     storage_options: dict[str, Any] | None = None,
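The chunk and separator handling in `create_array` above branches on the Zarr format. For v2, a missing `chunks` falls back to `shape`. For v3, the value goes into `chunk_shape`: it is taken from `chunks` if one was given, otherwise from `shape`. `dimension_separator` is rejected for v3 and only triggers a warning for v2.

Below is a minimal standalone sketch of that branching. It assumes plain tuples for shapes and uses a hypothetical helper, `resolve_chunk_args`, which is not part of zarr. The helper only returns the `(chunks, chunk_shape)` pair that would be forwarded to `AsyncArray.create`.

```python
from __future__ import annotations

import warnings
from typing import Literal

Shape = tuple[int, ...]


def resolve_chunk_args(
    shape: Shape,
    zarr_format: Literal[2, 3],
    chunks: Shape | None = None,
    chunk_shape: Shape | None = None,
    dimension_separator: Literal[".", "/"] | None = None,
) -> tuple[Shape | None, Shape | None]:
    """Hypothetical mirror of the chunk handling in create_array.

    Returns (chunks, chunk_shape) as they would be passed on.
    """
    if zarr_format == 2 and chunks is None:
        # v2: default to a single chunk covering the whole array
        chunks = shape
    elif zarr_format == 3 and chunk_shape is None:
        if chunks is not None:
            # v3: move the v2-style `chunks` value into `chunk_shape`
            chunk_shape = chunks
            chunks = None
        else:
            chunk_shape = shape

    if dimension_separator is not None:
        if zarr_format == 3:
            raise ValueError(
                "dimension_separator is not supported for zarr format 3, "
                "use chunk_key_encoding instead"
            )
        # v2: accepted but not yet honoured, so only warn
        warnings.warn(
            "dimension_separator is not yet implemented",
            RuntimeWarning,
            stacklevel=2,
        )
    return chunks, chunk_shape
```

For example, a v3 call that passes only `chunks=(10, 10)` ends up forwarding `chunks=None` and `chunk_shape=(10, 10)`. A v2 call with no chunks falls back to a single chunk the size of `shape`.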
32 changes: 30 additions & 2 deletions src/zarr/api/synchronous.py
@@ -242,10 +242,34 @@ def open_group(
     )


+def create_group(
d-v-b marked this conversation as resolved.
+    store: StoreLike,
+    *,
+    path: str | None = None,
+    zarr_format: ZarrFormat | None = None,
+    overwrite: bool = False,
+    attributes: dict[str, Any] | None = None,
+    storage_options: dict[str, Any] | None = None,
+) -> Group:
normanrz marked this conversation as resolved.
+    return Group(
+        sync(
+            async_api.create_group(
+                store=store,
+                path=path,
+                overwrite=overwrite,
+                storage_options=storage_options,
+                zarr_format=zarr_format,
+                attributes=attributes,
+            )
+        )
+    )
+
+
 def read_group(
-    store: StoreLike | None = None,
+    store: StoreLike,
+    *,
     path: str | None = None,
-    storage_options: dict[str, Any] | None = None,  # not used in async api
+    storage_options: dict[str, Any] | None = None,
     zarr_format: ZarrFormat | None = None,
     use_consolidated: bool | str | None = None,
 ) -> Group:
@@ -264,6 +288,10 @@ def create(*args: Any, **kwargs: Any) -> Array:
     return Array(sync(async_api.create(*args, **kwargs)))


+def create_array(*args: Any, **kwargs: Any) -> Array:
+    return Array(sync(async_api.create_array(*args, **kwargs)))
+
+
 def read_array(*args: Any, **kwargs: Any) -> Array:
     return Array(sync(async_api.read_array(*args, **kwargs)))

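Each new synchronous function follows the same three steps: delegate to its async counterpart, block on the result with `sync`, and wrap the returned object (`Group(sync(...))`, `Array(sync(...))`).

The sketch below shows that delegation pattern in isolation. It uses a toy `sync` built on `asyncio.run`. `AsyncArrayStub`, `ArrayStub`, and `create_array_async` are hypothetical stand-ins, not zarr classes. zarr's own `sync` helper is more involved than this toy.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def sync(coro: Coroutine[Any, Any, T]) -> T:
    """Toy stand-in for a sync helper: run a coroutine to completion."""
    return asyncio.run(coro)


class AsyncArrayStub:
    """Hypothetical async object returned by the async API."""

    def __init__(self, shape: tuple[int, ...]) -> None:
        self.shape = shape


class ArrayStub:
    """Hypothetical synchronous wrapper that holds the async object."""

    def __init__(self, async_array: AsyncArrayStub) -> None:
        self._async_array = async_array

    @property
    def shape(self) -> tuple[int, ...]:
        return self._async_array.shape


async def create_array_async(*, shape: tuple[int, ...]) -> AsyncArrayStub:
    await asyncio.sleep(0)  # stands in for store I/O
    return AsyncArrayStub(shape)


def create_array(*args: Any, **kwargs: Any) -> ArrayStub:
    # Forward everything, block until the coroutine finishes, then wrap.
    return ArrayStub(sync(create_array_async(*args, **kwargs)))
```

Forwarding `*args, **kwargs`, as the sync `create_array` does, keeps it in lockstep with the async signature. The cost is that type checkers and IDEs cannot see the parameter list. The sync `create_group` avoids this by spelling its keyword arguments out.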
7 changes: 4 additions & 3 deletions src/zarr/core/array.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 import json
+import warnings
 from asyncio import gather
 from dataclasses import dataclass, field
 from itertools import starmap
@@ -144,9 +145,9 @@ async def get_array_metadata(
             (store_path / ZATTRS_JSON).get(),
         )
         if zarr_json_bytes is not None and zarray_bytes is not None:
-            # TODO: revisit this exception type
-            # alternatively, we could warn and favor v3
-            raise ValueError("Both zarr.json and .zarray objects exist")
+            # wwarn and favor v3
d-v-b marked this conversation as resolved.
+            msg = f"Both zarr.json (zarr v3) and .zarray (zarr v2) metadata objects exist at {store_path}."
normanrz marked this conversation as resolved.
+            warnings.warn(msg, stacklevel=1)
         if zarr_json_bytes is None and zarray_bytes is None:
             raise FileNotFoundError(store_path)
         # set zarr_format based on which keys were found
6 changes: 3 additions & 3 deletions src/zarr/core/group.py
@@ -486,9 +486,9 @@ async def open(
                 (store_path / str(consolidated_key)).get(),
             )
             if zarr_json_bytes is not None and zgroup_bytes is not None:
-                # TODO: revisit this exception type
-                # alternatively, we could warn and favor v3
-                raise ValueError("Both zarr.json and .zgroup objects exist")
+                # we could warn and favor v3
d-v-b marked this conversation as resolved.
+                msg = f"Both zarr.json (zarr v3) and .zgroup (zarr v2) metadata objects exist at {store_path}."
+                warnings.warn(msg, stacklevel=1)
             if zarr_json_bytes is None and zgroup_bytes is None:
                 raise FileNotFoundError(
                     f"could not find zarr.json or .zgroup objects in {store_path}"
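This commit changes `get_array_metadata` in `src/zarr/core/array.py` and `open` in `src/zarr/core/group.py` in the same way. Finding both v3 and v2 metadata documents at one location now produces a warning instead of a `ValueError`. Finding neither still raises `FileNotFoundError`. The comment in the array diff says the intent is to favor v3 in the mixed case, and the sketch below assumes that precedence.

`detect_zarr_format` is a hypothetical function that works on a set of key names standing in for a store. The key names `zarr.json`, `.zarray`, and `.zgroup` are the Zarr metadata document names used in the diff.

```python
import warnings
from typing import Literal


def detect_zarr_format(
    keys: set[str], prefix: str, v2_key: Literal[".zarray", ".zgroup"]
) -> Literal[2, 3]:
    """Hypothetical sketch: pick a Zarr format from metadata keys under `prefix`."""
    has_v3 = f"{prefix}/zarr.json" in keys
    has_v2 = f"{prefix}/{v2_key}" in keys
    if has_v3 and has_v2:
        # As in the diff: warn instead of raising ValueError (then favor v3 below).
        warnings.warn(
            f"Both zarr.json (zarr v3) and {v2_key} (zarr v2) metadata objects exist at {prefix}.",
            stacklevel=2,
        )
    if not has_v3 and not has_v2:
        raise FileNotFoundError(prefix)
    return 3 if has_v3 else 2
```

Passing the v2 key as a parameter reflects the fact that the array and group code paths differ only in which v2 document (`.zarray` or `.zgroup`) they look for.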