nbytes_stored incorrect when dimension_separator="/" #2174

Open
dstansby opened this issue Sep 11, 2024 · 3 comments
Labels
bug (Potential issues with the zarr-python library), V2 (Affects the v2 branch)

Comments

@dstansby
Contributor

Zarr version

2.18.2

Numcodecs version

0.13.0

Python Version

3.10.4

Operating System

macOS

Installation

conda

Description

When saving an array to disk and loading it again with dimension_separator="/", nbytes_stored is reported incorrectly: it only counts the size of the .zarray metadata file and none of the chunk data.
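A minimal sketch of the suspected mechanism, assuming (as in zarr-python v2's DirectoryStore.getsize) that the stored size is computed by summing only the regular files directly inside the array's directory. With dimension_separator="/" the chunks live in nested subdirectories (e.g. 0/0/0), so a non-recursive sum sees only .zarray. The helpers top_level_size, recursive_size and write, and the fake layouts, are hypothetical and for illustration only.

```python
import os
import tempfile


def top_level_size(path):
    """Sum sizes of regular files directly inside `path` (no recursion)."""
    return sum(e.stat().st_size for e in os.scandir(path) if e.is_file())


def recursive_size(path):
    """Sum sizes of all regular files under `path`, including subdirectories."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def write(path, nbytes):
    # Hypothetical helper: create parent dirs and write `nbytes` zero bytes.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(b"\0" * nbytes)


with tempfile.TemporaryDirectory() as root:
    # Fake layouts: a 100-byte metadata file plus two 1000-byte chunks.
    flat = os.path.join(root, "flat.zarr")
    nested = os.path.join(root, "nested.zarr")
    for store, keys in [(flat, ["0.0.0", "1.0.0"]), (nested, ["0/0/0", "1/0/0"])]:
        write(os.path.join(store, ".zarray"), 100)
        for key in keys:
            # '/' in a key becomes a nested directory on disk.
            write(os.path.join(store, *key.split("/")), 1000)
    flat_sizes = (top_level_size(flat), recursive_size(flat))
    nested_sizes = (top_level_size(nested), recursive_size(nested))

print(flat_sizes)    # (2100, 2100)
print(nested_sizes)  # (100, 2100)
```

With the flat layout both approaches agree; with the nested layout only a recursive walk picks up the chunk files.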

Steps to reproduce

import numpy as np
import zarr

zarr_path = "test.zarr"
data = np.random.randint(0, 2**8, size=(64, 64, 64), dtype=np.uint8)

for dimension_separator in [".", "/"]:
    zarr.save_array(zarr_path, data, dimension_separator=dimension_separator)
    zarr_arr = zarr.open(zarr_path)
    print(f"{dimension_separator=}")
    print("nbytes_stored:", zarr_arr.nbytes_stored)
    print()

Output

dimension_separator='.'
nbytes_stored: 262567

dimension_separator='/'
nbytes_stored: 391
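For scale, the uncompressed size of the test array is

```latex
64 \times 64 \times 64 \times 1\ \text{byte (uint8)} = 262\,144\ \text{bytes}
```

so the 262,567 bytes reported with dimension_separator='.' is consistent with the (essentially incompressible) random data plus metadata, whereas 391 bytes is far too small to include the chunk data.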

Additional output

No response

@dstansby added the bug (Potential issues with the zarr-python library) label on Sep 11, 2024
@dstansby changed the title on Sep 11, 2024
@kabilar

kabilar commented Sep 11, 2024

+1 Thank you, @dstansby.

@dstansby
Contributor Author

The root cause of this is #253, but I'll leave this open as it gives a nice self-contained example of the issue.

Note that for OME-zarr the default separator is /, so currently zarr-python v2 will report the wrong size for all OME-zarr arrays 😱
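To see why the separator matters, here is a minimal sketch of how chunk keys are formed, assuming the v2 convention of joining a chunk's grid indices with dimension_separator. With ".", every chunk is a single file next to .zarray; with "/", the keys become nested paths, so the array directory itself only holds one subdirectory per index along the first axis. The chunk_keys function is hypothetical, not part of the zarr API.

```python
import itertools
import math


def chunk_keys(shape, chunks, dimension_separator):
    """Return the storage key of every chunk in a regular chunk grid."""
    # Number of chunks along each axis (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return [
        dimension_separator.join(str(i) for i in idx)
        for idx in itertools.product(*(range(n) for n in grid))
    ]


dot_keys = chunk_keys((1000, 1000), (100, 100), ".")
slash_keys = chunk_keys((1000, 1000), (100, 100), "/")

print(len(dot_keys), dot_keys[:3])      # 100 ['0.0', '0.1', '0.2']
print(len(slash_keys), slash_keys[:3])  # 100 ['0/0', '0/1', '0/2']

# Entries that would appear directly inside the array directory with '/':
print(len({k.split("/")[0] for k in slash_keys}))  # 10
```

Both grids have 100 chunks, but with "/" only 10 names sit at the top level of the array directory.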

@kabilar

kabilar commented Sep 11, 2024

I believe nchunks_initialized (shown as "Chunks initialized" in the array info) is also incorrect. See the example below.

import numpy as np
import zarr

zarr_path = "test.zarr"
data = np.random.randint(0, 2**8, size=(1000, 1000), dtype=np.uint8)

for dimension_separator in [".", "/"]:
    zarr.save_array(zarr_path, data, chunks=(100,100), dimension_separator=dimension_separator)
    zarr_arr = zarr.open(zarr_path)
    print(f"{dimension_separator=}")
    print("nbytes_stored:", zarr_arr.nbytes_stored)
    print("nchunks_initialized:", zarr_arr.nchunks_initialized)
    print()

Output

dimension_separator='.'
nbytes_stored: 1001973
nchunks_initialized: 100

dimension_separator='/'
nbytes_stored: 373
nchunks_initialized: 10
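The 10 reported here appears to match the number of top-level directories (one per row of chunks) rather than the 100 chunk files. A minimal sketch of a separator-agnostic count, assuming a local directory layout in which each chunk is a file whose path relative to the array directory consists of exactly ndim integer components; count_chunk_files is a hypothetical helper, not zarr's implementation.

```python
import os
import re
import tempfile

_INDEX = re.compile(r"\d+")


def count_chunk_files(array_dir, ndim, dimension_separator):
    """Count chunk files under array_dir, walking nested directories."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(array_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), array_dir)
            # Normalise OS path separators to '/' before splitting the key.
            key = rel.replace(os.sep, "/")
            parts = key.split(dimension_separator)
            if len(parts) == ndim and all(_INDEX.fullmatch(p) for p in parts):
                count += 1
    return count


with tempfile.TemporaryDirectory() as arr:
    # Fake 10x10 chunk grid stored with '/', plus a metadata file.
    open(os.path.join(arr, ".zarray"), "w").close()
    for i in range(10):
        os.makedirs(os.path.join(arr, str(i)), exist_ok=True)
        for j in range(10):
            open(os.path.join(arr, str(i), str(j)), "wb").close()
    # Counting only top-level entries sees the 10 row directories.
    top_level = sum(1 for e in os.listdir(arr) if _INDEX.fullmatch(e))
    nested_total = count_chunk_files(arr, ndim=2, dimension_separator="/")

print(top_level, nested_total)  # 10 100
```

Non-chunk files such as .zarray are skipped because they do not split into ndim integer components.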
