Add decompressed OME-Zarr dataset size to iohub info #248

Merged: 7 commits, Nov 6, 2024

@edyoshikun (Contributor) commented Sep 26, 2024

This addresses issue #247 by adding the store size and array size in GB. This is useful and simple metadata.

I wanted to know how much memory to request for caching datasets.

@ziw-liu (Collaborator) commented Sep 26, 2024

Is this meant to represent the size on disk (compressed) or size in RAM (decompressed)?

@ziw-liu added the enhancement (New feature or request) and NGFF (OME-NGFF, OME-Zarr format) labels on Sep 26, 2024
@edyoshikun (Contributor, Author) commented

I find the decompressed size more useful than the compressed one. We can report both if needed. I think zarr.Array has nbytes_stored. What do you guys think?

@talonchandler (Contributor) left a comment

I think the uncompressed size is the most valuable.

The reported size is the expected size, not the true size (e.g. the array hasn't been filled yet, or there was an error while writing). Naming is tricky; maybe "Expected uncompressed size (GB)", "Est. size in RAM (GB)", or "Est. size (GB)"?
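
To illustrate the "expected size" point: the uncompressed size follows from the array shape and dtype alone, so it can be reported without reading any chunks. The following is a minimal sketch, not the code added in this PR; the function names and the example shape are hypothetical.

```python
import math

def expected_uncompressed_bytes(shape, itemsize):
    """Expected size in bytes: number of elements times bytes per element.

    This does not depend on whether the chunks were actually written.
    """
    return math.prod(shape) * itemsize

def to_gb(nbytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return nbytes / 1e9

# Hypothetical (T, C, Z, Y, X) array of uint16 values (2 bytes each).
shape = (2, 3, 10, 512, 512)
nbytes = expected_uncompressed_bytes(shape, itemsize=2)
print(f"Est. uncompressed size: {to_gb(nbytes):.3f} GB")
```

Note that GB here means 1e9 bytes, while the zarr info output quoted later in this thread uses binary units (MiB).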

Review thread on iohub/reader.py (outdated, resolved)
@edyoshikun (Contributor, Author) commented

Ended up adding uncompressed size [GB].

Review thread on iohub/reader.py (outdated, resolved)
@edyoshikun requested a review from ziw-liu on October 15, 2024 01:25
@ziw-liu (Collaborator) commented Oct 26, 2024

Need to add a test case before merging.

@ziw-liu (Collaborator) commented Nov 6, 2024

Due to upstream issue zarr-developers/zarr-python#2174, nbytes_stored will be wrong for OME-Zarr. I think we should just remove this field since the zarr devs are probably not focusing on v2 bugs now.
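
If an on-disk figure is still wanted, one option that does not rely on nbytes_stored is to sum file sizes under the store directory. This is a rough sketch that assumes a local directory store (not a zip or remote store); `stored_bytes_on_disk` is a hypothetical helper, not part of iohub or zarr, and the temporary files only stand in for a store with nested chunk paths.

```python
import os
import tempfile

def stored_bytes_on_disk(store_path):
    """Sum the sizes of all files under store_path, including nested directories."""
    total = 0
    for root, _dirs, files in os.walk(store_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Stand-in for a store with nested chunk paths (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "0", "0"))
    with open(os.path.join(tmp, ".zarray"), "wb") as f:
        f.write(b"x" * 5)   # metadata file at the array root
    with open(os.path.join(tmp, "0", "0", "0"), "wb") as f:
        f.write(b"x" * 10)  # chunk file two directories down
    demo_total = stored_bytes_on_disk(tmp)

print(demo_total)
```

Walking the tree counts metadata files as well as chunks, so the result is an upper bound on the compressed chunk data.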

@ziw-liu (Collaborator) commented Nov 6, 2024

For example this compression ratio is clearly wrong:

No. bytes:               88473600 [84.4 MiB]
No. bytes stored:        419 [419 B]
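
The arithmetic behind "clearly wrong" can be checked from the two numbers reported above. This is only a small illustrative sketch; the variable names are made up here.

```python
# Values copied from the info output quoted above.
nbytes = 88_473_600   # "No. bytes"
nbytes_stored = 419   # "No. bytes stored"

size_mib = nbytes / 2**20        # binary mebibytes, as in the info output
ratio = nbytes / nbytes_stored   # implied compression ratio

print(f"{size_mib:.1f} MiB uncompressed, implied ratio {ratio:,.0f}x")
```

The implied ratio is above 200,000x, which would only be plausible for an essentially empty array, so the stored-bytes figure is most likely not counting the chunk data.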

@ziw-liu changed the title from "adding datastore size to iohub info" to "Add decompressed OME-Zarr dataset size to iohub info" on Nov 6, 2024
@ziw-liu merged commit 16b5571 into main on Nov 6, 2024
7 checks passed
@ziw-liu deleted the info_data_size branch on November 6, 2024 19:36