Saving to compressed zarr yields lower or negative compression #119

Closed
mattphysics opened this issue Jul 15, 2022 · 1 comment · Fixed by #121

Comments

@mattphysics

  • xbitinfo version:
    xbitinfo 0.0.2 pypi_0 pypi

  • Python version:
    python 3.10.5 hdaaf3db_0_cpython conda-forge

  • Operating System:
    macOS Big Sur

Description

I ran the tutorial notebook for saving to .nc and .zarr (https://xbitinfo.readthedocs.io/en/latest/quick-start.html). I expected progressively smaller file sizes, with "original" > "compressed" > "bitrounded". This is what happened for .nc, but for .zarr:

  1. The "original" .zarr store (2.8 MB) is 30% smaller than the "original" .nc file.
  2. Calling ds.to_compressed_zarr() increases the size by 60% (4.5 MB).
  3. Saving the bitrounded data with to_compressed_zarr yields, as expected, a smaller size (804 KB), but this is still 64% larger than the equivalent .nc file.
    Screenshot 2022-07-14 at 20 46 55
    Screenshot 2022-07-14 at 20 47 07

I have uploaded the notebook with results at https://github.com/mattphysics/xbitinfo/blob/main/tests/nb_xbitinfo_export.ipynb

@observingClouds
Owner

observingClouds commented Jul 17, 2022

Hi @mattphysics,
Thanks for trying out xbitinfo and opening this issue. The tricky part here is that to_zarr compresses by default, which explains why xbitinfo/original.zarr is much smaller than xbitinfo/original.nc. The to_compressed_zarr method that xbitinfo provides is optimised for bit-rounded data and may add some overhead for a small file without bit rounding.
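To illustrate why compression pays off mainly on bit-rounded data, here is a minimal sketch using only the Python standard library. It makes several assumptions that are not part of xbitinfo: `truncate_mantissa` and `compressed_size` are hypothetical helpers, the mantissa bits are simply truncated rather than rounded, `zlib` stands in for whatever compressor the zarr store uses, and the data are synthetic random values.

```python
import random
import struct
import zlib


def truncate_mantissa(value: float, keepbits: int) -> float:
    """Zero all but the first `keepbits` mantissa bits of a float32 value.

    Hypothetical helper: plain truncation, not xbitinfo's rounding.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # float32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits
    mask = (0xFFFFFFFF << (23 - keepbits)) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


def compressed_size(values: list[float]) -> int:
    """Size in bytes of the zlib-compressed float32 representation."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw))


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.uniform(250.0, 300.0) for _ in range(10_000)]  # synthetic values
truncated = [truncate_mantissa(v, keepbits=7) for v in data]

size_full = compressed_size(data)
size_truncated = compressed_size(truncated)
print(f"full precision: {size_full} bytes, 7 mantissa bits: {size_truncated} bytes")
```

The trailing mantissa bits of the full-precision values are close to random, so a lossless compressor has little to work with; once they are zeroed, the byte stream becomes highly repetitive and compresses much better.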

Check out

def test_to_compressed_zarr(rasm):
    """Test to_compressed_zarr reduces size on disk."""
    ds = rasm
    label = "file"
    # save
    encoding = {
        var: {"compressor": None} for var in ds.data_vars
    }  # deactivate default compression
    ds.to_zarr(f"./tmp_testdir/{label}.zarr", mode="w", encoding=encoding)
    ds.to_compressed_zarr(f"./tmp_testdir/{label}_compressed.zarr", mode="w")
    # check size reduction
    ori_size = get_zarr_size(f"./tmp_testdir/{label}.zarr")
    compressed_size = get_zarr_size(f"./tmp_testdir/{label}_compressed.zarr")

Here we explicitly set the compression to None for the to_zarr method.

We should adapt this in the notebook as well.
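The test above relies on a `get_zarr_size` helper whose body is not shown here. For comparing sizes in a notebook, a rough stand-in could look like the sketch below. It assumes the store is a plain directory on disk and sums the sizes of all files in it; `directory_size_bytes` is a hypothetical name, not the helper from the xbitinfo test suite.

```python
import os
import tempfile


def directory_size_bytes(path: str) -> int:
    """Return the total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Small demonstration on a throwaway directory laid out like a store:
# one metadata file and one 1 KiB chunk file.
with tempfile.TemporaryDirectory() as store:
    os.makedirs(os.path.join(store, "var"))
    with open(os.path.join(store, "var", "0.0"), "wb") as f:
        f.write(b"\x00" * 1024)
    with open(os.path.join(store, ".zattrs"), "w") as f:
        f.write("{}")
    demo_size = directory_size_bytes(store)

print(demo_size)
```

Calling such a function on both the uncompressed and the compressed store gives two numbers that can be compared directly, independent of how the file browser rounds sizes.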
