Adds checksum flag to zstd codec #519

normanrz · 2024-04-22T12:03:34Z

This PR adds the checksum flag to the zstd codec. This is necessary to support the proposed zstd codec for Zarr3. We need it for the v3 refactoring of zarr-python.

TODO:

Unit tests and/or doctests in docstrings
Tests pass locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
Docs build locally
GitHub Actions CI passes
Test coverage to 100% (Codecov passes)

normanrz · 2024-05-08T19:07:09Z

I would love a review on this so that we can ship this in the upcoming zarr-python 3 release.

d-v-b · 2024-05-08T19:12:48Z

@mkitti would you mind giving this a look?

martindurant · 2024-05-08T19:19:00Z

A good time to switch to cramjam? Does this zstd provide anything that that one doesn't?

normanrz · 2024-05-08T19:25:49Z

> A good time to switch to cramjam? Does this zstd provide anything that that one doesn't?

cramjam also does not expose the checksum option.

mkitti · 2024-05-08T20:05:58Z

I'm looking. We should normalize the implementation here such that negative compression levels are passed on to the C library and that the default compression level is the default compression level of the underlying C library.

mkitti · 2024-05-08T20:27:31Z

The default CLEVEL here should be changed to 0. That should be passed on to the C library directly without further logic.

DEFAULT_CLEVEL = 0

normanrz · 2024-05-08T20:36:55Z

> The default CLEVEL here should be changed to 0. That should be passed on to the C library directly without further logic.
>
> DEFAULT_CLEVEL = 0

I made that change. This is a breaking change, though.

mkitti · 2024-06-19T12:39:13Z

> I made that change. This is a breaking change, though.

The change is that we used to default to compression level 1 explicitly, overriding the configuration of the underlying Zstandard C library.

Now we pass 0 as the compression level, which uses the default compression level of the C library. The default compression level is usually 3.

For reference see
https://facebook.github.io/zstd/zstd_manual.html#Chapter5
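
As a rough illustration of the level semantics described here, the sketch below mimics how a requested level could be resolved. `effective_level` and the `ASSUMED_*` constants are hypothetical names for this example, not part of numcodecs or zstd. The bounds are the values usually reported by stock zstd builds, and the clamping of out-of-range levels is an assumption; a custom build may differ.

```python
# Assumed values for a stock zstd build. A real build reports its own
# numbers through ZSTD_minCLevel(), ZSTD_maxCLevel() and ZSTD_defaultCLevel().
ASSUMED_MIN_LEVEL = -131072
ASSUMED_MAX_LEVEL = 22
ASSUMED_DEFAULT_LEVEL = 3


def effective_level(
    requested: int,
    default_level: int = ASSUMED_DEFAULT_LEVEL,
    min_level: int = ASSUMED_MIN_LEVEL,
    max_level: int = ASSUMED_MAX_LEVEL,
) -> int:
    """Return the level the library would effectively use (hypothetical helper)."""
    if requested == 0:
        # 0 is not a level of its own: it means "use the library default".
        return default_level
    # Negative levels are valid; out-of-range values are assumed to be clamped.
    return max(min_level, min(max_level, requested))


print(effective_level(1))  # the old numcodecs default: an explicit level 1
print(effective_level(0))  # the new numcodecs default: defers to the library
```

Under these assumptions, the old explicit default of 1 stays 1, while the new default of 0 resolves to the library default of 3.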

Do we provide access to ZSTD_defaultCLevel(void)?

normanrz · 2024-06-19T18:44:33Z

> Do we provide access to ZSTD_defaultCLevel(void)?

We do not. For what would that be useful?

normanrz · 2024-06-19T18:47:02Z

Are there any objections to changing the default compression level from 1 to 3 (i.e. the default of the underlying zstd C library) @zarr-developers/python-core-devs?

codecov · 2024-06-19T18:50:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (bef2e16) to head (696e582).
Report is 28 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #519   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files          59       59           
  Lines        2312     2319    +7     
=======================================
+ Hits         2310     2317    +7     
  Misses          2        2

| Files with missing lines | Coverage Δ |
|---|---|
| numcodecs/tests/test_zstd.py | `100.00% <100.00%> (ø)` |

mkitti · 2024-06-20T18:44:27Z

> We do not. For what would that be useful?

That would be useful for checking the configuration of the Zstandard C library. In other words, it would indicate what a compression level of 0 actually corresponds to. By default it is 3, but someone could have configured it differently at compile time:

https://github.com/facebook/zstd/blob/17b531501670781f37fc3e5070a29eede09bca3b/lib/zstd.h#L128-L130

normanrz · 2024-06-20T19:27:47Z

But that would mean that people have a modified version of numcodecs with a modified zstd library within blosc. Given that a main purpose of numcodecs is to deliver wheels, that seems pretty unlikely to me.

mkitti · 2024-06-24T13:07:00Z

Note that conda-forge builds Blosc with an independently configured zstandard library:
https://github.com/conda-forge/blosc-feedstock/blob/main/recipe%2Fmeta.yaml#L27

normanrz · 2024-06-24T13:57:38Z

@mkitti I added the Zstd.{default,min,max}_level properties.

mkitti · 2024-06-24T14:05:46Z

Currently, we depend on Zstd frame content size being encoded. However, we do know what the uncompressed size of a chunk should be from array information. Perhaps we should not generate an error if the frame content size is not "known" to Zstandard because it is, in fact, known to us.

The frame content size may not be encoded if the encoder is using streaming mode and did not pledge the full content size at the beginning.
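
To make the two header fields under discussion concrete, here is a minimal sketch that reads the checksum flag and the optional frame content size from the start of a Zstandard frame. It follows the frame header layout in the Zstandard format specification (RFC 8878); `inspect_zstd_frame` and `FrameInfo` are hypothetical names for this example and are not numcodecs APIs. It only inspects the first frame and ignores skippable frames.

```python
from typing import NamedTuple, Optional

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # magic number 0xFD2FB528, little-endian


class FrameInfo(NamedTuple):
    has_checksum: bool
    content_size: Optional[int]  # None when the header does not record it


def inspect_zstd_frame(buf: bytes) -> FrameInfo:
    """Parse just enough of a Zstandard frame header (hypothetical helper)."""
    if len(buf) < 5 or buf[:4] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = buf[4]                       # Frame_Header_Descriptor
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag
    single_segment = bool(descriptor & 0x20)  # Single_Segment_flag
    has_checksum = bool(descriptor & 0x04)    # Content_Checksum_flag
    dict_id_flag = descriptor & 0x03          # Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                              # skip Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]         # skip Dictionary_ID

    if fcs_flag == 0:
        # With flag 0 the size is stored only for single-segment frames.
        fcs_size = 1 if single_segment else 0
    else:
        fcs_size = {1: 2, 2: 4, 3: 8}[fcs_flag]

    if fcs_size == 0:
        return FrameInfo(has_checksum, None)

    field = buf[pos:pos + fcs_size]
    if len(field) < fcs_size:
        raise ValueError("truncated frame header")
    size = int.from_bytes(field, "little")
    if fcs_size == 2:
        size += 256                           # the 2-byte form stores size - 256
    return FrameInfo(has_checksum, size)


# A hand-built header like one a streaming encoder might write when no size
# was pledged: checksum enabled, Window_Descriptor present, no content size.
print(inspect_zstd_frame(ZSTD_MAGIC + b"\x04\x58"))
```

A decoder that relies on `content_size` has to fail or fall back to another size source when it is `None`, which is the situation described above.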

martindurant · 2024-06-24T14:14:31Z

> we do know what the uncompressed size of a chunk should be from array information

Only if this is the only stage in the decompression pipeline. Multiple byte compression/encoding stages are allowed, as are array operations that do not preserve shape.

Side note: I don't believe that zstd, zstandard or cramjam.zstd need the total size in order to decompress, but can use it as an optimization. Also, at least the latter can do decompress_into where the target could be the final array buffer (if it is a contiguous block).

normanrz · 2024-06-24T14:18:36Z

> Currently, we depend on Zstd frame content size being encoded. However, we do know what the uncompressed size of a chunk should be from array information. Perhaps we should not generate an error if the frame content size is not "known" to Zstandard because it is, in fact, known to us.
>
> The frame content size may not be encoded if the encoder is using streaming mode and did not pledge the full content size at the beginning.

I think that change would be out of scope for this PR. We should continue this discussion in a new issue.

* expose checksum toggle for zstd
* fixes zstd checksumming
* less fixtures
* write_checksum -> checksum
* adds release notes
* set default clevel to 0
* release
* update fixtures
* fix checksum flag
* add test for checksum
* adds wrapper codecs for the v2 codec pipeline
* docstring

normanrz added 4 commits April 22, 2024 12:57

expose checksum toggle for zstd

f96cbbb

fixes zstd checksumming

791d5b7

less fixtures

3a56d16

write_checksum -> checksum

62e2acd

normanrz changed the title ~~Adds checksum flag to zsdt codec~~ Adds checksum flag to zstd codec Apr 22, 2024

adds release notes

600b455

normanrz mentioned this pull request May 5, 2024

Remove zstandard dependency in favor of numcodecs zarr-developers/zarr-python#1838

Merged

normanrz self-assigned this May 5, 2024

Merge branch 'main' into zstd-checksum

f0dd82f

normanrz added 4 commits May 8, 2024 22:31

set default clevel to 0

17d0e3d

release

2dabd56

Merge branch 'zstd-checksum' of github.com:zarr-developers/numcodecs …

8432e64

…into zstd-checksum

update fixtures

3758f78

Merge branch 'main' into zstd-checksum

f4b9ac1

Merge branch 'main' into zstd-checksum

c868f40

Merge branch 'main' into zstd-checksum

eef4676

fix checksum flag

3d9673a

add test for checksum

947dd2e

adds wrapper codecs for the v2 codec pipeline

e0dee3c

docstring

8f6297f

merge

696e582

normanrz mentioned this pull request Jun 24, 2024

Uses zstd from numcodecs zarr-developers/zarr-python#1984

Closed

normanrz merged commit 5b12b15 into main Jun 24, 2024
45 checks passed

normanrz deleted the zstd-checksum branch June 24, 2024 15:48

QuLogic mentioned this pull request Jul 21, 2024

classmethod properties are removed in Python 3.13 #553

Closed

mkitti mentioned this pull request Jul 26, 2024

zarr-python cannot read arrays saved by tensorstore using the zstd compressor zarr-developers/zarr-python#2056

Open
