Adds checksum flag to zstd codec #519
I would love a review on this so that we can ship this in the upcoming zarr-python 3 release. |
@mkitti would you mind giving this a look? |
A good time to switch to cramjam? Does this zstd provide anything that that one doesn't? |
I'm looking. We should normalize the implementation here such that negative compression levels are passed on to the C library and that the default compression level is the default compression level of the underlying C library. |
The default CLEVEL here should be changed to
|
I made that change. This is a breaking change, though. |
The change is that we used to default to compression level 1 explicitly, overriding the configuration of the underlying Zstandard C library. Now we pass For reference see Do we provide access to |
We do not. For what would that be useful? |
Are there any objections to changing the default compression level from 1 to 3 (i.e. the default of the underlying zstd C library) @zarr-developers/python-core-devs? |
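To make the proposed behaviour concrete, here is a minimal sketch of the level semantics discussed above. It is not the numcodecs implementation: `effective_level` is a hypothetical helper, and the value 3 is an assumption that mirrors the default level of a stock Zstandard build (a custom build may be configured differently, which is the point raised in this thread).

```python
# Assumption: 3 is the default level of a stock Zstandard C library build.
ASSUMED_LIBRARY_DEFAULT = 3


def effective_level(requested: int = 0,
                    library_default: int = ASSUMED_LIBRARY_DEFAULT) -> int:
    """Hypothetical helper: the level the C library would end up using.

    A requested level of 0 means "use the library's own default".
    Any other value, including negative levels, is passed through unchanged.
    """
    if requested == 0:
        return library_default
    return requested


if __name__ == "__main__":
    for level in (0, 1, -5):
        print(level, "->", effective_level(level))
```

Under these assumptions, code that relied on the old explicit default of 1 would now get the library default instead, which is why the change is flagged as breaking.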
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #519   +/-   ##
=======================================
  Coverage   99.91%   99.91%
=======================================
  Files          59       59
  Lines        2312     2319    +7
=======================================
+ Hits         2310     2317    +7
  Misses          2        2
|
That would be useful for checking the configuration of the Zstandard C library. In other words, it would indicate what a compression level of |
But that would mean that people have a modified version of numcodecs with a modified zstd library within blosc. Given that a main purpose of numcodecs is to deliver wheels, that seems pretty unlikely to me. |
Note that conda-forge builds Blosc with an independently configured zstandard library: |
@mkitti I added the |
Currently, we depend on the Zstd frame content size being encoded. However, we do know what the uncompressed size of a chunk should be from the array information. Perhaps we should not raise an error if the frame content size is not "known" to Zstandard because it is, in fact, known to us. The frame content size may not be encoded if the encoder is using streaming mode and did not pledge the full size of the content at the beginning. |
Only if this is the only stage in the decompression pipeline. Multiple byte compression/encoding stages are allowed, and indeed so are array operations that do not preserve shape. Side note: I don't believe that zstd, zstandard or cramjam.zstd need the total size in order to decompress, but they can use it as an optimization. Also, at least the latter can do decompress_into, where the target could be the final array buffer (if it is a contiguous block). |
I think that change would be out of scope for this PR. We should continue this discussion in a new issue. |
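Both points in this thread, the checksum flag and the optional frame content size, are visible in the header of a Zstandard frame. The sketch below reads them with the standard library only, following the frame header layout in the Zstandard format specification (RFC 8878). `inspect_zstd_frame_header` is a hypothetical helper written for illustration; it is not part of numcodecs, and it only parses the header without validating or decompressing the frame.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian on the wire: 28 B5 2F FD


def inspect_zstd_frame_header(buf: bytes) -> dict:
    """Hypothetical helper: report checksum flag and content size of a frame."""
    if len(buf) < 5:
        raise ValueError("buffer too short for a Zstandard frame header")
    (magic,) = struct.unpack_from("<I", buf, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    descriptor = buf[4]                        # Frame_Header_Descriptor
    fcs_flag = descriptor >> 6                 # bits 7-6: Frame_Content_Size_flag
    single_segment = bool(descriptor & 0x20)   # bit 5: Single_Segment_flag
    has_checksum = bool(descriptor & 0x04)     # bit 2: Content_Checksum_flag
    dict_id_flag = descriptor & 0x03           # bits 1-0: Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                               # skip Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]          # skip Dictionary_ID, if any

    # Width of the Frame_Content_Size field in bytes.
    if fcs_flag == 0:
        fcs_size = 1 if single_segment else 0
    else:
        fcs_size = (2, 4, 8)[fcs_flag - 1]

    content_size = None                        # None: size not encoded
    if fcs_size:
        if len(buf) < pos + fcs_size:
            raise ValueError("truncated frame header")
        content_size = int.from_bytes(buf[pos:pos + fcs_size], "little")
        if fcs_size == 2:
            content_size += 256                # 2-byte form stores size - 256
    return {"has_checksum": has_checksum, "content_size": content_size}


if __name__ == "__main__":
    magic = bytes([0x28, 0xB5, 0x2F, 0xFD])
    # Hand-built headers: single-segment frame with checksum, 5-byte content.
    print(inspect_zstd_frame_header(magic + bytes([0x24, 0x05])))
    # Streaming-style header without an encoded content size.
    print(inspect_zstd_frame_header(magic + bytes([0x00, 0x58])))
```

When `content_size` is `None`, a caller that knows the expected chunk size from array metadata could use that size instead, subject to the caveat raised above that this only holds when zstd is the only stage in the pipeline.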
* expose checksum toggle for zstd
* fixes zstd checksumming
* less fixtures
* write_checksum -> checksum
* adds release notes
* set default clevel to 0
* release
* update fixtures
* fix checksum flag
* add test for checksum
* adds wrapper codecs for the v2 codec pipeline
* docstring
This PR adds the checksum flag to the zstd codec. This is necessary to support the proposed zstd codec for Zarr3. We need it for the v3 refactoring of zarr-python.
TODO: