Add an option in the zstd CLI to verify that a given .zst file matches an uncompressed file #3287
Comments
What about:
|
If FILE's size changed (very common when editing text), it will do needless work (potentially a lot, if the change is deep into the file) instead of just checking the size of FILE on disk against the size stored in FILE.zst. Right now |
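OK, so you are looking for zstd -lv to report the actual value of the content hash, not just the fact that it exists. This is likely achievable. |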
That'd enable scripts to use zstd cli to do what I mentioned. Only problem
I can imagine is multi-frame file with hash per frame.
On Mon, 17 Oct 2022 at 23:50, Yann Collet ***@***.***> wrote:
… OK, so you are looking for zstd -lv to report the actual value of the
content hash, not just the fact that it exists.
This is likely achievable.
|
I've written the code to print the checksum for single-frame files with If you find it acceptable I can create a PR or you can just copy-paste it. If it's not acceptable, let me know what I can fix to make it fit the requirements. I skimmed https://github.com/facebook/zstd/blob/dev/CONTRIBUTING.md and tried to run I don't have any idea how to print this hash per frame (and I personally don't think I need it; even for multi-gig files zstd seems to produce a single-frame file?). Here's an example of running it (notice
|
I like it, this is a good PR
It's obviously unrelated to your work.
The normal scenario is one single frame, whatever the size of input. It's fine if your PR only solves the "1-frame" scenario, |
I'm happy to hear the PR is good. Would you like to merge it? Do I need to reassign (c) to you or Facebook for this purpose? I noticed concatenating two zst files with cat creates a multi-frame zst file that uncompresses to original two files, concatenated. I guess it can be useful for concatenating files without recompression in between. |
Yes, |
Generally, the authors of the patches push the PR themselves, for this case though, I created : #3332 |
Thank you. Is there anything else I have to do? |
A number of CI tests have been failing on #3332, |
Patch merged |
Is your feature request related to a problem? Please describe.
I'd like a switch or two in the CLI to verify that a given file matches the hash and/or content of a compressed .zst file, i.e. to check whether a given .zst file is a compressed version of a given normal file.
Describe the solution you'd like
A new switch or two, that'd look and work as such:
It'd first check the filesize, and then hash/data (no point doing the latter if filesize doesn't match).
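The size-first step can be sketched outside the CLI as well. The following is only an illustration in Python, not the proposed zstd implementation: the names frame_content_size and size_matches are made up for this example, and it assumes a single-frame file whose header follows the Zstandard frame format (RFC 8878) and actually records the content size (it may be absent, e.g. for streamed input).

```python
import os
import struct

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian on disk: 28 B5 2F FD


def frame_content_size(header: bytes):
    """Return the decompressed size recorded in the first frame header, or None."""
    if len(header) < 5 or struct.unpack_from("<I", header, 0)[0] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = header[4]
    fcs_flag = descriptor >> 6               # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1   # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 3            # bits 1-0: Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                             # skip the Window_Descriptor byte
    pos += (0, 1, 2, 4)[dict_id_flag]        # skip the optional Dictionary_ID

    if fcs_flag == 0 and not single_segment:
        return None                          # content size was not recorded
    width = (1, 2, 4, 8)[fcs_flag]
    field = header[pos:pos + width]
    if len(field) < width:
        raise ValueError("truncated frame header")
    value = int.from_bytes(field, "little")
    return value + 256 if width == 2 else value  # 2-byte form is offset by 256


def size_matches(plain_path, zst_path):
    """Hypothetical helper: True/False on size (mis)match, None if size is unknown."""
    with open(zst_path, "rb") as f:
        stored = frame_content_size(f.read(18))  # 4 magic bytes + at most 14 header bytes
    if stored is None:
        return None
    return stored == os.path.getsize(plain_path)
```

If size_matches returns False, the files cannot match and the hash/data comparison can be skipped. A True result only means the sizes agree, so the content still has to be checked.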
Describe alternatives you've considered
I've considered writing my own C or Python program to do this, but I think it'd fit as part of the zstd CLI and be useful in general. The zstd CLI also already has all the functionality: file IO, parsing zstd frames, xxhash, etc. Also, zstd -l does display the frame count, sizes (human readable, not down to bytes), and that xxhash was used, but does not tell me the 4 low bytes of the 64-bit xxhash, so I can't use that with xxhash myself either.
Additional context
My use case is that I often work with big text files that I get as .zst, and sometimes I modify them. When I need to free up some space I delete some of the unmodified files, but wouldn't want to delete a modified one. This option would let me check whether a given file and the same file + .zst are the 'same', and whether it's safe to delete the uncompressed one.
Another use case could be someone who is paranoid and wants to verify that; maybe it could be part of some extra --rm option for very careful people too (I don't know whether --rm currently verifies that the written file is correct).
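For reference, the stored hash mentioned above can be read straight from the file. This is a rough Python sketch, not the code from the merged patch: stored_checksum is a made-up name, and it assumes a single-frame file with nothing after the frame, laid out as in the Zstandard frame format (RFC 8878), where the checksum is the last 4 bytes of the frame in little-endian order.

```python
import os
import struct

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian on disk: 28 B5 2F FD


def stored_checksum(zst_path):
    """Return the 32-bit content checksum stored at the end of the frame, or None."""
    with open(zst_path, "rb") as f:
        head = f.read(5)
        if len(head) < 5 or struct.unpack_from("<I", head)[0] != ZSTD_MAGIC:
            raise ValueError("not a zstd frame")
        if not head[4] & 0x04:       # bit 2 of the descriptor: Content_Checksum_flag
            return None              # frame was written without a checksum
        # Assumption: single frame, no trailing data, so the checksum is the
        # final 4 bytes of the file.
        f.seek(-4, os.SEEK_END)
        return struct.unpack("<I", f.read(4))[0]
```

The returned value is the low 32 bits of the XXH64 hash (seed 0) of the decompressed content, so it could be compared against an XXH64 of the uncompressed file computed separately, for example with the third-party xxhash Python package. A multi-frame file would need each frame walked in turn, which this sketch does not attempt.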