-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure buffers are flushed to avoid decompression stream underread #251
base: main
Are you sure you want to change the base?
Conversation
When reading from a decompressor that was constructed using `Decoder::with_buffer`, the decoder may not consume all of the input by the time it returns all of the decompressed output. This means when you call `.finish()` on the decoder to get the underlying stream back, it is not pointing after the end of the compressed data. This commit adds a test that demonstrates the issue.
299cd5b
to
f2b80a6
Compare
Hi, and thanks for the PR! The comment from the zstd lib refers to internal buffers that need to be flushed: when no more input is required, but more output is available. I'm not against calling zstd again to make sure we fed it the entire frame, but I have a few concerns with the approach here:
What do you think? |
After the decoder stream has yielded all of the uncompressed data, it is possible for the input stream to still not be fully consumed. This means if we extract the inner stream at this point, it will not be pointing to the end of the compressed data. From the [zstd documentation](https://facebook.github.io/zstd/zstd_manual.html#Chapter9) for `ZSTD_decompressStream`: > But if `output.pos == output.size`, there might be some data left within internal buffers. > In which case, call ZSTD_decompressStream() again to flush whatever remains in the buffer. This is only necessary if the caller wants the stream back, so at that point we can force an additional call to `ZSTD_decompressStream` by reading to a zero-length buffer.
f2b80a6
to
ea02a36
Compare
Moving it to I do think this comment in the ZSTD documentation is relevant here. Specifically, in the test I added, while the compressed data is 21 bytes long, at the point where all data is yielded (i.e. This is the only way to successfully read up to the end of a compressed object in a stream with non-compressed data following it. If you continue to read past the end of the decompressed data, the stream doesn't end, but instead it attempts to decompress the non-compressed data, and panics ( |
Hi @gyscos. Do you have any thoughts on my updated PR? Hopefully I've addressed your concerns, but if not, please let me know what else you'd like to see addressed. |
When reading from a decompressor that was constructed using
Decoder::with_buffer
, the decoder may not consume all of the input by the time it returns all of the decompressed output. This means when you call.finish()
on the decoder to get the underlying stream back, it is not pointing after the end of the compressed data.This is expected from the way that the decoder makes use of
ZSTD_decompressStream
. From the zstd documentation forZSTD_decompressStream
:At the point the decoder stream has yielded all of the decompressed data, we have not done this additional call. It is only necessary if the caller wants the underlying stream back, so at that point we can force an additional call to
ZSTD_decompressStream
by reading to a zero-length buffer.This PR adds a test that demonstrates the issue, then adds the flush to prevent it.