Add test for partial decompression #436
Second commit reorders the tests so all the tests with zlib succeed and then the last test with miniz_oxide fails. |
Thanks a lot for putting the PR together, I think it nicely shows the difference in behaviour. However, I would also have hoped that the code tries to continue decoding the input until it receives the 'finished' status. To my mind, it's fair that an implementation chooses how much of the input to buffer internally, maybe. After all, I really don't know enough about how it should work. CC @oyvindln in case they have some thoughts. |
Will try calling again until |
Apparently
But this is not realistic as normally I don't know in advance the size of the output. Could try calling until it emits zero bytes into the output. |
The question here is if That way, at least with the non-miniz-oxide implementation there is a working version of this. From there it should be possible to figure out what's happening with the Thinking about it, |
No, it does not happen. I think This is the current state of the test that passes for

    #[test]
    fn deflate_decoder_partial() {
        let input = vec![
            210, 82, 8, 12, 245, 15, 113, 12, 242, 247, 15, 81, 240, 244, 115, 242, 143, 80, 80, 10,
            45, 78, 45, 82, 40, 44, 205, 47, 73, 84, 226, 229, 210, 130, 200, 163, 136, 42, 104, 4,
            135, 248, 7, 57, 186, 187, 42, 152, 155, 41, 24, 27, 152, 27, 25, 24, 104, 242, 114, 57,
            26, 24, 24, 24, 42, 248, 123, 43, 184, 167, 150, 128, 213, 21, 229, 231, 151, 40, 36, 231,
            231, 22, 228, 164, 150, 164, 166, 40, 104, 24, 232, 129, 20, 104, 43, 128, 104, 3, 133,
            226, 212, 228, 98, 77, 61, 94, 46, 0, 0, 0, 0, 255, 255,
        ];
        let expected_output = b"* QUOTAROOT INBOX \"User quota\"\r\n* QUOTA \"User quota\" (STORAGE 76 307200)\r\nA0001 OK Getquotaroot completed (0.001 + 0.000 secs).\r\n";

        // Create very small output buffer.
        let mut output_buf = [0; 8];
        let mut output: Vec<u8> = Vec::new();

        let zlib_header = false;
        let mut decompress = flate2::Decompress::new(zlib_header);

        let flush_decompress = flate2::FlushDecompress::None;
        loop {
            let prev_out = decompress.total_out();
            let status = decompress
                .decompress(
                    &input[decompress.total_in() as usize..],
                    &mut output_buf,
                    flush_decompress,
                )
                .unwrap();
            output.extend_from_slice(&output_buf[..(decompress.total_out() - prev_out) as usize]);
            eprintln!("{}", output.len());

            // IMAP stream never ends.
            assert_ne!(status, flate2::Status::StreamEnd);

            if output.len() == expected_output.len() {
                assert_eq!(status, flate2::Status::Ok);
                break;
            }
        }

        assert_eq!(output.as_slice(), expected_output);
    }
I made the test work with all implementations. So the right condition for stopping is I guess I can adjust the async-compression crate to do a similar thing now. |
StreamEnd happens in either backend if the deflate stream is well formed and the decompressor encounters the end of a block with the last block flag set. At least in the case of zlib, BufError is returned if deflate is called with Z_FINISH as the flush mode and it encounters the end of the input data but the last block did not have the end flag set. Thus the stream is technically not well formed, but I presume it's treated here as not an error since this seems to be common in many protocols, and treating it as an error in the stream class may cause issues. miniz_oxide is supposed to behave the same way, so if it doesn't, it's a bug. As for the choice of whether to consume up to the size of the internal buffer or just as much as would fit in the output, I don't really know what the most "correct" behaviour is. I have to investigate a bit more whether zlib always does the latter, as it seems to at least in this situation, or whether it depends on some parameter/config. (I'm also not sure what C miniz did here: whether it differed from zlib, or whether this was an accidental change during porting. I suspect the latter.) I can try to alter this behaviour in miniz_oxide if people wish. |
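To make the "last block flag" concrete, here is a minimal std-only sketch that inspects the raw bytes of the test input rather than calling any flate2 API. The helper names `first_block_header` and `ends_with_sync_flush` are hypothetical, and the bit layout is the one I understand RFC 1951 to specify (BFINAL in the lowest bit, then two BTYPE bits). Only the first block header can be read this way, because later headers are generally not byte-aligned.

```rust
/// Decodes the 3-bit header of the FIRST block of a raw deflate stream.
/// Returns (is_final_block, block_type). Assumes the RFC 1951 layout,
/// where header bits are packed starting from the least significant bit.
fn first_block_header(first_byte: u8) -> (bool, u8) {
    let bfinal = (first_byte & 1) == 1; // last-block flag
    let btype = (first_byte >> 1) & 0b11; // 0 = stored, 1 = fixed, 2 = dynamic
    (bfinal, btype)
}

/// True if the data ends with 00 00 FF FF, the tail of an empty stored
/// block, which is what a sync flush is expected to leave behind.
fn ends_with_sync_flush(data: &[u8]) -> bool {
    data.ends_with(&[0x00, 0x00, 0xFF, 0xFF])
}
```

Applied to the test input, the first byte 210 decodes as a non-final fixed-Huffman block, and the data ends with 0, 0, 255, 255. That is consistent with the observation that this IMAP stream never reports StreamEnd: no block in the captured data appears to carry the final flag.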
I managed to make async-compression work with the miniz_oxide backend for flate2: Nullus157/async-compression#294 It still needs some cleanup, but I think the change makes sense in any case, and depending on how you look at it we may consider miniz_oxide consuming input into internal state NOTABUG/WONTFIX. The flate2 documentation should probably not say that it is "consuming only as much input as needed" here: https://docs.rs/flate2/1.0.34/flate2/struct.Decompress.html#method.decompress |
If consuming more data and saving it into internal state is better for performance (no need to parse the same data a second time) then I am fine with it. What is needed is better documentation for flate2 (saying that it might consume more than needed and save it into internal state) and maybe for miniz_oxide. Also, the exact meaning of StreamEnd and BufError is not clear. |
I don't really know whether it would make performance worse or not; I would have to test, and miniz_oxide needs more performance work in any case. The behaviour could maybe also be added as a parameter to the deflate function (though that may require a version bump for an API change in miniz_oxide), or added as an alternate function. |
Thanks everyone! It seems the only action that can be taken here is to update the documentation to specifically mention how
I like the current setup here where calls to backends seem to be the same, without special adaptations depending on the backend. But I might be wrong about that, and if so, As for this PR, it could certainly be merged with the documentation improvements, and with some clear-text description of what the test is validating. For instance, I would expect that the special handling can be removed once Does that make sense? |
Maybe instead of |
Both work, but I think it requires more comments to explain what is tested, and what the code should rather look like one day so eventually it can be adjusted to reach its final form. |
I did some testing, and it seems the old C miniz backend behaved the same way as current miniz_oxide. I would have to test the C functions to be 100% sure, but this does seem to be a difference in design: zlib (and I think zlib-ng, but I would have to check) seems to mainly write to the internal window buffer on exit and uses it as a sort of cache and back-buffer, operating mostly on the output buffer directly otherwise, while miniz/miniz_oxide always writes to an internal buffer and only flushes it when needed or on stream end, unless deflate is called only once with the Need to do some more digging whether it's practical to alter miniz_oxide to limit the output to not write more than the size of the output buffer or whether that would require a substantial redesign of the internals. |
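The two buffering strategies described above can be sketched with toy decoders. This is only an illustration under stated assumptions: `ToyDecoder`, `Direct`, `Buffered` and `drain` are hypothetical names, "decoding" is a plain byte copy, and nothing here reflects the real internals of zlib, miniz_oxide or flate2. The point is that a caller loop which stops only when a call makes no progress at all works with either strategy.

```rust
use std::collections::VecDeque;

/// Hypothetical minimal interface: returns (bytes consumed, bytes produced).
trait ToyDecoder {
    fn step(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize);
}

/// Consumes only as much input as fits in the output buffer
/// (a stand-in for the zlib-like behaviour described in the thread).
struct Direct;

impl ToyDecoder for Direct {
    fn step(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        let n = input.len().min(output.len());
        output[..n].copy_from_slice(&input[..n]);
        (n, n)
    }
}

/// Consumes all input into internal state and hands it out on later calls
/// (a stand-in for the miniz_oxide-like behaviour described in the thread).
#[derive(Default)]
struct Buffered {
    pending: VecDeque<u8>,
}

impl ToyDecoder for Buffered {
    fn step(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize) {
        self.pending.extend(input.iter().copied());
        let n = self.pending.len().min(output.len());
        for (slot, byte) in output.iter_mut().zip(self.pending.drain(..n)) {
            *slot = byte;
        }
        (input.len(), n)
    }
}

/// Calls `step` until a call neither consumes input nor produces output.
/// Stopping as soon as the input is exhausted would lose data with
/// `Buffered`, because output may still be pending internally.
fn drain<D: ToyDecoder>(dec: &mut D, mut input: &[u8], chunk: usize) -> Vec<u8> {
    assert!(chunk > 0, "output buffer must not be empty");
    let mut out = Vec::new();
    let mut buf = vec![0u8; chunk];
    loop {
        let (consumed, produced) = dec.step(input, &mut buf);
        input = &input[consumed..];
        out.extend_from_slice(&buf[..produced]);
        if consumed == 0 && produced == 0 {
            return out;
        }
    }
}
```

With an 8-byte output buffer, `Buffered` reports all input as consumed on the first call while producing only 8 bytes, which mirrors the situation the test in this PR has to handle. A real decompressor has additional stop conditions (for example a status value), so the no-progress check here is a simplification.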
I see - this issue has always been present even in the
I wonder if there is any way to fix this here, or to fix |
It seems this PR is stuck and I wonder if we can or should do something about it. If I recall correctly the issue is that the test has to add a special case for Thinking about it, maybe there can be two versions of the test to make the distinction clear - one for |
This adds the test from #434