-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-compliant decoding according to section 4? #502
Comments
This piece of code is very tricky; it allows continuation after break in decoding. |
I believe there is in a mismatch in the increments and decrements in the case where the implicit zero command is repeatedly used and the static dictionary case is hit. It seems that it will short-circuit the increment logic (which only occurs in the dynamic dictionary case), causing the handling of implicit zeros to accidentally decrement one too much each time. I compiled with debugging and logging enabled and ran
The |
Oh! I see. Thanks for the report. Going to fix that ASAP. |
Decoder may have produced invalid output if: * at offset 0..3 dictionary word with index 3..0 for some length N is used and distance is encoded with direct distance code 0, and * at least one of next 4 commands use value from distance ringbuffer
Fixed. Again, thank you! This bug is a new "survival champion" - it was introduced about 1.5 years ago =O |
Testing 0749d9c does not give me the same decompressed output as @dsnet proposes in his initial report of this issue. Decompressing hex
instead of the expected
@dsnet's "wrong" output before 0749d9c matches mine, so I am inclined to believe that his fixed output is also correct, in which case the fix is wrong. Could @dsnet or @eustas please shed some light? Thanks! PS: Will there be a test case? |
@ralfjunker, what implementation is yours? Mine repeats "orld" until the 10000 byte limit is hit. |
My interpretation of what should happen (assuming all implementations agree about huffman decoding of commands): There is a repeated sequence of commands of the form (length:5, dist: 0). According to section 4, dist:0 should result in the last distance being used, which defaults to 4 (without updating the distance ring buffer; not that it matters). Thus, this is really a repeated sequence of (length:5, dist:4) commands. For the first command, there is no output, so it falls under the static dictionary case according to:
This copies "world" to the output. For all subsequent commands, there is sufficient data in the output such that a distance of 4 is always satisfied by the dynamic dictionary. Thus, copying "orldo" the first time, "rldor" the second time, "ldorl" the third time, etc. |
For testing the fix we have internally added 3 synthetic tests: TEST(SynthTest, IntactDistanceRingBuffer0) {
uint8_t compressed[] = {
0x1b, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0xa1, 0x80,
0x20, 0x00
};
Verify(
R"(
main_header
metablock_header_easy: 11, 1
command_inscopy_easy: 0, 7 // "himself" from dictionary
bits: "000000" // distance = 4 from RB; RB remains intact
command_inscopy_easy: 0, 4 // copy "self"
bits: "000000" // distance = 4 from RB; RB remains intact
)",
compressed, sizeof(compressed),
BROTLI_DECODER_SUCCESS,
"himselfself");
}
TEST(SynthTest, IntactDistanceRingBuffer1) {
uint8_t compressed[] = {
0x1b, 0x09, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x21, 0xa0,
0x20, 0x00
};
Verify(
R"(
main_header
metablock_header_easy: 10, 1
command_inscopy_easy: 0, 6 // "scroll" from dictionary
bits: "100000" // distance = 11 from RB; RB remains intact
command_inscopy_easy: 0, 4 // copy "roll"
bits: "000000" // distance = 4 from RB; RB remains intact
)",
compressed, sizeof(compressed),
BROTLI_DECODER_SUCCESS,
"scrollroll");
}
TEST(SynthTest, IntactDistanceRingBuffer2) {
uint8_t compressed[] = {
0x1b, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x41, 0x80,
0x20, 0x50, 0x10, 0x24, 0x08, 0x06
};
Verify(
R"(
main_header
metablock_header_easy: 16, 1
command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 4 - 1)
bits: "000000" // distance = 4 from RB; RB remains intact
command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 11 - 5)
bits: "100000" // distance = 11 from RB; RB remains intact
command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 15 - 9)
bits: "010000" // distance = 15 from RB; RB remains intact
command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 16 - 13)
bits: "110000" // distance = 16 from RB; RB remains intact
)",
compressed, sizeof(compressed),
BROTLI_DECODER_SUCCESS,
"leftdatadataleft");
} The first parameter of |
Ooops, missed the case when distance code is baked into command. |
I hope, now it is finally fixed with #506 =) |
Decoder may have produced invalid output if: * at offset 0..3 dictionary word with index 3..0 for some length N is used and distance is encoded with direct distance code 0, and * at least one of next 4 commands use value from distance ringbuffer
I was running fuzz testing on my Brotli implementation and noticed that the following input differs when decompressed by my version compared to the canonical one here.
The input causing issues (in hex):
e2e184886a563080e0391603e430f9
When decompressed by the
google/brotli
C implementation:worldwhitemediafirstirstirstirstirstirstirstirstirstirstirst...
When decompressed by my implementation:
worldorldorldorldorldorldorldorldorldorldorldorldorldorldorl...
According to section 4 of the RFC:
However, this seems contrary to what C decoder currently does:
The logic in the C version seems contrary to the stated prose in the RFC, which makes no mention that an implicit zero reverse rotates the distance ring-buffer.
The text was updated successfully, but these errors were encountered: