Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-compliant decoding according to section 4? #502

Closed
dsnet opened this issue Feb 6, 2017 · 11 comments
Closed

Non-compliant decoding according to section 4? #502

dsnet opened this issue Feb 6, 2017 · 11 comments

Comments

@dsnet
Copy link
Contributor

dsnet commented Feb 6, 2017

I was running fuzz testing on my Brotli implementation and noticed that the following input differs when decompressed by my version compared to the canonical one here.

The input causing issues (in hex): e2e184886a563080e0391603e430f9

When decompressed by the google/brotli C implementation:
worldwhitemediafirstirstirstirstirstirstirstirstirstirstirst...
When decompressed by my implementation:
worldorldorldorldorldorldorldorldorldorldorldorldorldorldorl...

According to section 4 of the RFC:

When a distance symbol 0 appears, the distance it represents (i.e.,
the last distance in the sequence of distances) is not pushed to the
ring buffer of last distances

However, this seems contrary to what C decoder currently does:

if (s->distance_code >= 0) {
	--s->dist_rb_idx;
	s->distance_code = s->dist_rb[s->dist_rb_idx & 3];
	goto postReadDistance;  /* We already have the implicit distance */
}

The logic in the C version seems contrary to the stated prose in the RFC, which makes no mention that an implicit zero reverse rotates the distance ring-buffer.

@eustas
Copy link
Collaborator

eustas commented Feb 6, 2017

This piece of code is very tricky; it allows continuation after break in decoding.
We use special value (-1) to specify that distance is not decoded yet.
If distance is decoded, we retake it from distance ring-buffer.
Please, take a look where s->dist_rb_idx is incremented and decremented. For distance == 0 it has one extra decrement, so after processing the command it remains unchanged.

@dsnet
Copy link
Contributor Author

dsnet commented Feb 6, 2017

I believe there is in a mismatch in the increments and decrements in the case where the implicit zero command is repeatedly used and the static dictionary case is hit. It seems that it will short-circuit the increment logic (which only occurs in the dynamic dictionary case), causing the handling of implicit zeros to accidentally decrement one too much each time.

I compiled with debugging and logging enabled and ran bro -d. The relevant snippet is:

...
[ProcessCommandsInternal] pos = 0 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 0 distance = 4
[ProcessCommandsInternal] dictionary word: [world]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9995
[ProcessCommandsInternal] pos = 5 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 5 distance = 11
[ProcessCommandsInternal] dictionary word: [white]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9990
[ProcessCommandsInternal] pos = 10 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 10 distance = 15
[ProcessCommandsInternal] dictionary word: [media]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9985
[ProcessCommandsInternal] pos = 15 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 15 distance = 16
[ProcessCommandsInternal] dictionary word: [first]
[ProcessCommandsInternal] s->meta_block_remaining_len = 9980
[ProcessCommandsInternal] pos = 20 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 20 distance = 4
UPDATE_RING_BUFFER
[ProcessCommandsInternal] pos = 25 insert = 0 copy = 5
IMPLICIT_ZERO
[ProcessCommandsInternal] pos = 25 distance = 4
UPDATE_RING_BUFFER
[ProcessCommandsInternal] pos = 30 insert = 0 copy = 5
IMPLICIT_ZERO
...

The IMPLICIT_ZERO message was a printf added after L1706 of decode.c. In the log printout, you see that the distance is sweeping through the sequences [4, 11, 15, 16, 4, 4, 4, 4, 4, ...]. Notice it how it goes in reverse order through the initial distances in the ring buffer (i.e., 4, 11, 15, 16) even though only the zero code is used? After exhausting the ring buffer, it no longer performs static dictionary lookups, and starts doing dynamic dictionary lookups, causing it to properly stay at 4 for the rest of the time.

@eustas
Copy link
Collaborator

eustas commented Feb 7, 2017

Oh! I see. Thanks for the report. Going to fix that ASAP.

eustas added a commit that referenced this issue Feb 7, 2017
Decoder may have produced invalid output if:
 * at offset 0..3 dictionary word with index 3..0 for some length N is used
   and distance is encoded with direct distance code 0, and
 * at least one of next 4 commands use value from distance ringbuffer
@eustas eustas closed this as completed in 0749d9c Feb 7, 2017
@eustas
Copy link
Collaborator

eustas commented Feb 7, 2017

Fixed. Again, thank you! This bug is a new "survival champion" - it was introduced about 1.5 years ago =O

@ralfjunker
Copy link

Testing 0749d9c does not give me the same decompressed output as @dsnet proposes in his initial report of this issue.

Decompressing hex e2e184886a563080e0391603e430f9 results in

worldsmallmallmallmallmallmallmallmallmallmallmallmallma... (10000 bytes in total)

instead of the expected

worldorldorldorldorldorldorldorldorldorldorldorldorldorldor... (no total given).

@dsnet's "wrong" output before 0749d9c matches mine, so I am inclined to believe that his fixed output is also correct, in which case the fix is wrong.

Could @dsnet or @eustas please shed some light? Thanks!

PS: Will there be a test case?

@dsnet
Copy link
Contributor Author

dsnet commented Feb 7, 2017

@ralfjunker, what implementation is yours?

Mine repeats "orld" until the 10000 byte limit is hit.

@dsnet
Copy link
Contributor Author

dsnet commented Feb 7, 2017

My interpretation of what should happen (assuming all implementations agree about huffman decoding of commands):

There is a repeated sequence of commands of the form (length:5, dist: 0).

According to section 4, dist:0 should result in the last distance being used, which defaults to 4 (without updating the distance ring buffer; not that it matters).

Thus, this is really a repeated sequence of (length:5, dist:4) commands.

For the first command, there is no output, so it falls under the static dictionary case according to:

if distance is less than the max allowed distance plus one
	move backwards distance bytes in the uncompressed data,
	and copy CLEN bytes from this position to
	the uncompressed stream
else
	look up the static dictionary word, transform the word as
	directed, and copy the result to the uncompressed stream

This copies "world" to the output.

For all subsequent commands, there is sufficient data in the output such that a distance of 4 is always satisfied by the dynamic dictionary. Thus, copying "orldo" the first time, "rldor" the second time, "ldorl" the third time, etc.

@ralfjunker
Copy link

I am using the C implementation from this repository, in particular this check-in: 0749d9c.

It repeats mall until it reaches 10000 bytes.

Did you test @eustas' fix to this C implementation?

@eustas
Copy link
Collaborator

eustas commented Feb 8, 2017

For testing the fix we have internally added 3 synthetic tests:

TEST(SynthTest, IntactDistanceRingBuffer0) {
  uint8_t compressed[] = {
    0x1b, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0xa1, 0x80,
    0x20, 0x00
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 11, 1
      command_inscopy_easy: 0, 7 // "himself" from dictionary
      bits: "000000" // distance = 4 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // copy "self"
      bits: "000000" // distance = 4 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "himselfself");
}

TEST(SynthTest, IntactDistanceRingBuffer1) {
  uint8_t compressed[] = {
    0x1b, 0x09, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x21, 0xa0,
    0x20, 0x00
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 10, 1
      command_inscopy_easy: 0, 6 // "scroll" from dictionary
      bits: "100000" // distance = 11 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // copy "roll"
      bits: "000000" // distance = 4 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "scrollroll");
}

TEST(SynthTest, IntactDistanceRingBuffer2) {
  uint8_t compressed[] = {
    0x1b, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x80, 0xe3, 0xb4, 0x0d, 0x00, 0x00,
    0x07, 0x5b, 0x26, 0x31, 0x40, 0x02, 0x00, 0xe0, 0x4e, 0x1b, 0x41, 0x80,
    0x20, 0x50, 0x10, 0x24, 0x08, 0x06
  };
  Verify(
      R"(
      main_header
      metablock_header_easy: 16, 1
      command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 4 - 1)
      bits: "000000" // distance = 4 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 11 - 5)
      bits: "100000" // distance = 11 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "data" from dictionary (index = 6 = 15 - 9)
      bits: "010000" // distance = 15 from RB; RB remains intact
      command_inscopy_easy: 0, 4 // "left" from dictionary (index = 3 = 16 - 13)
      bits: "110000" // distance = 16 from RB; RB remains intact
      )",
      compressed, sizeof(compressed),
      BROTLI_DECODER_SUCCESS,
      "leftdatadataleft");
}

The first parameter of Verify is DSL code snippet used to generate compressed.
The last parameter is an expected output string.

@eustas eustas reopened this Feb 8, 2017
@eustas
Copy link
Collaborator

eustas commented Feb 8, 2017

Ooops, missed the case when distance code is baked into command.

@eustas
Copy link
Collaborator

eustas commented Feb 8, 2017

I hope, now it is finally fixed with #506 =)

@eustas eustas closed this as completed Feb 8, 2017
danielrh added a commit to dropbox/rust-brotli-decompressor that referenced this issue Nov 4, 2018
juj pushed a commit to Unity-Technologies/brotli that referenced this issue May 13, 2023
Decoder may have produced invalid output if:
 * at offset 0..3 dictionary word with index 3..0 for some length N is used
   and distance is encoded with direct distance code 0, and
 * at least one of next 4 commands use value from distance ringbuffer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants