Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dictionary_stream_round_trip corruption #3416

Closed
danlark1 opened this issue Jan 9, 2023 · 10 comments · Fixed by #3419
Closed

dictionary_stream_round_trip corruption #3416

danlark1 opened this issue Jan 9, 2023 · 10 comments · Fixed by #3419
Assignees

Comments

@danlark1
Copy link
Contributor

danlark1 commented Jan 9, 2023

Describe the bug

We found a fuzzer corruption with 5434de0

dictionary_stream_round_trip.c: 196: Assertion: `!FUZZ_memcmp(src, rBuf, size)' failed. Corruption!

Attaching a file and DEBUGLOG 10

This is the fuzzer file
testcase-4882119846133760-6f37f712c5input.txt

This is DEBUGLOG 10 without 5434de0

out_good.txt

This is DEBUGLOG 10 with 5434de0

out_bad.txt

To Reproduce
Steps to reproduce the behavior:

  1. Downloads data attached
  2. Run CC=clang-15 CXX=clang++-15 python3 fuzz.py build dictionary_stream_round_trip
  3. ./dictionary_stream_round_trip testcase-4882119846133760-6f37f712c5input.txt
  4. See error

We found this on our cluster but I was not able to reproduce it right away. For some reason fuzz data provider set param 1011 differently:

In the left picture is what I got from cluster, in the right what I got from local

2023-01-09-110350_1636x184_scrot

It's probably because of ZSTD_MULTITHREAD define. We fuzz it without it

After I set

--- a/tests/fuzz/zstd_helpers.c
+++ b/tests/fuzz/zstd_helpers.c
@@ -96,7 +96,7 @@ void FUZZ_setRandomParameters(ZSTD_CCtx *cctx, size_t srcSize, FUZZ_dataProducer
     setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
     setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
 #endif
-    setRand(cctx, ZSTD_c_useRowMatchFinder, 0, 2, producer);
+    setRand(cctx, ZSTD_c_useRowMatchFinder, 2, 2, producer);
     setRand(cctx, ZSTD_c_enableDedicatedDictSearch, 0, 1, producer);
     setRand(cctx, ZSTD_c_forceMaxWindow, 0, 1, producer);
     setRand(cctx, ZSTD_c_literalCompressionMode, 0, 2, producer);

I was able to reproduce locally with clang-15 as well

Expected behavior
Fuzz passes

Screenshots and charts
If applicable, add screenshots and charts to help explain your problem.

Desktop (please complete the following information):

  • OS: Linux
  • Compiler: clang-15
  • Flags: fuzz flags
  • Build system: Make

Additional context
Hope it reproduces right away

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jan 9, 2023

Thanks @danlark1.
That's Interesting. Changes in 5434de0 are supposed unrelated to ZSTD_c_useRowMatchFinder.
Anyway, let's investigate and reproduce .

@danlark1
Copy link
Contributor Author

danlark1 commented Jan 9, 2023

I don't think ZSTD_c_useRowMatchFinder is a culprit here, it's just me explaining the diff to make this reproducable -- we fuzz without ZSTD_MULTITHREAD and thus data fuzz producer might set some params differently (use more bits before the next setting)

#ifndef ZSTD_MULTITHREAD
    setRand(cctx, ZSTD_c_nbWorkers, 0, 0, producer);
    setRand(cctx, ZSTD_c_rsyncable, 0, 0, producer);
#else
    setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
    setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
#endif

@Cyan4973 Cyan4973 self-assigned this Jan 9, 2023
@terrelln
Copy link
Contributor

terrelln commented Jan 9, 2023

@danlark1 yeah our OSS-Fuzz fuzzers are failing similarly, presumably with the same root cause.

Just to double check, you are fuzzing dev, but aren't actually running dev in production right?

@danlark1
Copy link
Contributor Author

danlark1 commented Jan 9, 2023

Well, we decided to live at head (dev) with thorough testing of all new commits across hundreds of clients. We found it quite stable to get all perf and compression benefits and if something fails, we rollback and report here. We pick up new changes once in a week-two

We do the same with most critical libraries like llvm, compilers, compressors, etc

@terrelln
Copy link
Contributor

terrelln commented Jan 9, 2023

@danlark1 just FYI, we base our development around stable releases. Meaning that while we try to avoid all bugs, we strongly rely on our fuzzers, running on dev, to catch subtle bugs. E.g. the normal development workflow is to get tests passing on commits, go through code review, then merge, but we normally don't fuzz extensively before landing commits. We rely on the continuous dev fuzzers.

I don't know exactly what your CI looks like, but if you have significant fuzzing on dev commits before you pick up the commit, that could be a very reasonable strategy.

Anyways, thanks for the report!

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jan 11, 2023

@danlark1 : FYI, I've been trying to reproduce this error locally,
using the test case provided in this issue ( testcase-4882119846133760-6f37f712c5input.txt ).

For some reason, it doesn't fail with local fuzzer test.

./dictionary_stream_round_trip testcase-4882119846133760-6f37f712c5input.txt
Tested 1 files: Success!

I'll try other variations, in case the error condition depends on something beyond the test case,
in the meantime, could the fuzzer test file be verified ?

@danlark1
Copy link
Contributor Author

As I said, you need to tweak a little the row matching param in the zstd_helpers file as I described as we did not use entropy bits for setting ZSTD_c_nbWorkers and ZSTD_c_rsyncable and our fuzzing produced a slightly different file

I can try to extract the file for a clean repo

@danlark1
Copy link
Contributor Author

In other words

Clean:

& danilak @ danilak1 in ~/zstd/tests/fuzz ((5434de01…) …7) 0 [16:52:09]
~ ./dictionary_stream_round_trip ~/Downloads/testcase-4882119846133760-6f37f712c5input.txt                                                                                                             0 [16:52:09]
Tested 1 files: Success!

Changing:

~ git diff                                                                                                                                                                                             0 [16:52:32]
diff --git a/tests/fuzz/zstd_helpers.c b/tests/fuzz/zstd_helpers.c
index d5210627..23a913ab 100644
--- a/tests/fuzz/zstd_helpers.c
+++ b/tests/fuzz/zstd_helpers.c
@@ -96,7 +96,7 @@ void FUZZ_setRandomParameters(ZSTD_CCtx *cctx, size_t srcSize, FUZZ_dataProducer
     setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
     setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
 #endif
-    setRand(cctx, ZSTD_c_useRowMatchFinder, 0, 2, producer);
+    setRand(cctx, ZSTD_c_useRowMatchFinder, 2, 2, producer);
     setRand(cctx, ZSTD_c_enableDedicatedDictSearch, 0, 1, producer);
     setRand(cctx, ZSTD_c_forceMaxWindow, 0, 1, producer);
     setRand(cctx, ZSTD_c_literalCompressionMode, 0, 2, producer);
~ ./dictionary_stream_round_trip ~/Downloads/testcase-4882119846133760-6f37f712c5input.txt                                                                                                             0 [16:52:34]
dictionary_stream_round_trip.c: 196: Assertion: `!FUZZ_memcmp(src, rBuf, size)' failed. Corruption!
fish: Job 1, './dictionary_stream_round_trip…' terminated by signal SIGABRT (Abort)

@danlark1
Copy link
Contributor Author

I believe the utmost correct diff is the following:

diff --git a/tests/fuzz/zstd_helpers.c b/tests/fuzz/zstd_helpers.c
index d5210627..c1684537 100644
--- a/tests/fuzz/zstd_helpers.c
+++ b/tests/fuzz/zstd_helpers.c
@@ -89,13 +89,8 @@ void FUZZ_setRandomParameters(ZSTD_CCtx *cctx, size_t srcSize, FUZZ_dataProducer
     setRand(cctx, ZSTD_c_ldmHashRateLog, ZSTD_LDM_HASHRATELOG_MIN,
             ZSTD_LDM_HASHRATELOG_MAX, producer);
     /* Set misc parameters */
-#ifndef ZSTD_MULTITHREAD
     setRand(cctx, ZSTD_c_nbWorkers, 0, 0, producer);
     setRand(cctx, ZSTD_c_rsyncable, 0, 0, producer);
-#else
-    setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
-    setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
-#endif
     setRand(cctx, ZSTD_c_useRowMatchFinder, 0, 2, producer);
     setRand(cctx, ZSTD_c_enableDedicatedDictSearch, 0, 1, producer);
     setRand(cctx, ZSTD_c_forceMaxWindow, 0, 1, producer);

We are addressing this in #3417 so that reproducibility is easier

@Cyan4973
Copy link
Contributor

Thanks, I can reproduce the assertion failure with the proposed modification of zstd_helpers.c.

Cyan4973 added a commit that referenced this issue Jan 11, 2023
A minor change in 5434de0 changed a `<=` into a `<`,
and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress
(previous limit was effectively 7 literals).

This is not in itself a problem, as the threshold is merely an heuristic,
but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit.
This bug would make the literal compressor believes that all literals are the same symbol,
but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions,
this outcome could be false, resulting in data corruption.

Replaced the blind heuristic by an actual test for all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
Cyan4973 added a commit that referenced this issue Jan 11, 2023
A minor change in 5434de0 changed a `<=` into a `<`,
and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress
(previous limit was effectively 7 literals).

This is not in itself a problem, as the threshold is merely an heuristic,
but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit.
This bug would make the literal compressor believes that all literals are the same symbol,
but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions,
this outcome could be false, resulting in data corruption.

Replaced the blind heuristic by an actual test for all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
Cyan4973 added a commit that referenced this issue Jan 12, 2023
A minor change in 5434de0 changed a `<=` into a `<`,
and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress
(previous limit was effectively 7 literals).

This is not in itself a problem, as the threshold is merely an heuristic,
but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit.
This bug would make the literal compressor believes that all literals are the same symbol,
but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions,
this outcome could be false, resulting in data corruption.

Replaced the blind heuristic by an actual test for all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
Cyan4973 added a commit that referenced this issue Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants