-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dictionary_stream_round_trip corruption #3416
Comments
I don't think #ifndef ZSTD_MULTITHREAD
setRand(cctx, ZSTD_c_nbWorkers, 0, 0, producer);
setRand(cctx, ZSTD_c_rsyncable, 0, 0, producer);
#else
setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
#endif |
@danlark1 yeah our OSS-Fuzz fuzzers are failing similarly, presumably with the same root cause. Just to double check, you are fuzzing dev, but aren't actually running dev in production right? |
Well, we decided to live at head (dev) with thorough testing of all new commits across hundreds of clients. We found it quite stable to get all perf and compression benefits and if something fails, we rollback and report here. We pick up new changes once in a week-two We do the same with most critical libraries like llvm, compilers, compressors, etc |
@danlark1 just FYI, we base our development around stable releases. Meaning that while we try to avoid all bugs, we strongly rely on our fuzzers, running on I don't know exactly what your CI looks like, but if you have significant fuzzing on Anyways, thanks for the report! |
@danlark1 : FYI, I've been trying to reproduce this error locally, For some reason, it doesn't fail with local fuzzer test.
I'll try other variations, in case the error condition depends on something beyond the test case, |
As I said, you need to tweak a little the row matching param in the zstd_helpers file as I described as we did not use entropy bits for setting ZSTD_c_nbWorkers and ZSTD_c_rsyncable and our fuzzing produced a slightly different file I can try to extract the file for a clean repo |
In other words Clean:
Changing:
|
I believe the utmost correct diff is the following: diff --git a/tests/fuzz/zstd_helpers.c b/tests/fuzz/zstd_helpers.c
index d5210627..c1684537 100644
--- a/tests/fuzz/zstd_helpers.c
+++ b/tests/fuzz/zstd_helpers.c
@@ -89,13 +89,8 @@ void FUZZ_setRandomParameters(ZSTD_CCtx *cctx, size_t srcSize, FUZZ_dataProducer
setRand(cctx, ZSTD_c_ldmHashRateLog, ZSTD_LDM_HASHRATELOG_MIN,
ZSTD_LDM_HASHRATELOG_MAX, producer);
/* Set misc parameters */
-#ifndef ZSTD_MULTITHREAD
setRand(cctx, ZSTD_c_nbWorkers, 0, 0, producer);
setRand(cctx, ZSTD_c_rsyncable, 0, 0, producer);
-#else
- setRand(cctx, ZSTD_c_nbWorkers, 0, 2, producer);
- setRand(cctx, ZSTD_c_rsyncable, 0, 1, producer);
-#endif
setRand(cctx, ZSTD_c_useRowMatchFinder, 0, 2, producer);
setRand(cctx, ZSTD_c_enableDedicatedDictSearch, 0, 1, producer);
setRand(cctx, ZSTD_c_forceMaxWindow, 0, 1, producer); We are addressing this in #3417 so that reproducibility is easier |
Thanks, I can reproduce the assertion failure with the proposed modification of |
A minor change in 5434de0 changed a `<=` into a `<`, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely an heuristic, but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believes that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.
A minor change in 5434de0 changed a `<=` into a `<`, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely an heuristic, but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believes that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.
A minor change in 5434de0 changed a `<=` into a `<`, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely an heuristic, but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believes that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.
Describe the bug
We found a fuzzer corruption with 5434de0
Attaching a file and DEBUGLOG 10
This is the fuzzer file
testcase-4882119846133760-6f37f712c5input.txt
This is DEBUGLOG 10 without 5434de0
out_good.txt
This is DEBUGLOG 10 with 5434de0
out_bad.txt
To Reproduce
Steps to reproduce the behavior:
CC=clang-15 CXX=clang++-15 python3 fuzz.py build dictionary_stream_round_trip
./dictionary_stream_round_trip testcase-4882119846133760-6f37f712c5input.txt
We found this on our cluster but I was not able to reproduce it right away. For some reason fuzz data provider set param 1011 differently:
In the left picture is what I got from cluster, in the right what I got from local
It's probably because of ZSTD_MULTITHREAD define. We fuzz it without it
After I set
I was able to reproduce locally with clang-15 as well
Expected behavior
Fuzz passes
Screenshots and charts
If applicable, add screenshots and charts to help explain your problem.
Desktop (please complete the following information):
Additional context
Hope it reproduces right away
The text was updated successfully, but these errors were encountered: