-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sequence validation and seqStore bounds check #3439
Fix sequence validation and seqStore bounds check #3439
Conversation
I presume there is no change in performance when sequence validation is not enabled ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job finding and fixing these bugs in the sequence compression API!
Looks good, I just have a couple minor comments.
I ran some benchmarks and there doesn't appear to be any change in performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
lib/common/error_private.c
Outdated
@@ -53,6 +53,7 @@ const char* ERR_getErrorString(ERR_enum code) | |||
case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong"; | |||
case PREFIX(srcBuffer_wrong): return "Source buffer is wrong"; | |||
case PREFIX(externalMatchFinder_failed): return "External matchfinder returned an error code"; | |||
case PREFIX(invalid_external_sequences): return "External matchfinder returned an error code"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please change this error message to something like "External sequences are not valid"?
79697ca
to
1b65727
Compare
lib/common/error_private.c
Outdated
@@ -53,6 +53,7 @@ const char* ERR_getErrorString(ERR_enum code) | |||
case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong"; | |||
case PREFIX(srcBuffer_wrong): return "Source buffer is wrong"; | |||
case PREFIX(externalMatchFinder_failed): return "External matchfinder returned an error code"; | |||
case PREFIX(invalid_external_sequences): return "External sequences are not valid"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: even error codes have a naming scheme!
Looking at existing ones, the pattern is typically topic_qualifier
.
For this case, it would be something like :
externalSequences_invalid
.
While working on the Sequence Compression API fuzzer, I found bugs in in
ZSTD_validateSequence()
as well as inZSTD_copySequencesToSeqStoreExplicitBlockDelim()
that allow one additionalseqDef
to be written to theseqStore
, past the designated memory allocation.Bug 1
The validation method currently only checks that the
matchLength
of each sequence is greater than the global minimum match length of 3. This is not accurate in cases where theminMatch
in the cctx does not properly correspond with the true minimum match length of the sequences.In most cases, this is fine. However, we currently allocate less memory for
minMatch > 3
than forminMatch = 3
. This means thatZSTD_validateSequence()
does not always properly validate sequences given the memory constraints determined byminMatch
in the cctx.Fix
Fix: See changes in
ZSTD_validateSequence()
We now loosely check that match lengths respect the cctx's
minMatch
. Meaning, we check that all match lengths are greater than 3 whenminMatch
is greater than 3. This is enough to validate thatmaxNbSeq
is large enough to fit all sequences.Bug 2
ZSTD_validateSequence()
is optional, so we need to protect agains these cases even when it is not enabled. We have asserts that catch overwrites to theseqStore
, but these are not run in the production build. There is also a check beforeZSTD_storeSeq()
to ensure that we have not already passed the memory limit. However, this allows us to write one additional sequence pastMaxNbSeq
before throwing an error.We are able to roundtrip without error in cases where we only write one sequence past the given memory limit.
Fix
Fix: See changes in
ZSTD_copySequencesToSeqStoreExplicitBlockDelim()
andZSTD_copySequencesToSeqStoreNoBlockDelim()
. I have changed these checks to return an error if we have already reachedmaxNbSeq
, so that we do not attempt to write an additional sequence.