-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix long offset resolution #3460
fix long offset resolution #3460
Conversation
fb9e376
to
e10676a
Compare
e10676a
to
9e4c66b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I will also benchmark this PR to make sure I also don't see regressions.
Thanks for adding the test! We might need to try to reduce the memory usage of the test a little bit, otherwise it could be flaky in 32-bit builds, because we could run out of address space.
Co-authored-by: Nick Terrell <[email protected]>
…ellerozenblit/zstd into fix-long-offsets-resolution-pointer
size_t const kNbSequences = 4; | ||
ZSTD_Sequence* sequences = malloc(sizeof(ZSTD_Sequence) * kNbSequences); | ||
void* const checkBuf = malloc(srcSize); | ||
const size_t largeDictSize = 1 << 25; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, you got it to reproduce with 1 << 25
!
In that case we don't need to hide it behind bigTests
. But I still like having one 32-bit test in CI that does run with --big-tests
. That should help improve our 32-bit coverage a bit, without making our users tests flaky.
Co-authored-by: Nick Terrell <[email protected]>
…ellerozenblit/zstd into fix-long-offsets-resolution-pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
We currently determine whether or not
longOffsets
are present in a set of input sequences by checking if thewindowLog
is larger thanSTREAM_ACCUMULATOR_MIN
.This does not account for inputs with a dictionary, where it is legal for offsets to be larger than the window size. In these cases, we don't properly account for large offsets when flushing our bit accumulator in
ZSTD_encodeSequences_body()
.This PR introduces a
largeOffsets
field in theZSTD_symbolEncodingTypeStats_t
which records whether or not large offsets are present in the sequences. This value of this field is finalized inZSTD_seqToCodes()
.I ran some benchmarks for zstd level 1 & 3 and there does not appear to be any measurable change in compression speed.