fix long offset resolution #3460

daniellerozenblit · 2023-01-27T16:26:13Z

We currently determine whether or not longOffsets are present in a set of input sequences by checking if the windowLog is larger than STREAM_ACCUMULATOR_MIN.

This does not account for inputs with a dictionary, where it is legal for offsets to be larger than the window size. In these cases, we don't properly account for large offsets when flushing our bit accumulator in ZSTD_encodeSequences_body().

This PR introduces a largeOffsets field in ZSTD_symbolEncodingTypeStats_t, which records whether large offsets are present in the sequences. The value of this field is finalized in ZSTD_seqToCodes().
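To illustrate the difference between the two checks, here is a minimal sketch, not the actual zstd code. All `sketch_` names are hypothetical, the threshold of 25 is an assumed stand-in for STREAM_ACCUMULATOR_MIN on a 32-bit build, and raw offsets are used where the real encoder works on its internal offset codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for STREAM_ACCUMULATOR_MIN on a 32-bit build (illustrative value). */
#define SKETCH_ACCUMULATOR_MIN 25u

/* Index of the highest set bit of v (returns 0 for v == 0 or v == 1). */
static unsigned sketch_highbit32(uint32_t v)
{
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}

/* Window-based check: infers long offsets from windowLog alone.
 * It misses dictionary matches whose offsets exceed the window size. */
static int sketch_longOffsetsFromWindow(unsigned windowLog)
{
    return windowLog > SKETCH_ACCUMULATOR_MIN;
}

/* Sequence-based check: inspects each offset and records whether any of them
 * needs at least SKETCH_ACCUMULATOR_MIN bits, regardless of windowLog. */
static int sketch_longOffsetsFromSequences(const uint32_t* offsets, size_t nbSeq)
{
    int longOffsets = 0;
    size_t i;
    for (i = 0; i < nbSeq; i++) {
        if (sketch_highbit32(offsets[i]) >= SKETCH_ACCUMULATOR_MIN)
            longOffsets = 1;
    }
    return longOffsets;
}
```

With a windowLog of 20 and a dictionary match at offset 1 << 25, the window-based check reports no long offsets while the sequence-based check reports one, which is the mismatch described above.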

I ran some benchmarks for zstd levels 1 and 3, and there does not appear to be any measurable change in compression speed.

lib/common/zstd_internal.h

terrelln

Looks good to me! I will also benchmark this PR to make sure I also don't see regressions.

Thanks for adding the test! We might need to try to reduce the memory usage of the test a little bit, otherwise it could be flaky in 32-bit builds, because we could run out of address space.

tests/zstreamtest.c

lib/compress/zstd_compress.c


.github/workflows/dev-long-tests.yml

terrelln · 2023-01-27T22:43:48Z

tests/zstreamtest.c

+            size_t const kNbSequences = 4;
+            ZSTD_Sequence* sequences = malloc(sizeof(ZSTD_Sequence) * kNbSequences);
+            void* const checkBuf = malloc(srcSize);
+            const size_t largeDictSize = 1 << 25;


Oh nice, you got it to reproduce with 1 << 25!

In that case we don't need to hide it behind bigTests. But I still like having one 32-bit test in CI that does run with --big-tests. That should help improve our 32-bit coverage a bit, without making our users' tests flaky.


terrelln

Awesome, thanks!

fix long offset resolution

814f4bf

facebook-github-bot added the CLA Signed label Jan 27, 2023

initialize long offsets in decodecorpus

d210628

terrelln requested changes Jan 27, 2023

lib/common/zstd_internal.h (outdated, resolved)

daniellerozenblit force-pushed the fix-long-offsets-resolution-pointer branch from fb9e376 to e10676a Compare January 27, 2023 19:40

record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case

9e4c66b

daniellerozenblit force-pushed the fix-long-offsets-resolution-pointer branch from e10676a to 9e4c66b Compare January 27, 2023 20:04

terrelln reviewed Jan 27, 2023

tests/zstreamtest.c (outdated, resolved)

tests/zstreamtest.c (resolved)

terrelln reviewed Jan 27, 2023

lib/compress/zstd_compress.c (outdated, resolved)

daniellerozenblit and others added 3 commits January 27, 2023 16:58

Update lib/compress/zstd_compress.c

2bde9fb

Co-authored-by: Nick Terrell <[email protected]>

update CI

da589a1

Merge branch 'fix-long-offsets-resolution-pointer' of github.com:dani…

5ec77ad

…ellerozenblit/zstd into fix-long-offsets-resolution-pointer

terrelln reviewed Jan 27, 2023

.github/workflows/dev-long-tests.yml (outdated, resolved)

terrelln reviewed Jan 27, 2023


daniellerozenblit and others added 3 commits January 28, 2023 12:14

Update .github/workflows/dev-long-tests.yml

295724b

Co-authored-by: Nick Terrell <[email protected]>

remove big test around large offset with small window size

66fae56

Merge branch 'fix-long-offsets-resolution-pointer' of github.com:dani…

0843d9b

…ellerozenblit/zstd into fix-long-offsets-resolution-pointer

daniellerozenblit marked this pull request as ready for review January 30, 2023 14:28

terrelln approved these changes Jan 30, 2023


daniellerozenblit merged commit 0017663 into facebook:dev Jan 30, 2023

daniellerozenblit deleted the fix-long-offsets-resolution-pointer branch March 6, 2023 19:04
