Row hash tag space initialization speed regression #3528
yoniko added a commit to yoniko/zstd that referenced this issue on Mar 7, 2023:
This helps to avoid regressions where consecutive compressions use the same tag space with similar data (running `zstd -b5e7 enwik8 -B128K` reproduces this regression).
yoniko added further commits with the same message to yoniko/zstd that referenced this issue on Mar 7, Mar 8, and Mar 9, 2023.
This was linked to pull requests on Mar 8, 2023.
yoniko added two commits that referenced this issue on Mar 13, 2023.
This issue will be used to track the work that was started in #2971 and #3426, as well as further work.
Context
Row hash is a fast SIMD-based hash used by various strategies in Zstd.
In addition to the normal hash entries, it requires extra space for hash-based tags that allow further filtering of the entries in a bucket.
When streaming data of unknown size (for example, using `ZSTD_compressStream`) we don't have a good way to choose a hashlog, so we pick a large one. This, in turn, means we need to initialize a large tag space, which creates a noticeable regression when compressing small inputs.
A few attempts have been made to fix this. #2971 simply removes the initialization, but that is problematic under Valgrind and might introduce another regression, because consecutive compressions can get "false positives" from the previous compressions' tags.
#3426 expands on #2971: it introduces memory regions that are guaranteed to have been initialized at least once (thus not triggering Valgrind) and salts the hash to avoid collisions. However, it was rather complex.
Intended solution
Break down #3426 into multiple PRs and possibly remove some of the functionality introduced there.
This is an umbrella issue so we can tie the broken-down PRs together.