Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[versions-test] Work around bug in dictionary builder for older versions #3436

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Jan 19, 2023

Older versions of zstandard have a bug in the dictionary builder, that
can cause dictionary building to fail. The process still exits 0, but
the dictionary is not created.

For reference, the bug is that it creates a dictionary that starts with
the zstd dictionary magic, in the process of writing the dictionary header,
but the header isn't fully written yet, and zstd fails compressions in
this case, because the dictionary is malformated. We fixed this later on
by trying to load the dictionary as a zstd dictionary, but if that fails
we fallback to content only (by default).

The fix is to:

  1. Make the dictionary determinsitic by sorting the input files.
    Previously the bug would only sometimes occur, when the input files
    were in a particular order.
  2. If dictionary creation fails, fallback to the head dictionary.

@terrelln terrelln force-pushed the 2023-01-19-version-test-debug branch from c45ec6c to 29f85a3 Compare January 19, 2023 20:15
@terrelln terrelln changed the title 2023 01 19 version test debug [versions-test] Work around bug in dictionary builder for older versions Jan 19, 2023
Older versions of zstandard have a bug in the dictionary builder, that
can cause dictionary building to fail. The process still exits 0, but
the dictionary is not created.

For reference, the bug is that it creates a dictionary that starts with
the zstd dictionary magic, in the process of writing the dictionary header,
but the header isn't fully written yet, and zstd fails compressions in
this case, because the dictionary is malformated. We fixed this later on
by trying to load the dictionary as a zstd dictionary, but if that fails
we fallback to content only (by default).

The fix is to:
1. Make the dictionary determinsitic by sorting the input files.
   Previously the bug would only sometimes occur, when the input files
   were in a particular order.
2. If dictionary creation fails, fallback to the `head` dictionary.
Copy link
Contributor

@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great investigation !

@terrelln terrelln merged commit 667eb6d into facebook:dev Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants