bottomless: add zstd compression option #468

Merged
merged 5 commits into from
Oct 18, 2023

Conversation

psarna
Collaborator

@psarna psarna commented Oct 17, 2023

Transplanted from libsql/sqld#780

@psarna
Collaborator Author

psarna commented Oct 17, 2023

xz compression empirically showed a ~20-25x better compression ratio than gzip. In my tests xz actually proved to be faster too: instead of writing ~200MiB to disk as gzip would, it wrote ~10MiB, at the cost of more CPU time. We should test first, but then strongly consider making xz the default.
To set it up, it's enough to set

LIBSQL_BOTTOMLESS_COMPRESSION=zstd

as an env var
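
For illustration, here is a minimal sketch of how a setting like this could be read and validated at startup. It is not taken from the bottomless source: the `CompressionKind` enum, the `parse_compression` and `compression_from_env` functions, the accepted spellings, and the gzip fallback when the variable is unset are all assumptions made for this example. Only the variable name `LIBSQL_BOTTOMLESS_COMPRESSION` and the value `zstd` come from the comment above.

```rust
use std::env;

/// Hypothetical set of compression choices; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionKind {
    None,
    Gzip,
    Zstd,
}

/// Parse a raw setting value, ignoring case and surrounding whitespace.
/// The accepted spellings are an assumption, not the crate's actual list.
pub fn parse_compression(raw: &str) -> Result<CompressionKind, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "none" => Ok(CompressionKind::None),
        "gzip" => Ok(CompressionKind::Gzip),
        "zstd" => Ok(CompressionKind::Zstd),
        other => Err(format!("unsupported compression kind: {other:?}")),
    }
}

/// Read LIBSQL_BOTTOMLESS_COMPRESSION from the environment.
/// Falling back to gzip when the variable is unset is an assumed default.
pub fn compression_from_env() -> Result<CompressionKind, String> {
    match env::var("LIBSQL_BOTTOMLESS_COMPRESSION") {
        Ok(value) => parse_compression(&value),
        Err(env::VarError::NotPresent) => Ok(CompressionKind::Gzip),
        Err(err) => Err(format!("invalid LIBSQL_BOTTOMLESS_COMPRESSION: {err}")),
    }
}
```

Rejecting unknown values with an error, rather than silently falling back, means a typo in the variable is noticed at startup instead of quietly selecting a different algorithm.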

@psarna psarna marked this pull request as draft October 18, 2023 08:47
@psarna
Collaborator Author

psarna commented Oct 18, 2023

@MarinPostma downgraded to draft, because I see some spontaneous errors when testing xz compression on large random data (data that does not compress well). I'll evaluate other compression implementations as well, because xz support looks a little abandoned in the async_compression crate

During stress tests, xz turned out to spontaneously fail to compress,
same with bzip2. All compression algos are supported by separate
crates, so these were simply ruled out.
Zstd proved to be:
 - fast
 - correct
 - more than acceptable on compression ratio
@psarna psarna changed the title bottomless: add xz compression option bottomless: add zstd compression option Oct 18, 2023
@psarna psarna marked this pull request as ready for review October 18, 2023 09:10
@psarna
Collaborator Author

psarna commented Oct 18, 2023

Rebranded to zstd. It's a modern algorithm that also powers ScyllaDB, and first and foremost it passed all the correctness tests that xz and bzip2 failed. It's also way faster than any of the other ones, while still providing compression ratios of 12-15 where gzip gives a mere 2.

Gzip does not perform well on data in the form of libSQL 4KiB pages,
and zstd performed uniformly better in all test cases I covered
locally (and no worse on random data with very high entropy).
@MarinPostma MarinPostma added this pull request to the merge queue Oct 18, 2023
Merged via the queue into tursodatabase:main with commit 02a3dfb Oct 18, 2023
7 checks passed
LucioFranco pushed a commit that referenced this pull request Oct 18, 2023

2 participants