bottomless: add zstd compression option #468

psarna · 2023-10-17T14:40:39Z

psarna · 2023-10-17T15:21:09Z

xz compression empirically showed a ~20-25x better compression ratio than gzip. In my tests xz also proved to be faster, because instead of writing ~200MiB to disk as gzip would, it wrote ~10MiB, at the cost of more CPU usage. We should test first, but then strongly consider making xz the default.
To set it up, it's enough to set

LIBSQL_BOTTOMLESS_COMPRESSION=zstd

as an env var
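For illustration, here is a minimal sketch of how such a variable could be turned into a typed option. Only the variable name and the `zstd` value come from this PR; the `CompressionKind` enum, the accepted `none` and `gzip` spellings, and the `compression_from_env` helper are hypothetical and not taken from the bottomless code.

```rust
use std::env::{self, VarError};
use std::str::FromStr;

/// Hypothetical set of compression kinds; the real variant list may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionKind {
    None,
    Gzip,
    Zstd,
}

impl FromStr for CompressionKind {
    type Err = String;

    /// Parses a value case-insensitively, ignoring surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "none" => Ok(CompressionKind::None), // assumed spelling
            "gzip" => Ok(CompressionKind::Gzip), // assumed spelling
            "zstd" => Ok(CompressionKind::Zstd), // value shown in this PR
            other => Err(format!("unknown compression kind: {other}")),
        }
    }
}

/// Reads LIBSQL_BOTTOMLESS_COMPRESSION, falling back to `default` when unset.
pub fn compression_from_env(default: CompressionKind) -> Result<CompressionKind, String> {
    match env::var("LIBSQL_BOTTOMLESS_COMPRESSION") {
        Ok(value) => value.parse(),
        Err(VarError::NotPresent) => Ok(default),
        Err(e) => Err(format!("invalid LIBSQL_BOTTOMLESS_COMPRESSION: {e}")),
    }
}
```

Rejecting unknown values, rather than silently falling back, is a design choice of this sketch: a typo in the variable would otherwise go unnoticed.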

Transplanted from libsql/sqld#780

psarna · 2023-10-18T08:48:25Z

@MarinPostma downgraded to draft, because I see some spontaneous errors when testing xz compression on large random data (data that does not compress well). I'll evaluate other compression implementations as well, because xz support looks a little abandoned in the async_compression crate.

During stress tests, xz turned out to spontaneously fail to compress, same with bzip2. All compression algos are supported by separate crates, so these were simply ruled out. Zstd proved to be:
- fast
- correct
- more than acceptable on compression ratio

psarna · 2023-10-18T09:14:44Z

Rebranded to zstd. It's a modern algorithm that also powers ScyllaDB, and first and foremost it passed all the correctness tests that xz and bzip2 failed. It's also way faster than any of the others, while still providing compression ratios of 12-15 where gzip gives a mere 2.
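To put those ratios in terms of bytes written, a rough back-of-the-envelope sketch follows. It assumes the ratio means uncompressed size divided by compressed size; the helper name and the 1 GiB input are made up for illustration, and real output sizes depend on the data.

```rust
/// One mebibyte in bytes.
pub const MIB: u64 = 1024 * 1024;

/// Estimated bytes written for `raw_bytes` of input at a given compression
/// ratio (uncompressed size / compressed size), rounded up.
/// Hypothetical helper for illustration only.
pub fn estimated_compressed_bytes(raw_bytes: u64, ratio: u64) -> u64 {
    assert!(ratio > 0, "ratio must be positive");
    (raw_bytes + ratio - 1) / ratio
}

/// Convenience wrapper: estimate for 1 GiB of raw data (an assumed input size).
pub fn estimate_for_one_gib(ratio: u64) -> u64 {
    estimated_compressed_bytes(1024 * MIB, ratio)
}
```

With a ratio of 2, 1 GiB of raw data becomes 512 MiB on disk; with a ratio of 12 it becomes roughly 85 MiB.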

Gzip does not perform well on data in the form of libSQL 4KiB pages, and zstd performed uniformly better in all test cases I covered locally (and no worse on random data with very high entropy).
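As a side note on the high-entropy case, the sketch below estimates the byte-level Shannon entropy of a single page. It is not part of this PR or of bottomless; the function name and `PAGE_SIZE` constant are hypothetical. A result close to 8 bits per byte means the page looks random and no general-purpose compressor should be expected to shrink it much.

```rust
/// Page size mentioned in this PR (4 KiB).
pub const PAGE_SIZE: usize = 4096;

/// Shannon entropy of the byte distribution in `page`, in bits per byte.
/// Returns 0.0 for an empty slice. Hypothetical helper for illustration.
pub fn shannon_entropy_bits_per_byte(page: &[u8]) -> f64 {
    if page.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &byte in page {
        counts[byte as usize] += 1;
    }
    let len = page.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A page of all zeros scores 0 bits per byte, while a page in which every byte value appears equally often scores 8.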

psarna requested review from penberg and MarinPostma October 17, 2023 14:40

psarna mentioned this pull request Oct 17, 2023

bottomless: add xz compression option libsql/sqld#780

Open

psarna added 3 commits October 18, 2023 10:21

bottomless: add xz compression option

0a44f6a

Transplanted from libsql/sqld#780

fixup: Decoder should be used instead of Encoder in read.rs

5efb42b

bottomless: update async_compression to 4.4

c2023d4

psarna force-pushed the xz_transplant branch from 7e23a29 to c2023d4 Compare October 18, 2023 08:21

MarinPostma approved these changes Oct 18, 2023

psarna marked this pull request as draft October 18, 2023 08:47

bottomless: actually, add zstd

6f96daa

During stress tests, xz turned out to spontaneously fail to compress, same with bzip2. All compression algos are supported by separate crates, so these were simply ruled out. Zstd proved to be:
- fast
- correct
- more than acceptable on compression ratio

psarna changed the title ~~bottomless: add xz compression option~~ bottomless: add zstd compression option Oct 18, 2023

psarna marked this pull request as ready for review October 18, 2023 09:10

bottomless: switch to zstd as default

acda803

Gzip does not perform well on data in the form of libSQL 4KiB pages, and zstd performed uniformly better in all test cases I covered locally (and no worse on random data with very high entropy).

MarinPostma added this pull request to the merge queue Oct 18, 2023

Merged via the queue into tursodatabase:main with commit 02a3dfb Oct 18, 2023
7 checks passed

LucioFranco pushed a commit that referenced this pull request Oct 18, 2023

disable nagle algorithm for http (#468)

ef8d22d
