bottomless: add xz compression option #780
base: main
Conversation
TODO: I still need to go over the code and check that there are no more hardcoded assumptions about using gzip for backups.
LGTM. How do we choose which compression is used? Is there an env var?
Empirical testing shows that gzip achieves a mere 2x compression ratio even with very simple and repetitive data patterns. Since compression is very important for optimizing our egress traffic and throughput in general, the xz algorithm is hereby implemented as well. Run on the same data set, it achieved a ~50x compression ratio, which is orders of magnitude better than gzip, at the cost of elevated CPU usage.

Note: with more algorithms implemented, we should also consider adding code that detects which compression method was used when restoring a snapshot, to allow restoring from a gzip file but continuing new snapshots with xz. Currently, setting the compression method via the env var assumes that both restore and backup use the same algorithm.
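As a minimal sketch of env-var-driven algorithm selection (bottomless itself is Rust; this uses Python's stdlib `gzip` and `lzma` modules for illustration, and the env var name `BOTTOMLESS_COMPRESSION` is hypothetical, not the variable the project actually reads):

```python
import gzip
import lzma
import os

# Hypothetical env var name for illustration only; the real variable
# used by bottomless may be named differently.
ALGO_ENV_VAR = "BOTTOMLESS_COMPRESSION"

def compress(data: bytes) -> bytes:
    """Compress a backup payload with the algorithm chosen via env var."""
    algo = os.environ.get(ALGO_ENV_VAR, "gzip").lower()
    if algo == "xz":
        return lzma.compress(data)   # higher ratio, more CPU
    if algo == "gzip":
        return gzip.compress(data)   # lower ratio, cheaper
    raise ValueError(f"unknown compression algorithm: {algo}")

os.environ[ALGO_ENV_VAR] = "xz"
payload = b"frame" * 10_000          # simple, repetitive data pattern
blob = compress(payload)
assert lzma.decompress(blob) == payload
```

Note that, as the comment above says, restore currently assumes the same env var value that was used for backup; there is no per-file detection yet.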
The reasoning is as follows: 10000 uncompressed frames weigh 40MiB. Gzip is expected to create a ~20MiB file from them, while xz can compress them down to ~800KiB. The previous limit would make xz create a 50KiB file, which is less than the minimum 128KiB that S3-like services charge for when writing to an object store.
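The arithmetic above can be sketched directly (assuming ~4 KiB per uncompressed frame, which is what "10000 frames weigh 40MiB" implies, and taking the 2x/50x ratios from the description as rough estimates):

```python
# Reproduce the size math from the comment above.
frames = 10_000
frame_bytes = 4096                   # ~4 KiB per uncompressed frame (assumed)
raw = frames * frame_bytes           # ~40 MiB total, as stated
gzip_est = raw // 2                  # ~2x ratio  -> ~20 MiB
xz_est = raw // 50                   # ~50x ratio -> ~800 KiB
s3_min_billable = 128 * 1024         # 128 KiB minimum billable object size

assert raw == 40_960_000
assert xz_est // 1024 == 800         # 800 KiB, comfortably above 128 KiB
assert xz_est > s3_min_billable
```

With the smaller previous frame limit, the same 50x ratio would yield a ~50 KiB object, below the 128 KiB billing floor, which is why the batch size was raised.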
Yes, via an env var.
I'm getting corrupted .xz files produced with this crate at the "Best" compression level. Let me try the default one, but that's odd. The file compressed with the crate didn't properly unpack.
(yep, the regular compression level works, and compresses only ~10% worse than Best)
Best level seems to produce corrupted files.
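For reference, xz exposes preset levels 0-9 plus an "extreme" modifier; the corruption reported above was specific to the Rust crate's "Best" setting, not to the xz format itself. A quick illustration with Python's stdlib `lzma` (both levels round-trip correctly there):

```python
import lzma

data = b"some repetitive bottomless frame data " * 4096

# "Best"-like setting: highest preset plus the extreme modifier.
best = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
# Default preset (6): the level that worked reliably in the report above.
default = lzma.compress(data, preset=6)

# Both decompress back to the original payload.
assert lzma.decompress(best) == data
assert lzma.decompress(default) == data
```

On repetitive data the gap between the default preset and the maximum one is small, which matches the ~10% observation above.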
There's one more place where compression isn't correctly autodetected - in loading main db snapshots. I'll add the code.
If the db snapshot is not found with the given compression algo, the other choices are checked too. This code will fire if somebody used to use Gzip, but then decided to restore a database that declares Xz for compressing bottomless.
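The PR's approach probes the object store for each algorithm's snapshot file when the expected one is missing. A complementary way to autodetect the algorithm on restore is to sniff the file's magic bytes instead of trusting the declared setting; a minimal sketch (Python stdlib, not the project's Rust code):

```python
import gzip
import lzma

GZIP_MAGIC = b"\x1f\x8b"          # gzip file header
XZ_MAGIC = b"\xfd7zXZ\x00"        # xz file header

def decompress_auto(blob: bytes) -> bytes:
    """Pick the decompressor from the file's magic bytes."""
    if blob.startswith(XZ_MAGIC):
        return lzma.decompress(blob)
    if blob.startswith(GZIP_MAGIC):
        return gzip.decompress(blob)
    raise ValueError("unrecognized compression format")

payload = b"snapshot-bytes" * 1000
assert decompress_auto(gzip.compress(payload)) == payload
assert decompress_auto(lzma.compress(payload)) == payload
```

Magic-byte sniffing handles the "backed up with gzip, restored under an xz config" case without any extra object-store round trips.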
k, done
Transplanted from libsql/sqld#780
Transplanted to the new repo: tursodatabase/libsql#468