implement compression #14

tych0 · 2021-02-16T16:45:40Z

Right now all the metadata and file contents for puzzlefs are uncompressed. We should compress them.

The design format of puzzlefs constrains the use of compression a bit though: we need to be able to read data from an arbitrary offset (the compression community calls this a "seekable compression format").

People like zstd as a compression format, and it has out-of-tree implementations of this with rust bindings:

https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md
https://crates.io/crates/zstd-seekable

which may be interesting.

tych0 · 2021-03-24T01:52:21Z

We have some basic support for this now with d52efd2; let's see how it goes.

ariel-miculas · 2023-02-14T14:31:23Z

from squashfs

For fast random access, compressed files are split up in fixed size blocks that are compressed separately. The block size can be set between 4k and 1M (default for squashfs-tools and squashfs-tools-ng is 128K).

from zstd_seekable

This is done by splitting up the input data into frames, each of which are compressed independently, and so can be decompressed independently. Decompression then takes advantage of a provided 'seek table', which allows the decompressor to immediately jump to the desired data.

Depending on the chunking parameters we choose, the seekable compression may not be worth it. For example, let's say we pick 16KB for the average chunk and 64KB for the maximum chunk. Then the maximum chunk size is less than the squashfs default block size of 128KB. Also, zstd seekable compression includes extra metadata for storing information about the offsets, resulting in a size increase for the compressed chunks. Plus we'll be stuck with zstd.

hallyn · 2023-02-14T14:39:00Z

Could you post your deduplication performance results, or maybe just a concise summary?

IIUC, tuning the parameters made it so that indeed compression may not be needed.

ariel-miculas · 2023-02-14T14:45:19Z

I still think we need compression, I'm only arguing against seekable compression.

hallyn · 2023-02-14T15:22:49Z

Sounds good.

ariel-miculas · 2023-02-15T10:09:58Z

An interesting point from squashfs is that they're not compressing blocks if this would result in a larger size, in this case they're storing the block uncompressed, perhaps we should also implement this.
Or even this:

By checking compression ratios and storing compressed only those chunks whose compression ratios meet a certain threshold, the amount of decompression involved during data access can be reduced
...
This allows for capturing most of the savings of chunk compression while eliminating the cost of decompression on the majority of chunks.

from the microsoft paper

ariel-miculas · 2023-02-15T12:16:09Z

Related: containers/image#1084

ariel-miculas · 2023-02-22T08:17:32Z

We can have optional support for compression and we need a new field in BlobRef to indicate whether the blob is compressed or not. The blob still needs to be content-addressed, so the sha256sum of the compressed blob needs to be computed.

ariel-miculas · 2023-02-22T15:00:17Z

We should also process the files in parallel during puzzlefs build in order to speed up the compression.

ariel-miculas · 2023-03-15T18:46:14Z

next steps:

measure read performance impact

ariel-miculas · 2023-03-17T09:09:09Z

Setup:

Build and mount the filesystem:

target/release/puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/oci-barehost test
target/release/puzzlefs mount -f /tmp/oci-barehost test /tmp/puzzle

Reading an entire filesystem:

Compression disabled

$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null  9.64s user 2.54s system 280% cpu 4.338 total

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'

Benchmark 1: fd -tf -x cat > /dev/null
  Time (mean ± σ):      6.233 s ±  0.455 s    [User: 12.785 s, System: 4.290 s]
  Range (min … max):    5.503 s …  7.020 s    10 runs

Compression enabled

$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null  8.59s user 2.13s system 57% cpu 18.549 total

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'

Benchmark 1: fd -tf -x cat > /dev/null
  Time (mean ± σ):     20.681 s ±  1.989 s    [User: 10.174 s, System: 2.866 s]
  Range (min … max):   18.057 s … 24.472 s    10 runs

Comparison with squashfuse

squashfuse -f barehost.sqhs /tmp/squash

$ time fd -tf -x cat > /dev/null
fd -tf -x cat > /dev/null  11.50s user 3.83s system 96% cpu 15.816 total

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -tf -x cat > /dev/null'

Benchmark 1: fd -tf -x cat > /dev/null
  Time (mean ± σ):     13.134 s ±  1.194 s    [User: 10.361 s, System: 2.903 s]
  Range (min … max):   11.355 s … 16.096 s    10 runs

ariel-miculas self-assigned this Feb 14, 2023

ariel-miculas mentioned this issue Mar 31, 2023

Initial support for compressed data blobs with zstd compression #83

Merged

hallyn closed this as completed in #83 Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement compression #14

implement compression #14

tych0 commented Feb 16, 2021

tych0 commented Mar 24, 2021

ariel-miculas commented Feb 14, 2023

hallyn commented Feb 14, 2023

ariel-miculas commented Feb 14, 2023

hallyn commented Feb 14, 2023

ariel-miculas commented Feb 15, 2023 •

edited

Loading

ariel-miculas commented Feb 15, 2023

ariel-miculas commented Feb 22, 2023

ariel-miculas commented Feb 22, 2023

ariel-miculas commented Mar 15, 2023

ariel-miculas commented Mar 17, 2023 •

edited

Loading

implement compression #14

implement compression #14

Comments

tych0 commented Feb 16, 2021

tych0 commented Mar 24, 2021

ariel-miculas commented Feb 14, 2023

hallyn commented Feb 14, 2023

ariel-miculas commented Feb 14, 2023

hallyn commented Feb 14, 2023

ariel-miculas commented Feb 15, 2023 • edited Loading

ariel-miculas commented Feb 15, 2023

ariel-miculas commented Feb 22, 2023

ariel-miculas commented Feb 22, 2023

ariel-miculas commented Mar 15, 2023

ariel-miculas commented Mar 17, 2023 • edited Loading

Setup:

Compression disabled

Compression enabled

Comparison with squashfuse

ariel-miculas commented Feb 15, 2023 •

edited

Loading

ariel-miculas commented Mar 17, 2023 •

edited

Loading