implement compression #14
Comments
We have some basic support for this now with d52efd2; let's see how it goes.
Depending on the chunking parameters we choose, seekable compression may not be worth it. For example, say we pick 16KB for the average chunk size and 64KB for the maximum. Then the maximum chunk size is already smaller than the squashfs default block size of 128KB. Also, zstd seekable compression adds extra metadata to store the offsets, which increases the size of the compressed chunks. Plus we'd be stuck with zstd.
Could you post your deduplication performance results, or maybe just a concise summary? IIUC, tuning the parameters made it so that compression may indeed not be needed.
I still think we need compression; I'm only arguing against seekable compression.
Sounds good.
An interesting point from squashfs: it does not compress a block if doing so would make it larger, and stores that block uncompressed instead. Perhaps we should implement this as well.
from the Microsoft paper
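The squashfs-style fallback described above could look roughly like the sketch below. It is only an illustration: zlib from the Python standard library stands in for whatever compressor puzzlefs ends up using, and the `RAW`/`COMPRESSED` tags and function names are made up for this example.

```python
import zlib
from typing import Tuple

RAW = 0         # hypothetical tag: chunk stored as-is
COMPRESSED = 1  # hypothetical tag: chunk stored compressed (zlib as a stand-in)


def encode_chunk(data: bytes) -> Tuple[int, bytes]:
    """Compress a chunk, but keep the raw bytes if compression does not shrink it."""
    packed = zlib.compress(data, 6)
    if len(packed) < len(data):
        return COMPRESSED, packed
    return RAW, data


def decode_chunk(tag: int, payload: bytes) -> bytes:
    """Invert encode_chunk using the stored tag."""
    return zlib.decompress(payload) if tag == COMPRESSED else payload
```

The tag has to be recorded somewhere next to the chunk, since the reader cannot otherwise tell whether the payload needs decompressing.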
Related: containers/image#1084
We can make compression optional, which requires a new field in BlobRef to indicate whether the blob is compressed. The blob still needs to be content-addressed, so the sha256sum must be computed over the compressed blob.
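A minimal sketch of that idea, assuming a simplified `BlobRef` with only a digest and the new flag (the real puzzlefs struct has its own layout, and zlib again stands in for the actual compressor). The `store_blob` and `load_blob` helpers and the dict-backed store are hypothetical.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobRef:
    """Illustrative stand-in, not the real puzzlefs BlobRef."""
    digest: str       # sha256 of the bytes as stored, i.e. after compression
    compressed: bool  # the new flag discussed above


def store_blob(store: dict, data: bytes, compress: bool) -> BlobRef:
    """Optionally compress, then address the blob by the hash of the stored bytes."""
    stored = zlib.compress(data) if compress else data
    digest = hashlib.sha256(stored).hexdigest()
    store[digest] = stored
    return BlobRef(digest=digest, compressed=compress)


def load_blob(store: dict, ref: BlobRef) -> bytes:
    """Verify the stored bytes against the digest before decompressing them."""
    stored = store[ref.digest]
    if hashlib.sha256(stored).hexdigest() != ref.digest:
        raise ValueError("blob does not match its digest")
    return zlib.decompress(stored) if ref.compressed else stored
```

Hashing the stored bytes means a blob can be verified without decompressing it first, at the cost that the same content compressed and uncompressed gets two different digests.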
We should also process the files in parallel during puzzlefs build to speed up compression.
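As a rough illustration of parallel compression (not how puzzlefs build does it), the sketch below compresses independent chunks on a thread pool. It assumes CPython, where zlib releases the GIL while compressing, and uses zlib as a stand-in compressor; `compress_all` is a hypothetical helper.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_all(chunks: List[bytes], workers: int = 4) -> List[bytes]:
    """Compress chunks concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))
```

Because each chunk is compressed independently, the output order matches the input order and no shared compressor state is needed.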
next steps:
Setup: Build and mount the filesystem:
Reading an entire filesystem:
Compression disabled
Compression enabled
Comparison with squashfuse
Right now all the metadata and file contents for puzzlefs are uncompressed. We should compress them.
The design format of puzzlefs constrains the use of compression a bit though: we need to be able to read data from an arbitrary offset (the compression community calls this a "seekable compression format").
People like zstd as a compression format, and it has an out-of-tree seekable format with Rust bindings, which may be interesting:
https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md
https://crates.io/crates/zstd-seekable
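To make the "seekable" requirement concrete, here is a toy sketch of the general idea: compress fixed-size frames independently and keep a table of where each compressed frame starts, so a read at an arbitrary offset only decompresses the frames it touches. This is not the zstd seekable format itself; zlib is a stand-in, and `FRAME_SIZE`, `compress_seekable`, and `read_at` are hypothetical names.

```python
import zlib
from typing import List, Tuple

FRAME_SIZE = 16 * 1024  # hypothetical uncompressed bytes per frame


def compress_seekable(data: bytes) -> Tuple[bytes, List[int]]:
    """Compress fixed-size frames independently and record where each one starts."""
    frames = []
    starts = [0]  # starts[i] is the compressed offset of frame i; last entry is the total
    for pos in range(0, len(data), FRAME_SIZE):
        frame = zlib.compress(data[pos:pos + FRAME_SIZE])
        frames.append(frame)
        starts.append(starts[-1] + len(frame))
    return b"".join(frames), starts


def read_at(blob: bytes, starts: List[int], offset: int, length: int) -> bytes:
    """Read uncompressed bytes at an offset, decompressing only the frames needed."""
    out = bytearray()
    frame = offset // FRAME_SIZE
    skip = offset % FRAME_SIZE
    while length > 0 and frame < len(starts) - 1:
        plain = zlib.decompress(blob[starts[frame]:starts[frame + 1]])
        piece = plain[skip:skip + length]
        out += piece
        length -= len(piece)
        frame += 1
        skip = 0
    return bytes(out)
```

The offsets table is the extra metadata a seekable format has to carry, and smaller frames mean both a larger table and less context for the compressor to work with.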