Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/chunked: add support for sparse files #1102

Merged
merged 5 commits into from
Jan 13, 2022

Conversation

giuseppe
Copy link
Member

automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: #1091

Signed-off-by: Giuseppe Scrivano [email protected]

@rhatdan
Copy link
Member

rhatdan commented Jan 12, 2022

Should the size 1kb be configurable within storage.conf?

pkg/chunked/compressor/compressor.go Outdated Show resolved Hide resolved
pkg/chunked/compressor/compressor.go Outdated Show resolved Hide resolved
@rhatdan
Copy link
Member

rhatdan commented Jan 12, 2022

@giuseppe giuseppe force-pushed the zstd-chunked-support-sparse-files branch 2 times, most recently from a03823b to 1aa08c4 Compare January 12, 2022 14:27
@rhatdan
Copy link
Member

rhatdan commented Jan 12, 2022

LGTM

@giuseppe giuseppe force-pushed the zstd-chunked-support-sparse-files branch 2 times, most recently from 24b763e to 63aaaf3 Compare January 12, 2022 19:39
@giuseppe giuseppe marked this pull request as draft January 13, 2022 09:22
@giuseppe giuseppe marked this pull request as ready for review January 13, 2022 09:41
@giuseppe giuseppe marked this pull request as draft January 13, 2022 09:43
@giuseppe giuseppe marked this pull request as ready for review January 13, 2022 09:56
@giuseppe giuseppe force-pushed the zstd-chunked-support-sparse-files branch from 59c62b9 to 73825de Compare January 13, 2022 11:13
Signed-off-by: Giuseppe Scrivano <[email protected]>
commit 10697a0 introduced the issue.

Signed-off-by: Giuseppe Scrivano <[email protected]>
commit 10697a0 introduced the issue.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the zstd-chunked-support-sparse-files branch from 73825de to fbdb371 Compare January 13, 2022 11:15
Signed-off-by: Giuseppe Scrivano <[email protected]>
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.

The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.

Closes: containers#1091

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the zstd-chunked-support-sparse-files branch from fbdb371 to 1988208 Compare January 13, 2022 12:32
@flouthoc
Copy link
Collaborator

LGTM

@rhatdan rhatdan merged commit 3ddd24a into containers:main Jan 13, 2022
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note at least the holeFinderStateFound error path.

}
holeLen := f.zeros
f.zeros = 0
return holeLen, 0, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS this will end up endlessly returning 0, 0, nil. I don’t think err should be just ignored.

chunkSize := rcReader.WrittenOut - lastChunkOffset
if chunkSize > 0 {
chunkType := internal.ChunkTypeData
if rcReader.IsLastChunkZeros {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t this the only caller of rcReader? If so, why not return IsLastChunkZeros directly from the Read method, the only writer of that value, instead over worrying about semantics of a “state” in rollingChecksumReader?

rcReader := &rollingChecksumReader{
reader: bufio.NewReader(tr),
reader: hf,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Embedding holesFinder directly into rollingChecksumReader, without the pointer indirection, might help the compiler a bit. Or ReadByte could just be a private method of rollingChecksumReader — OTOH that might make testing it a bit harder.)

for i := 0; i < len(b); i++ {
n, err := rc.reader.ReadByte()
holeLen, n, err := rc.reader.ReadByte()
Copy link
Collaborator

@mtrmac mtrmac Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does writeZstdChunkedStream show up on a CPU profile, by any chance?

I have measured nothing, but all of this fairly deep call stack and state management for every single byte (especially across a public module boundary to bufio.{ReadByte,UnreadByte}) seems that it could be quite costly, compared to filling a multi-kilobyte buffer and just doing a linear scan for a non-zero byte in a trivial (and easily-vectorizable) loop.

Of course, let’s not worry about it if it doesn’t matter.

fileTypeZstdChunked = iota
fileTypeEstargz
fileTypeNoCompression
fileTypeHole
Copy link
Collaborator

@mtrmac mtrmac Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name doesn’t make sense.

At a first glance, without actually tracking down the full set of states/transitions in detail, the code should separate two concepts:

  • How to decompress compressed data in this file (a function pointer, or an interface for “start decompressing and return a Reader” + “clean up”). Notably the gzipReader/zstdReader members fairly strongly suggest chunkedDiffer is inlining ~independent object types`.
  • What’s the type of the current chunk (compressed, locally-available, a hole)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support of sparse files in container images
5 participants