pkg/chunked: add support for sparse files #1102
Should the 1kb size be configurable in storage.conf?
a03823b to 1aa08c4
LGTM
24b763e to 63aaaf3
59c62b9 to 73825de
Signed-off-by: Giuseppe Scrivano <[email protected]>
commit 10697a0 introduced the issue. Signed-off-by: Giuseppe Scrivano <[email protected]>
73825de to fbdb371
Signed-off-by: Giuseppe Scrivano <[email protected]>
automatically detect holes in sparse files (the threshold is hardcoded at 1kb for now) and add this information to the manifest file. The receiver will create a hole (using unix.Seek and unix.Ftruncate) instead of writing the actual zeros. Closes: containers#1091 Signed-off-by: Giuseppe Scrivano <[email protected]>
fbdb371 to 1988208
LGTM
Note at least the `holeFinderStateFound` error path.
}
holeLen := f.zeros
f.zeros = 0
return holeLen, 0, nil
AFAICS this will end up endlessly returning `0, 0, nil`. I don’t think `err` should be just ignored.
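One way to address this concern is to make the error sticky, so that once the underlying reader fails (or reaches EOF) every later call reports it instead of `0, 0, nil`. The following is only a minimal sketch under assumptions: the `holeFinder` type, its fields, and the `scan` helper are hypothetical and are not the PR's actual code; only the three-value `ReadByte` shape is taken from the snippet above.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// holeFinder is a hypothetical per-byte scanner, not the type from the PR.
type holeFinder struct {
	r         *bufio.Reader
	threshold int64 // minimum zero-run length reported as a hole
	pending   int64 // zeros of a short run still to be replayed as data
	err       error // sticky: first error seen from the underlying reader
}

// ReadByte returns (holeLen, 0, nil) for a hole, (0, b, nil) for one data
// byte, or (0, 0, err) once the underlying reader has failed or hit EOF.
func (f *holeFinder) ReadByte() (int64, byte, error) {
	if f.pending > 0 { // replay a zero run that was too short to be a hole
		f.pending--
		return 0, 0, nil
	}
	if f.err != nil {
		return 0, 0, f.err
	}
	var zeros int64
	for {
		b, err := f.r.ReadByte()
		if err != nil {
			f.err = err // remember it; report it after the zeros seen so far
			break
		}
		if b != 0 {
			if zeros == 0 {
				return 0, b, nil
			}
			// Push the non-zero byte back so the next call returns it.
			if uerr := f.r.UnreadByte(); uerr != nil {
				f.err = uerr
			}
			break
		}
		zeros++
	}
	switch {
	case zeros == 0: // only reachable after an error
		return 0, 0, f.err
	case zeros >= f.threshold:
		return zeros, 0, nil
	default:
		f.pending = zeros - 1 // this call returns the first zero of the run
		return 0, 0, nil
	}
}

// scan drains a holeFinder over data and collects hole lengths and data bytes.
func scan(data []byte, threshold int64) ([]int64, []byte, error) {
	f := &holeFinder{r: bufio.NewReader(bytes.NewReader(data)), threshold: threshold}
	var holes []int64
	var out []byte
	for {
		holeLen, b, err := f.ReadByte()
		switch {
		case err != nil:
			return holes, out, err
		case holeLen > 0:
			holes = append(holes, holeLen)
		default:
			out = append(out, b)
		}
	}
}

func main() {
	holes, data, err := scan([]byte{1, 0, 0, 0, 0, 2, 0}, 3)
	fmt.Println(holes, data, err == io.EOF)
}
```

The caller's loop terminates because the error is returned on every call after the first failure, including after a trailing run of zeros has been reported.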
chunkSize := rcReader.WrittenOut - lastChunkOffset
if chunkSize > 0 {
    chunkType := internal.ChunkTypeData
    if rcReader.IsLastChunkZeros {
Isn’t this the only caller of `rcReader`? If so, why not return `IsLastChunkZeros` directly from the `Read` method, the only writer of that value, instead of worrying about the semantics of a “state” in `rollingChecksumReader`?
 rcReader := &rollingChecksumReader{
-    reader: bufio.NewReader(tr),
+    reader: hf,
(Embedding `holesFinder` directly into `rollingChecksumReader`, without the pointer indirection, might help the compiler a bit. Or `ReadByte` could just be a private method of `rollingChecksumReader`, though OTOH that might make testing it a bit harder.)
 for i := 0; i < len(b); i++ {
-    n, err := rc.reader.ReadByte()
+    holeLen, n, err := rc.reader.ReadByte()
Does `writeZstdChunkedStream` show up on a CPU profile, by any chance?
I have measured nothing, but this fairly deep call stack and state management for every single byte (especially across a public module boundary to `bufio.{ReadByte,UnreadByte}`) seems like it could be quite costly compared to filling a multi-kilobyte buffer and doing a linear scan for a non-zero byte in a trivial (and easily vectorizable) loop.
Of course, let’s not worry about it if it doesn’t matter.
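For illustration, the buffer-based alternative described in this comment could look roughly like the sketch below. It is not measured and not taken from the PR: the `hole` type, `findHoles`, `holesInBytes`, and the 32 KiB buffer size are all assumptions made for the example. It reads fixed-size buffers and tracks zero runs across buffer boundaries with a plain loop.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// hole describes one run of zeros; a hypothetical type for this sketch.
type hole struct{ Offset, Length int64 }

// findHoles reads r in 32 KiB buffers and returns every run of zero bytes
// that is at least threshold bytes long. Runs may span buffer boundaries.
func findHoles(r io.Reader, threshold int64) ([]hole, error) {
	buf := make([]byte, 32*1024)
	var holes []hole
	var pos, run int64 // pos: bytes scanned so far; run: zeros ending at pos

	flush := func() {
		if run > 0 && run >= threshold {
			holes = append(holes, hole{Offset: pos - run, Length: run})
		}
		run = 0
	}

	for {
		n, err := r.Read(buf)
		for _, b := range buf[:n] { // linear scan of the filled part
			if b == 0 {
				run++
			} else {
				flush()
			}
			pos++
		}
		if err == io.EOF {
			flush() // a hole may end exactly at EOF
			return holes, nil
		}
		if err != nil {
			return holes, err
		}
	}
}

// holesInBytes is a small convenience wrapper for in-memory data.
func holesInBytes(data []byte, threshold int64) ([]hole, error) {
	return findHoles(bytes.NewReader(data), threshold)
}

func main() {
	data := append([]byte{'a'}, make([]byte, 2048)...)
	data = append(data, 'b')
	holes, err := holesInBytes(data, 1024)
	fmt.Println(holes, err)
}
```

Whether this is actually faster than the per-byte approach would have to be confirmed with a CPU profile, as the comment says.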
fileTypeZstdChunked = iota
fileTypeEstargz
fileTypeNoCompression
fileTypeHole
This name doesn’t make sense.
At a first glance, without actually tracking down the full set of states/transitions in detail, the code should separate two concepts:
- How to decompress compressed data in this file (a function pointer, or an interface for “start decompressing and return a `Reader`” + “clean up”). Notably the `gzipReader`/`zstdReader` members fairly strongly suggest `chunkedDiffer` is inlining ~independent object types.
- What’s the type of the current chunk (compressed, locally-available, a hole)
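The two concepts listed above could be kept apart along the following lines. This is a sketch under assumptions, not the structure of `chunkedDiffer`: the `decompressor` interface, the `chunkKind` constants, and the helper functions are hypothetical names, and only gzip and the no-compression case are shown because they need nothing beyond the standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// chunkKind says what the current chunk is, independent of the file format.
type chunkKind int

const (
	chunkData  chunkKind = iota // compressed payload to fetch and decompress
	chunkLocal                  // payload already available locally
	chunkHole                   // run of zeros, materialized as a hole
)

// decompressor says how to decompress data in this file:
// "start decompressing and return a Reader" plus "clean up".
type decompressor interface {
	Start(src io.Reader) (io.Reader, error)
	Close() error
}

type gzipDecompressor struct{ zr *gzip.Reader }

func (g *gzipDecompressor) Start(src io.Reader) (io.Reader, error) {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return nil, err
	}
	g.zr = zr
	return zr, nil
}

func (g *gzipDecompressor) Close() error {
	if g.zr == nil {
		return nil
	}
	return g.zr.Close()
}

// noCompression passes the source through unchanged.
type noCompression struct{}

func (noCompression) Start(src io.Reader) (io.Reader, error) { return src, nil }
func (noCompression) Close() error                           { return nil }

// decompressAll runs any decompressor over an in-memory payload.
func decompressAll(d decompressor, src []byte) ([]byte, error) {
	r, err := d.Start(bytes.NewReader(src))
	if err != nil {
		return nil, err
	}
	defer d.Close()
	return io.ReadAll(r)
}

// gzipBytes compresses data so the example has something to decompress.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data) // writes to a bytes.Buffer do not fail
	zw.Close()
	return buf.Bytes()
}

func main() {
	out, err := decompressAll(&gzipDecompressor{}, gzipBytes([]byte("hello")))
	fmt.Println(string(out), err, chunkHole)
}
```

With this split, a hole is a property of a chunk and never appears in the list of file compression formats.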
automatically detect holes in sparse files (the threshold is hardcoded
at 1kb for now) and add this information to the manifest file.
The receiver will create a hole (using unix.Seek and unix.Ftruncate)
instead of writing the actual zeros.
Closes: #1091
Signed-off-by: Giuseppe Scrivano [email protected]
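The receiver-side idea in the description (seek past the zeros, then extend the file) can be sketched as follows. This example uses `os.File.Seek` and `os.File.Truncate` from the standard library as stand-ins for the `unix.Seek` and `unix.Ftruncate` calls the description mentions; `appendHole` and `demoSparseFile` are hypothetical helpers, and the sketch assumes the file is written sequentially from empty.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// appendHole skips size bytes at the current offset instead of writing zeros.
// If the skip moves past the current end of file, the file is extended so
// that a trailing hole is still reflected in the file size.
func appendHole(f *os.File, size int64) error {
	end, err := f.Seek(size, io.SeekCurrent)
	if err != nil {
		return err
	}
	fi, err := f.Stat()
	if err != nil {
		return err
	}
	if fi.Size() < end {
		return f.Truncate(end) // grows the file; never shrinks it here
	}
	return nil
}

// demoSparseFile writes "head", a hole, "tail", and a trailing hole to a
// temporary file and returns the resulting file size.
func demoSparseFile(holeSize int64) (int64, error) {
	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString("head"); err != nil {
		return 0, err
	}
	if err := appendHole(f, holeSize); err != nil {
		return 0, err
	}
	if _, err := f.WriteString("tail"); err != nil {
		return 0, err
	}
	if err := appendHole(f, holeSize); err != nil { // trailing hole
		return 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

func main() {
	size, err := demoSparseFile(1 << 20)
	fmt.Println(size, err)
}
```

Whether the skipped ranges occupy disk blocks depends on the filesystem; the logical file size and the zero contents are the same either way.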