Refactor machine decompress.go #21592
Conversation
Force-pushed from 7932a59 to 653fcb0 (compare)
code generally LGTM. @Luap99 what say you on the size addition?
I wonder why there is a dedicated pkg/machine/compression? Is there a specific reason why we cannot (or do not) use c/storage/pkg/archive?
So what exactly is this solving compared to the already existing decompression logic like pkg/archive or pkg/compression from c/image that is already used here via compression.AutoDecompress(r)?
// Are we reading full image file?
// Only few bytes are read to detect
// the compression type
srcFileContent, err := srcVMFile.Read()
this reads the entire file into memory, which seems pretty bad for performance reasons, especially since the decompressor will read the same file again
Yes; this should either use a buffer + Peek (like archive.DecompressStream), or combine the read data + stream (like compression.DetectCompression). If this package should be in the business of detecting formats at all.
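For illustration, a minimal stdlib-only sketch of the buffered-Peek approach (a hypothetical helper, not the code of this PR, and independent of the c/image helpers named above):

```go
package compression

import (
	"bufio"
	"bytes"
	"io"
)

// detectFormat peeks at the first few bytes of r to guess the compression
// format without loading the whole file; the returned reader still yields
// the complete, untouched stream.
func detectFormat(r io.Reader) (string, io.Reader, error) {
	br := bufio.NewReader(r)
	header, err := br.Peek(6) // longest magic number checked below
	if err != nil && err != io.EOF {
		return "", nil, err
	}
	switch {
	case bytes.HasPrefix(header, []byte{0x1f, 0x8b}):
		return "gzip", br, nil
	case bytes.HasPrefix(header, []byte{0x28, 0xb5, 0x2f, 0xfd}):
		return "zstd", br, nil
	case bytes.HasPrefix(header, []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}):
		return "xz", br, nil
	case bytes.HasPrefix(header, []byte{'P', 'K', 0x03, 0x04}):
		return "zip", br, nil
	default:
		return "uncompressed", br, nil
	}
}
```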
Agreed. I noticed it while doing the refactoring, but I didn't want to use this PR to make changes in the logic of the existing code (there are enough changes already), so I added the comment. This should be fixed, but I would rather do it in a separate PR.
(Throughout): “The existing code did it that way” is a fair point to some of my objections — the PR is already invasive enough that I wasn’t looking for that.
This is a brief skim of an unfamiliar part of the codebase.
The PR is changing a lot in one commit, so I didn’t review how it changes since the previous version at all.
return newZipDecompressor(compressedFilePath)
case compressionType == archive.Uncompressed:
return newUncompressedDecompressor(compressedFilePath)
case compressionType == archive.Gzip && os == macOs:
Why is this OS-specific?
This is the current behavior. I haven't changed that. I am assuming that the gzip implementation in c/image/pkg/compression doesn't support sparse files on macOS, and we use the crc/pkg/os implementation instead.
AFAICS CopySparse consumes a plain io.Reader; I can’t see an obvious reason why it couldn’t consume (the faster one used by) the c/image/pkg/compression one.

Either way, maybe not this PR.
This is faster indeed (~17s vs ~27s on my mac) but if I try to run unit tests using GzipDecompressor from c/image/pkg/compression with CopySparse from crc/v2/pkg/os, I get an error:

compression_test.go:128:
Error Trace: /Users/mloriedo/Git/podman/pkg/machine/compression/compression_test.go:128
Error: Not equal:
expected: "gzip\n"
actual : "gzip\x00"

This error doesn't happen when using the standard library gzip package with CopySparse from crc/v2/pkg/os. And it doesn't happen using GzipDecompressor with io.Copy. I suspect that this is a problem with CopySparse. Anyway, not something I would fix in this PR.
For the time being I have added the following comment:
// macOS gzipped VM images are sparse. As a result a
// special decompressor is required: it uses crc os.CopySparse
// instead of io.Copy and std lib gzip instead of klauspost/pgzip
// (even if it's slower).
I’m not too much of a fan of the platform dependency, but, *shrug*, preexisting.
I suspect that this is a problem with CopySparse. Anyway, not something I would fix in this PR.
Fair enough, but please at least file an issue if you are not already working on this; silent data corruption seems serious.
It’s quite possible that it would be fixed by using #21763 instead.
Right, let me try with #21763 because I suspect the problem was crc CopySparse indeed.
Using the SparseWriter introduced in #21763 fixed the problem. So I have been able to update this PR to use the gzip decompressor from c/image/pkg/compression (that uses klauspost/pgzip) and it's faster 🚀. Although zstd (through klauspost/compress/zstd) is now used rather than gzip, that's fast too.
done := make(chan bool)
go func() {
if _, err := io.Copy(w, read); err != nil {
logrus.Error(err)
Shouldn’t any errors be propagated to the caller?
This is main branch code, it hasn't been introduced by this PR.
Filed #21772 for this.
if err != nil {
return err
}
return cmd.Wait()
By the time this happens, the separate goroutine might not have finished writing to w.
This is main branch code, it hasn't been introduced by this PR.
Filed #21772 for this.
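A minimal sketch of how the two concerns raised in this thread (propagating the copy error and waiting for the goroutine) could be handled. The helper name is hypothetical, w, read, and cmd mirror the snippet above (cmd is assumed to be already started), and this is not necessarily the fix that lands via #21772:

```go
package decompress

import (
	"io"
	"os/exec"
)

// runAndCopy waits for the background copy to finish (draining the pipe)
// before calling cmd.Wait, and surfaces the copy error to the caller
// instead of only logging it.
func runAndCopy(cmd *exec.Cmd, w io.Writer, read io.Reader) error {
	copyErr := make(chan error, 1)
	go func() {
		_, err := io.Copy(w, read)
		copyErr <- err // always send, even on success, so the caller can wait
	}()

	// Block until the goroutine is done writing to w, and propagate failures.
	if err := <-copyErr; err != nil {
		return err
	}
	return cmd.Wait()
}
```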
Thank you @vrothberg, this is a good question but this PR is about simplifying the existing code of … Anyway …
Thank you @llchan. This PR doesn't introduce any new logic. It's just a re-writing of existing code. For what it's worth …
Using the xz library is usually quite a lot slower than forking …
Force-pushed from 653fcb0 to f8a0d61 (compare)
@afbjorklund This PR is about cleaning up code inconsistencies and duplication in decompress.go. I am not changing any behavior. That's on purpose. This PR should not introduce any change about how we use …
return err
}
defer func() {
if err := f.Close(); err != nil {
logrus.Errorf("unable to close on compressed file %s: %q", compressedPath.GetPath(), err)
if err := decompressedFileWriter.Close(); err != nil && !errors.Is(err, os.ErrClosed) {
ErrClosed indicates a programming bug (analogous to C use-after-free); if it is showing up in practice, it should be traced down and avoided.
This code exists in main (introduced last week).
Yeah, still a bug, and closing unowned file descriptors in a concurrency-heavy language like Go would be maddening to debug. I can, barely, live with deferring #21772, that’s “only” data corruption, but unpredictable in-process memory corruption is even more dangerous than that.

(As it happens, the Go internal implementation seems to atomically enforce only one Close, but the interface spec explicitly says the behavior is undefined, so we can’t rely on that.)

AFAICS this refactor has already fixed the double-closes (and that can be enforced by the type system by not providing a *Closer type to the individual decompressor implementations [see elsewhere about the sparse writer Close handling]), so just removing this check should be sufficient to resolve.
I’m sorry, actually the io.Closer.Close specification says the effect of repeated calls is undefined, but specifically os.File.Close documents that “Close will return an error if it has already been called.” (but, curiously, doesn’t commit to returning ErrClosed in this case).

So this is, after all, safe enough.
#21977 will fix this.
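For what it's worth, the "enforce it via the type system" idea mentioned above could look roughly like this: the individual decompressors only ever see plain readers and writers, so the orchestrating function is the single owner of Close. Names here are hypothetical, not this PR's actual structure:

```go
package compression

import (
	"io"
	"os"
)

// decompressor implementations receive plain readers/writers, so they
// cannot close (let alone double-close) the underlying files.
type decompressor interface {
	decompress(dst io.Writer, src io.Reader) error
}

// decompressFile owns both files and is therefore the only place that
// closes them, exactly once each.
func decompressFile(d decompressor, srcPath, dstPath string) (retErr error) {
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close() // close errors on a read-only file are not interesting

	dst, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	defer func() {
		// Close errors matter for writers; report one unless an earlier
		// error is already being returned.
		if err := dst.Close(); err != nil && retErr == nil {
			retErr = err
		}
	}()

	return d.decompress(dst, src)
}
```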
pkg/machine/compression/gzip.go (Outdated)
return err
}
d.gzReader = gzReader
_, err = crcOs.CopySparse(w, gzReader)
(Non-blocking: I’m tempted to say that code like this doesn’t need to be a subclass; parametrize genericDecompressor with a

- decompressStream func(io.Reader) (io.ReadCloser, error)
- isSparse bool

… but then I would go on and argue that “is the output a sparse file” should be true either for all or no compression formats because it is very surprising for the sparseness of the output to depend on the compression algorithm … and we would drift into things that should happen in other PRs.)
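Roughly what that parametrization could look like, as a sketch using the stdlib gzip reader (hypothetical names, not what this PR implements):

```go
package compression

import (
	"compress/gzip"
	"io"
)

// genericDecompressor is driven by data instead of subclassing: each
// format supplies a stream wrapper and says whether the output should
// be written sparsely.
type genericDecompressor struct {
	decompressStream func(io.Reader) (io.ReadCloser, error)
	isSparse         bool
}

// gzipDecompressor shows how one entry would be defined; a zstd or xz
// entry would only differ in the decompressStream field.
var gzipDecompressor = genericDecompressor{
	decompressStream: func(r io.Reader) (io.ReadCloser, error) {
		return gzip.NewReader(r)
	},
	isSparse: true, // or false; see the discussion about per-format sparseness
}
```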
Force-pushed from f8a0d61 to be28d1a (compare)
I have merged the changes to decompress.go introduced by #21768 with this commit (zstd and uncompressed use the new SparseWriter). I haven't removed the crc CopySparse dependency as it's still used by other packages cc @baude. The GH action …
The check uses rebase and checks commit by commit, so you would need to do the same if you want to compare the numbers. I think for now we should just accept the growth; I don't see anything that can be done here to help much.
if runtime.GOOS == "darwin" {
return decompressGzWithSparse(prefix, localPath, uncompressedFileWriter)
}
fallthrough
For the record:
This code from #21768 works right now, but if Zstd ever grew a non-Darwin implementation, Gzip would start failing on Darwin. To me, that kind of a trap is at least worth a comment.
That’s fixed in this PR — but if this PR were abandoned for any reason, I would want to add that comment.
@baude @mheon I have addressed most of the comments on this PR. The check … To be fair I was able to test it locally on linux/macos/wsl but not on hyperv yet (I will try later). And I am not able to add the …
The machine resize failure seems legitimate, and not a flake. Code LGTM though
Label added
Code LGTM, hyperv sqlite test still twitching....
pkg/machine/compression/gzip.go (Outdated)
sparseWriter := NewSparseWriter(w)
_, err = io.Copy(sparseWriter, gzReader)
In every instance: this never calls sparseWriter.Close(), and therefore corrupts data (does not write the trailing zero block, if any).

It might be simpler (and more symmetric) to have w just be an io.WriteSeeker, and to wrap it with (a functional equivalent of) NopCloser for the purpose of this function. … and because that repeats 3 times, perhaps do that in a local copySparse utility.
(Also, again because sparseWriter.Close is non-trivial, its return value should be checked, not ignored.)
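A possible shape for that local copySparse utility (a sketch only; it assumes the SparseWriter from #21763 is available in the same package, wraps an io.WriteSeeker, and implements io.WriteCloser):

```go
package compression

import "io"

// copySparse funnels the decompressed stream through a sparse writer and
// makes sure the writer's non-trivial Close (which emits the trailing
// zero block, if any) both runs and has its error checked.
func copySparse(dst io.WriteSeeker, src io.Reader) (retErr error) {
	sw := NewSparseWriter(dst)
	defer func() {
		if err := sw.Close(); err != nil && retErr == nil {
			retErr = err
		}
	}()
	_, err := io.Copy(sw, src)
	return err
}
```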
You are right, thanks. I have added the calls to sparseWriter.Close() and added new tests to check compressed files with trailing zeros.

It's worth noting that not calling sparseWriter.Close() (i.e. ignoring the trailing zeros) still lets podman machine init succeed, and is 10s faster on my local macOS.
case compressionType == archive.Uncompressed:
return newUncompressedDecompressor(compressedFilePath)
// Using special compressors on MacOS because default ones
// in c/image/pkg/compression are slow with sparse files.
AFAICT the implementation is now the same; the difference is only that these two use sparseWriter. Am I missing something?

The previous version said “macOS gzipped VM images are sparse”. And #21592 (comment) suggested that previously we had to choose between sparse processing and fast decompression, so we chose the slower one on macOS only, where sparse was a requirement. Apparently with the new sparseWriter that’s no longer the case, we can have both.

I have no idea what’s true, and I don’t insist on switching to always-sparse here and removing the platform dependency, but I’d prefer the comment to document as much of the intent and trade-offs (even if it is “some previous version made that distinction for an unknown reason”) as possible, because our knowledge will not get better as time passes.
By “the implementation is now the same” I mean the decompression library being called.
The AFAICT incorrect comment is still outstanding.
What about:
// Using special compressors on MacOS because generic
// uses `io.Writer` that is less performant than `SparseWriter`
// when writing applehv raw images.
- Both implementations “use io.Writer”, that’s not quite the difference.
- Is it “less performant”? Only on macOS? AFAICT the comment is there to explain why macOS is a special case, why this is not conditioned only on the compression format. What’s the part that is macOS specific?

The previous version said “macOS gzipped VM images are sparse”. Is that the reason? If so, then say so:

“macOS gzipped VM images are sparse, so we need an implementation which can create sparse files. It is currently unknown why we have, historically, not been trying to create sparse files on other platforms as well.”

or something. Or is the reason something else entirely? I have no idea. Some future maintainer might be looking at this 3 years later and trying to figure out what we were thinking.

(This directly relates to #21592 (comment) …)
Let me try to be more clear:

- On macOS podman machine uses applehv
- applehv VMs are distributed as raw disk files (for windows hyperv we use the vhdx format)
- FCOS raw disk files have a size of 10GB with only ~2GB of actual files (vhdx has a size of ~2GB)
- gzip and zstd don't have any specific optimization for such kind of sparse files (the compressed size is ~8GB)
- As a result, the decompression of FCOS is particularly slow on macOS
- One way to make the decompression faster for these large sparse files is using SparseWriter
- The generic decompressor supports gzip, zstd etc., but doesn't use SparseWriter
- The zstd and gzip decompressors use SparseWriter and are used only for FCOS raw disk files (i.e. macOS)

My previous comment “macOS gzipped VM images are sparse” was not accurate. It's not related to gzip but to the fact that applehv VMs are imported from raw disk files. And decompressing such files can be optimized using SparseWriter.

@baude is this correct?
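To make the trade-off concrete: the core of the sparse-writing idea is that runs of zero bytes in the decompressed raw image become Seek calls instead of writes, so the filesystem keeps holes. A toy illustration of that idea only (not the actual SparseWriter from #21763):

```go
package compression

import "io"

// zeroSkippingWriter is a toy version of the sparse-writing idea: write
// calls whose payload is entirely zero become Seek calls, leaving a hole
// in the output file instead of materializing the zeros on disk.
type zeroSkippingWriter struct {
	dst io.WriteSeeker
}

func (w *zeroSkippingWriter) Write(p []byte) (int, error) {
	if isAllZeros(p) {
		if _, err := w.dst.Seek(int64(len(p)), io.SeekCurrent); err != nil {
			return 0, err
		}
		return len(p), nil
	}
	return w.dst.Write(p)
}

// A real implementation also has to detect zero runs inside mixed buffers
// and, on Close, extend the file to its final size when the stream ends in
// zeros (the trailing-zero issue discussed earlier in this thread).

func isAllZeros(p []byte) bool {
	for _, b := range p {
		if b != 0 {
			return false
		}
	}
	return true
}
```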
Signed-off-by: Mario Loriedo <[email protected]>
Cockpit tests failed for commit 5970466. @martinpitt, @jelly, @mvollmer please check.
Cockpit tests failed for commit 7b6d9a5. @martinpitt, @jelly, @mvollmer please check.
These cockpit failures were both instances of the thing fixed in cockpit-project/cockpit-podman#1580. So hopefully they shouldn't repeat in the future, sorry for the noise. It is remarkably hard to reset podman to a pristine state in between tests: leaked processes/mounts, files (dis)appearing after killing processes, etc. OOI, how do your tests do the cleanup in teardown?
Let's not have such a discussion on some random PR; we should not leak anything assuming you stop things properly. (I know we do sometimes, as we see similar issues in other CI.) I suggest you file an issue, and if you have a reproducer for this outside of your CI, even better.
@l0rd one of the test failures seems relevant to this PR.
Signed-off-by: Mario Loriedo <[email protected]>
That works for me… implementation LGTM. I’d still want to see the documentation around #21592 (comment) to be updated / made accurate, matching what is known today.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: l0rd, Luap99
While working on containers#21592 we figured out that the full VM file was loaded in memory when detecting the file format, but only a few bytes are needed. This commit addresses that. [NO NEW TESTS NEEDED] Signed-off-by: Mario Loriedo <[email protected]>
What does this PR do?
This PR includes the #21591 commit.
Does this PR introduce a user-facing change?