-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fallback behavior if performance optimizations fail #81
Conversation
When saving an image to the docker daemon we currently omit base image layers because docker does not require them and fetching them requires a slow image save. However, some runtimes that implement the daemon api have stricter validation so we will fallback to the slower behavior if the first save fails Signed-off-by: Emily Casey <[email protected]>
* Tests run faster * Avoids dockerhub rate limiting Signed-off-by: Emily Casey <[email protected]>
Fixes performance hack fallback edge case where GetLayer is called before Rebase. Signed-off-by: Emily Casey <[email protected]>
Signed-off-by: Emily Casey <[email protected]>
Signed-off-by: Emily Casey <[email protected]>
return nil, err | ||
} | ||
} | ||
return os.Open(i.layerPaths[l]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An edge case for sure but wouldn't it be better to check if i.layerPaths[l]
is valid/non-empty after downloading the base layers? Otherwise, a very cryptic error may be thrown if that edge case is ever encountered.
Signed-off-by: Emily Casey <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, except I'm running into some temp file or test image leakage or something - each time I run the local
tests on Windows, it seams about 1GB of data gets left behind. Still investigating but wanted to let you know in the meantime.
Update: nevermind, same thing is happening on main
, I'll file a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Took me a while to get caught up on the issues and the impact of removing the optimization but this simplify things as well (also local
tests on Windows speed up by about 30%!).
Resolves #80
Resolves #63
When saving a daemon image we now first attempt a performant save with base layers omitted and, if that fails, fetch the base layers by
docker save
ing the image, and try again. I have also removed more complicated logic that was intended to omit layers reused from the previous image whenever possible. With the addition of the launch cache in lifecycle, neither lifecycle nor pack currently take advantage of this optimization and it adds complexity.While intended to resolve #80 this also resolves #63 as a side effect because we now save an image to disk by ID, avoiding situations where
manifest.json
might contain multiple entries. This previously happened because we were passing a tag to docker save without expanding an implicit:latest
tag. Whiledocker inspect
will treatmy/image
asmy/image:latest
docker save
will return all images with repositorymy/image
. While we could save using the expanded tag, ID is the most explicit and also avoid any potential race conditions.This PR is intended to provide fixes for immediate problems with as little disruption as possible. I intend to follow up in the next two weeks with a redesign of the imgutil interface that addresses some of the domain modeling problems and improves the usage.
I attempted to test this manually with
podman
but ran into issues where the result ofdocker inspect
was missing fields. I believe this is an incompatibility betweenpodman
and the docker client library rather than animgutil
problem. I did temporarily comment out the performance hack and rerun tests to ensure the fallback isn't broken. More robust/automated testing of the fallback will be easier to introduce after the refactor mentioned above so I have omitted it for now.