
Add fallback behavior if performance optimizations fail #81

Merged
merged 6 commits on Dec 9, 2020

Conversation


@ekcasey ekcasey commented Dec 4, 2020

Resolves #80
Resolves #63

When saving a daemon image we now first attempt a performant save with the base layers omitted and, if that fails, fetch the base layers by running docker save on the image and try again. I have also removed the more complicated logic that was intended to omit layers reused from the previous image whenever possible. With the addition of the launch cache in the lifecycle, neither lifecycle nor pack currently takes advantage of this optimization, and it adds complexity.
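For readers unfamiliar with the change, the fallback has roughly the shape sketched below. The type and helper names (`image`, `doSave`, `downloadBaseLayersOnce`) are illustrative stand-ins, not the actual imgutil API:

```go
package main

import (
	"errors"
	"fmt"
)

// image is a stand-in for imgutil's local image type (hypothetical for this sketch).
type image struct {
	baseLayersFetched bool
}

// doSave attempts a save; on the fast path the base layers are omitted from the tarball.
func (i *image) doSave(names ...string) error {
	if !i.baseLayersFetched {
		// A stricter daemon implementation may reject the incomplete layer set.
		return errors.New("daemon rejected image with omitted base layers")
	}
	return nil
}

// downloadBaseLayersOnce stands in for the slow path: `docker save` the base
// image and record where its layers landed on disk.
func (i *image) downloadBaseLayersOnce() error {
	i.baseLayersFetched = true
	return nil
}

// Save tries the fast save first and falls back to the complete save on failure.
func (i *image) Save(names ...string) error {
	if err := i.doSave(names...); err == nil {
		return nil
	}
	if err := i.downloadBaseLayersOnce(); err != nil {
		return fmt.Errorf("fetching base layers: %w", err)
	}
	return i.doSave(names...)
}

func main() {
	img := &image{}
	if err := img.Save("my/image:tag"); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	fmt.Println("saved after fallback")
}
```

The key point is that the slow docker save path only runs when the fast, incomplete save is rejected.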

While intended to resolve #80, this also resolves #63 as a side effect because we now save an image to disk by ID, avoiding situations where manifest.json might contain multiple entries. This previously happened because we were passing a tag to docker save without expanding the implicit :latest tag. While docker inspect treats my/image as my/image:latest, docker save returns all images with repository my/image. While we could save using the expanded tag, the ID is the most explicit choice and also avoids any potential race conditions.
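For illustration, resolving the reference to an ID before exporting could look like the sketch below against the Docker Go client. The function name and the destination-path parameter are hypothetical; ImageInspectWithRaw and ImageSave are the client calls involved:

```go
package example

import (
	"context"
	"io"
	"os"

	"github.com/docker/docker/client"
)

// saveByID resolves a reference to its image ID before exporting, so the
// resulting tarball's manifest.json contains exactly one entry.
func saveByID(ctx context.Context, cli client.CommonAPIClient, ref, destPath string) error {
	// `docker inspect` treats "my/image" as "my/image:latest"...
	inspect, _, err := cli.ImageInspectWithRaw(ctx, ref)
	if err != nil {
		return err
	}

	// ...but `docker save my/image` would export every image tagged under the
	// "my/image" repository. Saving by ID sidesteps that, and avoids racing
	// against concurrent re-tagging.
	rc, err := cli.ImageSave(ctx, []string{inspect.ID})
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(destPath)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, rc)
	return err
}
```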

This PR is intended to fix the immediate problems with as little disruption as possible. I intend to follow up in the next two weeks with a redesign of the imgutil interface that addresses some of the domain modeling problems and improves usability.

I attempted to test this manually with podman but ran into issues where the result of docker inspect was missing fields. I believe this is an incompatibility between podman and the docker client library rather than an imgutil problem. I did temporarily comment out the performance hack and re-run the tests to ensure the fallback isn't broken. More robust/automated testing of the fallback will be easier to introduce after the refactor mentioned above, so I have omitted it for now.

When saving an image to the docker daemon we currently omit base image layers because docker does not require them and fetching them requires a slow image save. However, some runtimes that implement the daemon API have stricter validation, so we fall back to the slower behavior if the first save fails.

Signed-off-by: Emily Casey <[email protected]>
* Tests run faster
* Avoids Docker Hub rate limiting

Signed-off-by: Emily Casey <[email protected]>
@ekcasey ekcasey requested review from micahyoung and jromero December 4, 2020 17:47
@ekcasey ekcasey requested a review from a team as a code owner December 4, 2020 17:47
@ekcasey ekcasey marked this pull request as draft December 4, 2020 17:51
Fixes performance hack fallback edge case where GetLayer is called before Rebase.

Signed-off-by: Emily Casey <[email protected]>
Signed-off-by: Emily Casey <[email protected]>
@ekcasey ekcasey marked this pull request as ready for review December 4, 2020 18:19
```go
// end of GetLayer: base layers are downloaded on demand, then the layer tarball is opened by path
		return nil, err
	}
}
return os.Open(i.layerPaths[l])
```

An edge case for sure but wouldn't it be better to check if i.layerPaths[l] is valid/non-empty after downloading the base layers? Otherwise, a very cryptic error may be thrown if that edge case is ever encountered.
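A minimal sketch of the suggested guard, reusing the hypothetical field and method names from the snippet above (they are illustrative, not imgutil's real ones):

```go
package example

import (
	"fmt"
	"io"
	"os"
)

// image mirrors the fields used in the snippet above.
type image struct {
	layerPaths map[string]string // layer diff ID -> path to the layer tarball on disk
}

// downloadBaseLayersOnce stands in for the slow `docker save` fallback.
func (i *image) downloadBaseLayersOnce() error { return nil }

// GetLayer with the suggested check: if the path is still missing after the
// base layers have been fetched, return a descriptive error instead of letting
// os.Open fail cryptically on an empty path.
func (i *image) GetLayer(diffID string) (io.ReadCloser, error) {
	if i.layerPaths[diffID] == "" {
		if err := i.downloadBaseLayersOnce(); err != nil {
			return nil, err
		}
	}
	path := i.layerPaths[diffID]
	if path == "" {
		return nil, fmt.Errorf("image does not contain layer with diff ID %q", diffID)
	}
	return os.Open(path)
}
```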

@ekcasey ekcasey requested a review from jromero December 7, 2020 19:52

@micahyoung micahyoung left a comment


Looks good to me, except I'm running into what looks like temp file or test image leakage: each time I run the local tests on Windows, it seems about 1GB of data gets left behind. Still investigating, but wanted to let you know in the meantime.

Update: never mind, the same thing is happening on main; I'll file a separate issue.

@micahyoung micahyoung dismissed their stale review December 9, 2020 14:07

unrelated


@micahyoung micahyoung left a comment


Looks good to me. It took me a while to get caught up on the issues and the impact of removing the optimization, but this simplifies things as well (local tests on Windows also speed up by about 30%!).
