Jib should avoid parallel image downloads for same image #2007
In version >= 1.5.0, Jib verifies whether the base image already exists on the target registry and skips downloading it, doesn't it? If that's correct, a centralized folder for base images is only necessary when you don't use a registry.
That is true, but when you don't target a registry, Jib still caches base images locally to skip unnecessary pulls. So I think this issue is really only a problem for clean multi-module builds.
And this only matters when you enable Maven parallel builds.
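For anyone driving jib-core directly rather than the Maven/Gradle plugins, one way to get that kind of centralized folder is to point every build at the same base-image cache directory. A minimal sketch, assuming Containerizer#setBaseImageLayersCache is available in your jib-core version; the paths and image names here are made up for illustration:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.Jib;
import com.google.cloud.tools.jib.api.RegistryImage;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class SharedBaseImageCache {
  public static void main(String[] args) throws Exception {
    // Hypothetical directory shared by every module of the build, so a base
    // image pulled by one build is reused (not re-downloaded) by the others.
    Path sharedBaseCache = Paths.get("/var/cache/jib-base-images");

    Jib.from("gcr.io/distroless/java")
        .addLayer(Arrays.asList(Paths.get("target/classes")), "/app/classes")
        .containerize(
            Containerizer.to(RegistryImage.named("registry.example.com/my-app:latest"))
                .setBaseImageLayersCache(sharedBaseCache));
  }
}
```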
@chanseokoh assuming this can be closed after #2780?
@mpeddada1 this issue is a different one. #2780 was only in the context of multi-arch support.
I've been trying to incorporate Jib into our build process, and one problem I'm encountering is that Jib doesn't seem to try to detect and de-dupe in-flight build work. So if N requests to build the same new image come in at the same time, it will do the work N times (because the base/application cache is a cache miss). Obviously the cache directories exist to avoid redoing work that has already been done, but in the scenario where the work hasn't been done yet and many requests to do the same thing are being processed concurrently, I think there is potential for detecting and optimising this.

My scenario is a monorepo with N images where the base/first few layers are shared and only the last layer differs per image, so a request to build all of them at once means Jib builds the base/shared layers N times rather than once, which makes things take a lot longer than doing the same thing under Docker, which de-dupes the work.

My plan was to coordinate the calls to Jib so that one build request goes in first and the subsequent requests get cache hits, but that really just solves my problem, and I thought there might be a desire to solve this in Jib itself, as this is likely something others might be hitting. This isn't just related to image downloads, but the full set of things Jib might do.

There is a secondary question about how this might work with the Maven/Gradle plugins, since they make things a bit harder (builds could be in different JVMs and thus might not be able to coordinate intra-JVM), but ignore that for now.
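A rough sketch of that "one build first" coordination, assuming jib-core is being driven directly; the image names, thread count, and helper method are invented for the example, not anything Jib prescribes:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.DockerDaemonImage;
import com.google.cloud.tools.jib.api.Jib;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PrimeThenParallel {
  public static void main(String[] args) throws Exception {
    List<String> modules = Arrays.asList("app-a", "app-b", "app-c", "app-d");

    // Build the first image on its own so the shared base image lands in the
    // local cache before any parallel work starts.
    build(modules.get(0));

    // The remaining builds now get cache hits for the base image.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (String module : modules.subList(1, modules.size())) {
      pool.submit(
          () -> {
            try {
              build(module);
            } catch (Exception e) {
              e.printStackTrace();
            }
          });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.MINUTES);
  }

  private static void build(String module) throws Exception {
    Jib.from("gcr.io/distroless/java")
        .containerize(Containerizer.to(DockerDaemonImage.named("example/" + module)));
  }
}
```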
@nhoughto thanks for the feedback. As mentioned in this issue, what you said makes sense, and the current behavior is not ideal. It's just that it would be a pretty complex task to address for multiple reasons, and it may require a substantial overhaul of the async infrastructure (unless there are some quick hacks that enable a few relatively easy performance enhancements).

More importantly, I remember giving this some thought a long time ago, and de-duping in-flight build work is a fundamental design change that brings both pros and cons. Notably, in the current implementation all the lines of parallel work are very fine-grained, which enables great parallelism. For example, Jib can start uploading part of the base image or the application binaries while it is still downloading other parts of the base image or still building other application layers. So in many cases this can be much faster than centralized locking that delays/blocks all threads until Jib verifies that the layers those threads would download or build are eventually going to be identical.

So the issue really needs deep insight, although there might be some easy areas for improvement. Unfortunately, we have other priorities, and improving this aspect is not on our roadmap.
Thanks for the insight, I’ll look at putting something around jib then
Digging into my scenarios for this ticket more, the slowness I was seeing was really due to the DockerDaemonImage behaviour more than anything. The base and app caches are pretty effective at everything other than the first run, but the Docker daemon integration is very wasteful: DockerDaemonImage ends up just loading the full image into the daemon, so every layer is re-sent on every build.

To work around it I've effectively written a JibContainerBuilder -> Dockerfile serialiser, so am back to indirectly using Docker itself.
Thank you for the investigation and detailed notes!
Yeah, this is very unfortunate. Unlike container registries, the Docker daemon (Docker Engine API) is very limited in that it doesn't provide a way to check, pull, or push individual layers. It's something we've greatly lamented. OTOH, pushing to a registry with Jib is super-fast due to the strong reproducibility of Jib, so spinning up a local registry could be an option, which is as easy as:

docker run -d -p 5000:5000 --restart always --name registry registry:2
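Pushing to that local registry from jib-core might then look like the sketch below; the image names are placeholders, and allowing insecure registries is needed only because the registry started above speaks plain HTTP:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.Jib;
import com.google.cloud.tools.jib.api.RegistryImage;
import java.nio.file.Paths;
import java.util.Arrays;

public class LocalRegistryPush {
  public static void main(String[] args) throws Exception {
    Jib.from("gcr.io/distroless/java")
        .addLayer(Arrays.asList(Paths.get("target/classes")), "/app/classes")
        .containerize(
            // localhost:5000 is the registry started with the docker run command
            // above; it is plain HTTP, hence allowing insecure registries.
            Containerizer.to(RegistryImage.named("localhost:5000/my-app:latest"))
                .setAllowInsecureRegistries(true));
  }
}
```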
More findings on the actual original issue here (parallel downloads of the same base image): this is actually causing me failures rather than just wasting cycles/bandwidth. Specifically, against Docker Hub with a parallelism of 8 (and more than 8 projects sharing the same base image), some of the base image downloads would fail with an UNAUTHORIZED error.

I'm not sure whether this is a Docker Hub API thing, rate limiting or similar, or a Jib race condition around authentication (is anything shared across threads/instances in Jib?); the logs don't show any errors relating to rate limiting. It feels a bit like a Jib race condition.

Either way, it seems like there are meaningful problems with letting Jib download the same base image in parallel, and doing some de-duping would solve both problems. As a workaround, at the moment I'm doing some locking before calls to Jib to ensure each base image is only pulled once, but it would be nice if this was solved upstream 👍
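For what it's worth, a minimal sketch of that locking workaround; the class, method names, and the (coarse) choice to hold the lock for the whole containerize() call are mine, not anything Jib provides:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.JibContainerBuilder;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical wrapper: never let two threads pull the same base image at once. */
public class PerBaseImageLock {
  private static final ConcurrentMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  public static void containerize(
      String baseImage, JibContainerBuilder builder, Containerizer containerizer)
      throws Exception {
    // One lock per base image reference: builds on different base images still
    // run in parallel, but builds sharing a base image are serialized.
    ReentrantLock lock = LOCKS.computeIfAbsent(baseImage, key -> new ReentrantLock());
    lock.lock();
    try {
      builder.containerize(containerizer);
    } finally {
      lock.unlock();
    }
  }
}
```

This is heavier than strictly necessary once the base image is cached, since it serializes whole builds rather than just the pull, but it guarantees a given base image is downloaded by at most one thread at a time.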
@nhoughto Would you be interested in making a contribution for this?
yep can do, any tips on preferred approach?
You can see detailed registry interactions by enabling HTTP debug logging. BTW, as I alluded to in #2007 (comment), this isn't something we can jump in to work on.
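If it helps when driving jib-core directly, one way to surface that HTTP traffic is to raise the logging level of google-http-client, which Jib uses for registry calls; treat the exact logger setup below as an assumption about your environment rather than an official Jib switch:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HttpDebugLogging {
  // Keep a strong reference so java.util.logging does not garbage-collect the logger.
  private static final Logger HTTP_LOGGER =
      Logger.getLogger("com.google.api.client.http.HttpTransport");

  public static void enable() {
    // CONFIG logs request/response headers; Level.ALL additionally logs payloads.
    HTTP_LOGGER.setLevel(Level.CONFIG);
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(Level.CONFIG);
    HTTP_LOGGER.addHandler(handler);
  }
}
```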
@nhoughto apologies, it's been a while, but your observation was keen and correct. I totally forgot the following known issue: jib/jib-core/src/main/java/com/google/cloud/tools/jib/builder/steps/PullBaseImageStep.java, lines 137 to 140 (at 7effb0d).
According to another user's analysis, it seems that you may run into the UNAUTHORIZED issue when a base image is not yet cached and there are parallel downloads going on.
Although jib is thread-safe, it should be smarter.
Jib doesn't currently lock the base image cache when downloading a base image, but instead downloads into a temporary directory and then attempts to move the downloaded image into place. There is no locking to block other threads. So a Maven project with N modules that build images using the same base image (like gcr.io/distroless/java) may result in N simultaneous pulls of the same image. Maybe we should provide a component to centralize downloading images?

Originally posted by @briandealwis in #1904 (comment)
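One shape such a centralizing component could take is to key in-flight pulls by image reference so that N concurrent requests share a single download; the sketch below is purely illustrative and not Jib's actual design:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical deduper: concurrent requests for the same image share one download. */
public class CentralizedBaseImagePuller {

  /** Placeholder for whatever the real cache would store for a pulled base image. */
  public static final class CachedBaseImage {}

  private final ConcurrentMap<String, CompletableFuture<CachedBaseImage>> inFlight =
      new ConcurrentHashMap<>();

  public CompletableFuture<CachedBaseImage> pull(
      String imageReference, Supplier<CachedBaseImage> download) {
    // computeIfAbsent guarantees only the first caller starts the download;
    // later callers for the same reference just wait on the same future.
    return inFlight.computeIfAbsent(
        imageReference, ref -> CompletableFuture.supplyAsync(download));
  }
}
```

A real version would also need to evict failed entries so a pull can be retried, and to decide how this interacts with the fine-grained per-layer parallelism discussed earlier in the thread.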