Jib should avoid parallel image downloads for same image #2007
In version >= 1.5.0, Jib verifies whether the base image already exists on the target registry and skips downloading it, doesn't it? If that's correct, a centralized folder for base images is only necessary when you don't use a registry.
That is true, but when you don't target a registry, Jib still caches base images locally to skip unnecessary pulls. So I think this issue is really only a problem for clean multi-module builds.
And this only matters when you enable Maven parallel builds.
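For anyone driving jib-core directly rather than the Maven/Gradle plugins, one way to get that kind of centralized folder is to point every build at the same base-image cache directory. A minimal sketch, assuming Containerizer#setBaseImageLayersCache is available in your jib-core version; the paths and image names here are made up for illustration:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.Jib;
import com.google.cloud.tools.jib.api.RegistryImage;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class SharedBaseImageCache {
  public static void main(String[] args) throws Exception {
    // Hypothetical directory shared by every module of the build, so a base
    // image pulled by one build is reused (not re-downloaded) by the others.
    Path sharedBaseCache = Paths.get("/var/cache/jib-base-images");

    Jib.from("gcr.io/distroless/java")
        .addLayer(Arrays.asList(Paths.get("target/classes")), "/app/classes")
        .containerize(
            Containerizer.to(RegistryImage.named("registry.example.com/my-app:latest"))
                .setBaseImageLayersCache(sharedBaseCache));
  }
}
```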
@chanseokoh assuming this can be closed after #2780?
@mpeddada1 this issue is a different one. #2780 was only in the context of multi-arch support.
I've been trying to incorporate Jib into our build process, and one problem I'm encountering is that Jib doesn't seem to try to detect and de-dupe in-flight build work. So if N requests to build the same new image come in at the same time, it will do the work N times (because the base/application cache is a cache miss). Obviously the cache directories exist to avoid redoing work that has already been done, but in the scenario where the work hasn't been done yet and many requests to do the same thing are being processed concurrently, I think there is potential for detecting and optimising this.

My scenario is a monorepo with N images where the base/first few layers are shared and only the last layer differs per image, so a request to build all of them at once means Jib builds the base/shared layers N times rather than once, which makes things take a lot longer than doing the same thing under Docker, which de-dupes the work.

My plan was to coordinate the calls to Jib so that one build request goes in first and the subsequent requests get cache hits, but that really just solves my problem, and I thought there might be a desire to solve this in Jib itself, as this is likely something others might be hitting. This isn't just related to image downloads, but the full set of things Jib might do.

There is a secondary question about how this might work with the Maven/Gradle plugins, since they make things a bit harder (builds could be in different JVMs and thus might not be able to coordinate intra-JVM), but ignore that for now.
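A rough sketch of that "one build first" coordination, assuming jib-core is being driven directly; the image names, thread count, and helper method are invented for the example, not anything Jib prescribes:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.DockerDaemonImage;
import com.google.cloud.tools.jib.api.Jib;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PrimeThenParallel {
  public static void main(String[] args) throws Exception {
    List<String> modules = Arrays.asList("app-a", "app-b", "app-c", "app-d");

    // Build the first image on its own so the shared base image lands in the
    // local cache before any parallel work starts.
    build(modules.get(0));

    // The remaining builds now get cache hits for the base image.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (String module : modules.subList(1, modules.size())) {
      pool.submit(
          () -> {
            try {
              build(module);
            } catch (Exception e) {
              e.printStackTrace();
            }
          });
    }
    pool.shutdown();
    pool.awaitTermination(30, TimeUnit.MINUTES);
  }

  private static void build(String module) throws Exception {
    Jib.from("gcr.io/distroless/java")
        .containerize(Containerizer.to(DockerDaemonImage.named("example/" + module)));
  }
}
```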
@nhoughto thanks for the feedback. As mentioned in this issue, what you said makes sense, and the current behavior is not ideal. It's just that it would be a pretty complex task to address for multiple reasons, and it may require a substantial overhaul of the async infrastructure (unless there are some quick hacks that enable a few relatively easy performance enhancements).

More importantly, I remember giving this some thought a long time ago, and de-duping in-flight build work is a fundamental design change that brings both pros and cons. Notably, in the current implementation all the lines of parallel work are very fine-grained, which enables great parallelism. For example, Jib can start uploading part of the base image or the application binaries while it is still downloading other parts of the base image or still building other application layers. So in many cases this can be much faster than centralized locking that delays/blocks all threads until Jib verifies that the layers those threads would download or build are eventually going to be identical.

So the issue really needs deep insight, although there might be some easy areas for improvement. Unfortunately, we have other priorities, and improving this aspect is not on our roadmap.
Thanks for the insight, I’ll look at putting something around jib then
Digging into my scenarios for this ticket more, the slowness I was seeing was really due to the DockerDaemonImage behaviour more than anything. The base and app caches are pretty effective at everything other than the first run, but the Docker daemon integration is very wasteful: DockerDaemonImage ends up just loading the full image into the daemon, so every layer is re-sent on every build.

To work around it I've effectively written a JibContainerBuilder -> Dockerfile serialiser, so am back to indirectly using Docker itself.
Thank you for the investigation and detailed notes!
Yeah, this is very unfortunate. Unlike container registries, the Docker daemon (Docker Engine API) is very limited in that it doesn't provide a way to check, pull, or push individual layers. It's something we've greatly lamented. OTOH, pushing to a registry with Jib is super-fast due to the strong reproducibility of Jib, so spinning up a local registry could be an option, which is as easy as:

docker run -d -p 5000:5000 --restart always --name registry registry:2
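Pushing to that local registry from jib-core might then look like the sketch below; the image names are placeholders, and allowing insecure registries is needed only because the registry started above speaks plain HTTP:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.Jib;
import com.google.cloud.tools.jib.api.RegistryImage;
import java.nio.file.Paths;
import java.util.Arrays;

public class LocalRegistryPush {
  public static void main(String[] args) throws Exception {
    Jib.from("gcr.io/distroless/java")
        .addLayer(Arrays.asList(Paths.get("target/classes")), "/app/classes")
        .containerize(
            // localhost:5000 is the registry started with the docker run command
            // above; it is plain HTTP, hence allowing insecure registries.
            Containerizer.to(RegistryImage.named("localhost:5000/my-app:latest"))
                .setAllowInsecureRegistries(true));
  }
}
```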
More findings on the actual original issue here (parallel downloads of the same base image): this is actually causing me failures rather than just wasting cycles/bandwidth. Specifically, against Docker Hub with a parallelism of 8 (and more than 8 projects sharing the same base image), some of the base image downloads would fail with an UNAUTHORIZED error.

I'm not sure whether this is a Docker Hub API thing, rate limiting or similar, or a Jib race condition around authentication (is anything shared across threads/instances in Jib?); the logs don't show any errors relating to rate limiting. It feels a bit like a Jib race condition.

Either way, it seems like there are meaningful problems with letting Jib download the same base image in parallel, and doing some de-duping would solve both problems. As a workaround, at the moment I'm doing some locking before calls to Jib to ensure each base image is only pulled once, but it would be nice if this was solved upstream 👍
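For what it's worth, a minimal sketch of that locking workaround; the class, method names, and the (coarse) choice to hold the lock for the whole containerize() call are mine, not anything Jib provides:

```java
import com.google.cloud.tools.jib.api.Containerizer;
import com.google.cloud.tools.jib.api.JibContainerBuilder;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical wrapper: never let two threads pull the same base image at once. */
public class PerBaseImageLock {
  private static final ConcurrentMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  public static void containerize(
      String baseImage, JibContainerBuilder builder, Containerizer containerizer)
      throws Exception {
    // One lock per base image reference: builds on different base images still
    // run in parallel, but builds sharing a base image are serialized.
    ReentrantLock lock = LOCKS.computeIfAbsent(baseImage, key -> new ReentrantLock());
    lock.lock();
    try {
      builder.containerize(containerizer);
    } finally {
      lock.unlock();
    }
  }
}
```

This is heavier than strictly necessary once the base image is cached, since it serializes whole builds rather than just the pull, but it guarantees a given base image is downloaded by at most one thread at a time.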
@nhoughto Would you be interested in making a contribution for this?
yep can do, any tips on preferred approach?
You can see detailed registry interactions by enabling HTTP debug logging. BTW, as I alluded to in #2007 (comment), this isn't something we can jump in to work on.
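If it helps when driving jib-core directly, one way to surface that HTTP traffic is to raise the logging level of google-http-client, which Jib uses for registry calls; treat the exact logger setup below as an assumption about your environment rather than an official Jib switch:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HttpDebugLogging {
  // Keep a strong reference so java.util.logging does not garbage-collect the logger.
  private static final Logger HTTP_LOGGER =
      Logger.getLogger("com.google.api.client.http.HttpTransport");

  public static void enable() {
    // CONFIG logs request/response headers; Level.ALL additionally logs payloads.
    HTTP_LOGGER.setLevel(Level.CONFIG);
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(Level.CONFIG);
    HTTP_LOGGER.addHandler(handler);
  }
}
```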
@nhoughto apologies, it's been a while, but your observation was keen and correct. I totally forgot the following known issue: jib/jib-core/src/main/java/com/google/cloud/tools/jib/builder/steps/PullBaseImageStep.java, lines 137 to 140 (at 7effb0d).
According to another user's analysis, it seems that you may run into the UNAUTHORIZED issue when a base image is not yet cached and there are parallel downloads going on.
Although jib is thread-safe, it should be smarter.
Jib doesn't currently lock the base image cache when downloading a base image, but instead downloads into a temporary directory and then attempts to move the downloaded image into place. There is no locking to block other threads. So a Maven project with N modules that build images using the same base image (like gcr.io/distroless/java) may result in N simultaneous pulls of the same image. Maybe we should provide a component to centralize downloading images?

Originally posted by @briandealwis in #1904 (comment)
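One shape such a centralizing component could take is to key in-flight pulls by image reference so that N concurrent requests share a single download; the sketch below is purely illustrative and not Jib's actual design:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical deduper: concurrent requests for the same image share one download. */
public class CentralizedBaseImagePuller {

  /** Placeholder for whatever the real cache would store for a pulled base image. */
  public static final class CachedBaseImage {}

  private final ConcurrentMap<String, CompletableFuture<CachedBaseImage>> inFlight =
      new ConcurrentHashMap<>();

  public CompletableFuture<CachedBaseImage> pull(
      String imageReference, Supplier<CachedBaseImage> download) {
    // computeIfAbsent guarantees only the first caller starts the download;
    // later callers for the same reference just wait on the same future.
    return inFlight.computeIfAbsent(
        imageReference, ref -> CompletableFuture.supplyAsync(download));
  }
}
```

A real version would also need to evict failed entries so a pull can be retried, and to decide how this interacts with the fine-grained per-layer parallelism discussed earlier in the thread.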