Multi-module maven project with parallel builds fail with UNAUTHORIZED. #3733

Closed
tommyulfsparre opened this issue Aug 18, 2022 · 5 comments

@tommyulfsparre
tommyulfsparre commented Aug 18, 2022

Environment:

  • Jib version: 3.2.1
  • Build tool: Maven - 3.8.5
  • OS: Linux Ubuntu/MacOS

Description of the issue:

Building a multi-module Maven project with parallelized builds fails with UNAUTHORIZED when each container uses the same base image. The build needs to run with an empty cache.

Maven command:
mvn clean verify -Djava.util.logging.config.file=logging.properties -T 1C

We enabled jib debug logs and could see that the request to fetch the layer didn’t contain an authorization token for one or more of the concurrent build steps.

We were able to reproduce this issue locally and are fairly certain that it happens when the cache is empty and then gets partially populated (manifests_configs.json exists, but not all the layers do). This led us to this section: https://github.com/GoogleContainerTools/jib/blob/master/jib-core/src/main/java/com/google/cloud/tools/jib/builder/steps/PullBaseImageStep.java#L134-L140, which links to #2220, which describes the issue we are observing. Removing this branch makes the parallel build pass.
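
To make the suspected race concrete, here is a minimal in-memory sketch. It does not use any Jib code: the class `PartialCacheRace`, the two sets standing in for the cache, and the names `base-image` and `layer-1` are all hypothetical. The latches force the ordering that the artificial sleep was used to provoke: the first build records the manifest, and the second build inspects the cache before the first build has stored the layer.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical in-memory model of a shared cache; this is not Jib's Cache class.
class PartialCacheRace {

  /** Returns true if the second build observes a cached manifest whose layer is still missing. */
  static boolean secondBuildSeesPartialCache() {
    Set<String> cachedManifests = ConcurrentHashMap.newKeySet();
    Set<String> cachedLayers = ConcurrentHashMap.newKeySet();
    CountDownLatch manifestWritten = new CountDownLatch(1);
    CountDownLatch secondBuildChecked = new CountDownLatch(1);
    boolean[] sawPartialCache = new boolean[1];

    // First build: writes the manifest, then (later) the layer.
    Thread firstBuild = new Thread(() -> {
      cachedManifests.add("base-image");
      manifestWritten.countDown();
      awaitQuietly(secondBuildChecked); // stands in for the slow layer download
      cachedLayers.add("layer-1");
    });

    // Second build: checks the cache in the window between the two writes.
    Thread secondBuild = new Thread(() -> {
      awaitQuietly(manifestWritten);
      boolean manifestHit = cachedManifests.contains("base-image");
      boolean layerHit = cachedLayers.contains("layer-1");
      sawPartialCache[0] = manifestHit && !layerHit;
      secondBuildChecked.countDown();
    });

    firstBuild.start();
    secondBuild.start();
    try {
      firstBuild.join();
      secondBuild.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return sawPartialCache[0];
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("partial cache observed: " + secondBuildSeesPartialCache());
  }
}
```

In this model, a build that treats "manifest present" as "base image cached" would go on to fetch the missing layer itself, which is the step where we see the request without an authorization token.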

We are fairly certain that this is the same issue as seen in: #2007 (comment)

Expected behavior:

Multiple parallel builds should correctly use the same base image (either cached or not) and build the proper tar artifacts.

Steps to reproduce:

  1. Configure a multi-module Maven project (4 submodules were used for reproducing this issue).
  2. Configure each module's pom.xml to use jib with the same base image.
  3. Run Maven: mvn clean verify -Djava.util.logging.config.file=logging.properties -DskipTests -T 1C
  4. It might take more than one try to reproduce this issue (we injected an artificial sleep in the StepRunner to force this race to happen).

jib-maven-plugin Configuration:

<plugin>
    <groupId>com.google.cloud.tools</groupId>
    <artifactId>jib-maven-plugin</artifactId>
    <configuration>
        <from>
            <image>gcr.io/images/baseimage:2022.03-2@sha256:<sha256></image>
            <credHelper>gcloud</credHelper>
        </from>
        <to>
            <image>gcr.io/${project}/${artifactId}:${version}</image>
        </to>
    </configuration>
</plugin>

Log output:

Caused by: com.google.cloud.tools.jib.api.RegistryUnauthorizedException: Unauthorized for gcr.io/baseimage
    at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:163)
    at com.google.cloud.tools.jib.registry.RegistryEndpointCaller.call (RegistryEndpointCaller.java:114)
    at com.google.cloud.tools.jib.registry.RegistryClient.callRegistryEndpoint (RegistryClient.java:623)
    at com.google.cloud.tools.jib.registry.RegistryClient.lambda$pullBlob$3 (RegistryClient.java:494)
    at com.google.cloud.tools.jib.hash.Digests.computeDigest (Digests.java:104)
    at com.google.cloud.tools.jib.blob.WritableContentsBlob.writeTo (WritableContentsBlob.java:37)
    at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeCompressedLayerBlobToDirectory (CacheStorageWriter.java:392)
    at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeCompressed (CacheStorageWriter.java:226)
    at com.google.cloud.tools.jib.cache.Cache.writeCompressedLayer (Cache.java:130)
    at com.google.cloud.tools.jib.builder.steps.ObtainBaseImageLayerStep.call (ObtainBaseImageLayerStep.java:141)
    at com.google.cloud.tools.jib.builder.steps.ObtainBaseImageLayerStep.call (ObtainBaseImageLayerStep.java:39)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:74)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
    at java.lang.Thread.run (Thread.java:829)
@elefeint
Contributor

@tommyulfsparre Thank you -- great investigation!

@chanseokoh What would be the best approach to fixing this in your opinion -- 1) only using the cache if all layers are available, 2) keeping current caching behavior but finding real auth credentials, 3) something else?

@chanseokoh
Member

chanseokoh commented Aug 18, 2022

Hmm... it's hard to say what the best approach is unless I review the code holistically, which I don't think I'll do. The problem, I think, is that Jib assumes a base image is cached as long as a manifest JSON exists (saved and retrieved via Cache.writeMetadata() and Cache.retrieveMetadata()).

But the direction of 1) doesn't sound bad. It may not be that hard to add logic to check whether all the layer files described in a manifest are present, but I don't really know. 2) is also conceivable, but there will be situations where it retrieves credentials only to never use them. Retrieving credentials may cause friction, and it delays the start of downloading anything from a registry.
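
As a rough illustration of direction 1), the check could look something like the sketch below. This is not Jib's implementation: `LayerCacheCheck`, both method names, and the assumption that each cached layer lives in a directory entry named after its digest are hypothetical. The only point is that a manifest hit counts as a cache hit only if every layer it lists is also on disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper; the on-disk layout (one entry per layer digest) is an assumption.
class LayerCacheCheck {

  /** Returns true only if an entry named after every digest exists under layersDir. */
  static boolean allLayersPresent(Path layersDir, List<String> layerDigests) {
    for (String digest : layerDigests) {
      if (!Files.exists(layersDir.resolve(digest))) {
        return false; // one missing layer means the cache is only partially populated
      }
    }
    return true;
  }

  /** Treats the base image as cached only when the manifest and all of its layers are present. */
  static boolean canUseCachedBaseImage(
      boolean manifestCached, Path layersDir, List<String> layerDigests) {
    return manifestCached && allLayersPresent(layersDir, layerDigests);
  }

  public static void main(String[] args) {
    Path layersDir = Paths.get("no-such-layer-cache-directory");
    System.out.println(canUseCachedBaseImage(true, layersDir, List.of("digest-a", "digest-b")));
  }
}
```

With a check like this, a build that finds the manifest but not the layers would fall back to the normal pull path, where credentials are resolved before any layer is requested.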

Another option I can think of is to save the manifest (calling Cache.writeMetadata()) only after all layers are downloaded. It may not be that difficult to add another async Step in StepsRunner that depends on all layer-downloading Steps, but I don't know.
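
A sketch of that ordering, using the JDK's CompletableFuture as a stand-in rather than the futures StepsRunner actually uses: the class `ManifestAfterLayers`, the `run` method, and the event strings are hypothetical, and the "download" and "write manifest" actions only record that they happened. The manifest step is chained on the completion of every layer step, so it cannot run while any layer is still missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model of "write the manifest last"; not Jib's StepsRunner.
class ManifestAfterLayers {

  /** Simulates the layer downloads and returns the recorded events in completion order. */
  static List<String> run(List<String> layerDigests) {
    List<String> events = new CopyOnWriteArrayList<>();

    // One asynchronous "download" per layer; here it only records the digest.
    List<CompletableFuture<Void>> downloads = new ArrayList<>();
    for (String digest : layerDigests) {
      downloads.add(CompletableFuture.runAsync(() -> events.add("layer:" + digest)));
    }

    // The manifest write depends on all downloads, so it is always recorded last.
    CompletableFuture<Void> writeManifest =
        CompletableFuture.allOf(downloads.toArray(new CompletableFuture<?>[0]))
            .thenRun(() -> events.add("manifest"));

    writeManifest.join();
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run(List.of("digest-a", "digest-b", "digest-c")));
  }
}
```

The layer events may appear in any order, but a concurrent build that sees the manifest can then rely on every layer already being in the cache.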

@mpeddada1
Contributor

Thank you for the thorough explanation @chanseokoh! +1 to both your and @elefeint's suggestions. Option 1 where we add an additional check to verify that all the layer files are present along with checking if the manifest exists sounds like a viable change!

For the option to add another async Step, I'm guessing this new step will be different from BuildManifestListOrSingleManifestStep?

@emmileaf
Contributor

Closing via #3767, which implements approach 1 (checking that the layers are present in the cache) and should address this race condition. Please re-open if needed, and thanks again for the investigation!

@emmileaf
Contributor

jib-core 0.23.0, jib-maven-plugin 3.3.1, and jib-gradle-plugin 3.3.1 have been released with this fix.
