Race condition when using cache-mounts with multi-arch builds. #549

Closed
nigelgbanks opened this issue Feb 21, 2021 · 9 comments

@nigelgbanks

nigelgbanks commented Feb 21, 2021

Issue

I've encountered intermittent failures using --mount=type=cache to share the apk package manager cache when doing multi-platform builds using buildx with the docker-container driver. It appears buildx attempts to use the same cache for both platforms even though the mounts have different identifiers. This makes the package manager fail, because it expects the files in the cached folder to be platform specific. The failure is intermittent: there are times when buildx correctly creates two separate caches and the build succeeds.

System Info

docker info                                                                                                                                
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  ecs: Docker ECS (Docker Inc., v1.0.0-beta.1)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 30
 Server Version: 20.10.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.8.0-43-generic
 Operating System: Ubuntu 20.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 62.76GiB
 Name: shadow
 ID: KEHF:5IP6:4YGJ:7IBE:DNJP:63AT:7QW5:7QTB:GAWU:5ZBX:R6ZI:5QZH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: nigelgbanks
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  isle-buildkit.registry
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support

Steps to reproduce.

Set up the builder:

docker buildx create --driver docker-container --driver-opt image=moby/buildkit:v0.8.1 --name mybuilder --use

Enable multi-arch builds via QEMU:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

Attempt to build:

cat <<'EOF' | docker buildx build --platform linux/amd64,linux/arm64 --progress=plain - 2>&1 | tee /tmp/buildx.log
# syntax=docker/dockerfile:1.2.1
FROM alpine:3.11.6
ARG TARGETARCH

RUN --mount=type=cache,id=apk-${TARGETARCH},sharing=locked,target=/var/cache/apk \
    ln -s /var/cache/apk /etc/apk/cache && \
    ls -lah /var/cache/apk && \
    apk --update add bash

RUN uname -a
EOF

The log below shows the error: apk fails because its cache ends up shared across architectures.

time="2021-02-21T12:53:34Z" level=warning msg="No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load"
#1 [internal] load build definition from Dockerfile
#1 sha256:3112bc96a6b56a07758931d2d87df6a4b8538e9ef12e6d51a35d9ef155059e13
#1 transferring dockerfile: 307B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:d20b74503e1898f5f0b2e929754a85ea4c726e5b9fef59b9ad29e06b6ad4bff6
#2 transferring context: 2B done
#2 DONE 0.0s

#3 resolve image config for docker.io/docker/dockerfile:1.2.1
#3 sha256:e2eea6e332c2951d136dc2800ec0a1e761200c2009e8ecdbe04a72e1d1f0101e
#3 DONE 0.3s

#4 docker-image://docker.io/docker/dockerfile:1.2.1@sha256:e2a8561e419ab1ba6b2fe6cbdf49fd92b95912df1cf7d313c3e2230a333fdbcc
#4 sha256:8175da8a6e0e74a79495957618a3a106bbaee92edb98ae4056ef9be5a4cc7bfb
#4 resolve docker.io/docker/dockerfile:1.2.1@sha256:e2a8561e419ab1ba6b2fe6cbdf49fd92b95912df1cf7d313c3e2230a333fdbcc done
#4 CACHED

#5 [linux/amd64 internal] load metadata for docker.io/library/alpine:3.11.6
#5 sha256:72e8bbabb3ade051a97252e0d75092a60fd5058efe39a29fd14e233788ce2fd2
#5 DONE 1.6s

#6 [linux/arm64 internal] load metadata for docker.io/library/alpine:3.11.6
#6 sha256:33b1d3696399a3a6c7a4478d2956360ccf1467ae78e0768b0ac0740f149c356b
#6 DONE 1.6s

#10 [linux/arm64 stage-0 1/3] FROM docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
#10 sha256:9652e6178784dc0a7233dee9b77e47003f97dcb13b9bbf4372481699bc9c8e63
#10 resolve docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54 0.0s done
#10 sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 0B / 2.72MB 0.2s
#10 ...

#7 [linux/amd64 stage-0 1/3] FROM docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
#7 sha256:a62f2966486d24bb70844e943deaecb08763cc132b58cd96b5962e2fd24364a2
#7 resolve docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54 0.0s done
#7 sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08 2.81MB / 2.81MB 2.7s done
#7 extracting sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08 0.1s done
#7 DONE 2.8s

#10 [linux/arm64 stage-0 1/3] FROM docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
#10 sha256:9652e6178784dc0a7233dee9b77e47003f97dcb13b9bbf4372481699bc9c8e63
#10 sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 1.05MB / 2.72MB 4.4s
#10 ...

#8 [linux/amd64 stage-0 2/3] RUN --mount=type=cache,id=apk-amd64,sharing=private,target=/var/cache/apk     ln -s /var/cache/apk /etc/apk/cache &&     ls -lah /var/cache/apk &&     apk --update add bash
#8 sha256:c3334a2f44bf0c62149131b97c5a14f38fb320031fd1359322e8fdfaa48d6f52
#8 0.093 total 8K     
#8 0.093 drwxr-xr-x    2 root     root        4.0K Feb 21 12:53 .
#8 0.093 drwxr-xr-x    4 root     root        4.0K Apr 23  2020 ..
#8 0.096 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
#8 0.871 fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
#8 1.796 (1/4) Installing ncurses-terminfo-base (6.1_p20200118-r4)
#8 1.833 (2/4) Installing ncurses-libs (6.1_p20200118-r4)
#8 2.056 (3/4) Installing readline (8.0.1-r0)
#8 2.181 (4/4) Installing bash (5.0.11-r1)
#8 2.584 Executing bash-5.0.11-r1.post-install
#8 2.586 Executing busybox-1.31.1-r9.trigger
#8 2.590 OK: 8 MiB in 18 packages
#8 DONE 2.7s

#10 [linux/arm64 stage-0 1/3] FROM docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
#10 sha256:9652e6178784dc0a7233dee9b77e47003f97dcb13b9bbf4372481699bc9c8e63
#10 ...

#9 [linux/amd64 stage-0 3/3] RUN uname -a
#9 sha256:9f16e2418f0bd8ce33be1f2902e6c32e8fd9c6e9a60f9cc35b9a235cf7cba1c3
#9 0.099 Linux buildkitsandbox 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 Linux
#9 DONE 0.1s

#10 [linux/arm64 stage-0 1/3] FROM docker.io/library/alpine:3.11.6@sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
#10 sha256:9652e6178784dc0a7233dee9b77e47003f97dcb13b9bbf4372481699bc9c8e63
#10 sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 2.10MB / 2.72MB 5.9s
#10 sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 2.72MB / 2.72MB 6.2s
#10 sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 2.72MB / 2.72MB 6.2s done
#10 extracting sha256:29e5d40040c18c692ed73df24511071725b74956ca1a61fe6056a651d86a13bd 0.1s done
#10 DONE 6.3s

#11 [linux/arm64 stage-0 2/3] RUN --mount=type=cache,id=apk-arm64,sharing=private,target=/var/cache/apk     ln -s /var/cache/apk /etc/apk/cache &&     ls -lah /var/cache/apk &&     apk --update add bash
#11 sha256:7e80004997278004490c5cdc97158eb9bb7b2f7b61a11c3286f5c036ff37d378
#11 0.095 total 2M     
#11 0.098 drwxr-xr-x    2 root     root        4.0K Feb 21 12:53 .
#11 0.098 drwxr-xr-x    4 root     root        4.0K Apr 23  2020 ..
#11 0.098 -rw-r--r--    1 root     root      704.8K Feb 21 12:53 APKINDEX.70f61090.tar.gz
#11 0.098 -rw-r--r--    1 root     root      834.6K Feb 21 12:53 APKINDEX.ca2fea5b.tar.gz
#11 0.098 -rw-r--r--    1 root     root      399.3K Nov 21  2019 bash-5.0.11-r1.e628afa7.apk
#11 0.098 -rw-r--r--    1 root     root           0 Feb 21 12:53 installed
#11 0.098 -rw-r--r--    1 root     root      203.1K Apr 29  2020 ncurses-libs-6.1_p20200118-r4.6ca4068f.apk
#11 0.098 -rw-r--r--    1 root     root       19.0K Apr 29  2020 ncurses-terminfo-base-6.1_p20200118-r4.db5715ef.apk
#11 0.098 -rw-r--r--    1 root     root      114.6K Nov 21  2019 readline-8.0.1-r0.cd7077a6.apk
#11 0.130 WARNING: Ignoring APKINDEX.70f61090.tar.gz: UNTRUSTED signature
#11 0.131 WARNING: Ignoring APKINDEX.ca2fea5b.tar.gz: UNTRUSTED signature
#11 0.134 ERROR: unsatisfiable constraints:
#11 0.135   bash (missing):
#11 0.140     required by: world[bash]
#11 ERROR: executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c ln -s /var/cache/apk /etc/apk/cache &&     ls -lah /var/cache/apk &&     apk --update add bash]: exit code: 1
------
 > [linux/arm64 stage-0 2/3] RUN --mount=type=cache,id=apk-arm64,sharing=private,target=/var/cache/apk     ln -s /var/cache/apk /etc/apk/cache &&     ls -lah /var/cache/apk &&     apk --update add bash:
------
Dockerfile:5
--------------------
   4 |     
   5 | >>> RUN --mount=type=cache,id=apk-${TARGETARCH},sharing=private,target=/var/cache/apk \
   6 | >>>     ln -s /var/cache/apk /etc/apk/cache && \
   7 | >>>     ls -lah /var/cache/apk && \
   8 | >>>     apk --update add bash
   9 |     
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c ln -s /var/cache/apk /etc/apk/cache &&     ls -lah /var/cache/apk &&     apk --update add bash]: exit code: 1

Note that the output of ls -lah /var/cache/apk differs between the two platforms: the amd64 step wrote to the directory before the arm64 step read from it.

@tonistiigi
Member

You can't use variables in --mount atm. moby/buildkit#815

@nigelgbanks
Copy link
Author

nigelgbanks commented Feb 26, 2021

@tonistiigi I take it then that it's impossible to use a --mount=type=cache in a safe way when doing multi-arch builds? If there is a workaround or different strategy I would love to know. Thanks!

@tonistiigi
Member

The simplest option is to put your cache in a per-architecture subdirectory of the mount so the caches don't collide. Another way is to use from= pointing to a base stage that is different for each architecture.
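
A minimal sketch of the subdirectory approach for the apk case from the original report. It assumes one shared cache mount at a hypothetical /cache path; the --mount options contain no variables, and $TARGETARCH is only used inside the shell command, where declared build args are available as environment variables. The /cache path, the id=apk label and the apk-<arch> subdirectory names are illustrative choices, not anything BuildKit requires.

```dockerfile
# syntax=docker/dockerfile:1.2
FROM alpine:3.11.6
ARG TARGETARCH

# No variables in the --mount options: a single cache mount, with a separate
# subdirectory per platform (hypothetical layout /cache/apk-<arch>), so
# amd64 and arm64 never read each other's APKINDEX or .apk files.
# The symlink is removed at the end so the image does not point at a path
# that only exists while the cache is mounted.
RUN --mount=type=cache,id=apk,target=/cache \
    mkdir -p "/cache/apk-$TARGETARCH" && \
    ln -s "/cache/apk-$TARGETARCH" /etc/apk/cache && \
    apk --update add bash && \
    rm /etc/apk/cache
```

Because each platform writes only to its own subdirectory, the default sharing mode should let both platforms run this step in parallel.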

@pirate

pirate commented Nov 1, 2023

For anyone landing here from Google, this solution works well (Edit: see better solution in later comment) for multi-arch builds, without race conditions or multi-arch binary conflicts inside the cache.

RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

...

RUN --mount=type=cache,target=/var/cache/apt,sharing=private \
    echo "[+] Installing APT base system dependencies for $TARGETPLATFORM..." \
    && apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends \
        apt-transport-https ca-certificates apt-utils gnupg2 curl wget \
        ...
    && rm -rf /var/lib/apt/lists/*
  • note the ,sharing=private at the end of the --mount=type=cache... line. Another option is sharing=locked (which allows sharing between builds but limits it to one accessor at a time). I don't recommend locked for multi-arch builds: it prevents your builds from running in parallel at that step, and you don't necessarily want the cache shared between different architectures anyway (it will only cause trouble if it contains binaries for one arch and a different arch tries to load them).
  • also note the change to /etc/apt/apt.conf.d/keep-cache at the top is important to maximize Apt's use of the cache

More info here: moby/buildkit#1673

With these changes I achieved a >20x speed gain when rebuilding a frequently changed complex docker image with lots of apt lines.

@tonistiigi
Copy link
Member

@pirate If you are using this pattern for multi-platform then add $TARGETARCH to the id to scope it per arch. This makes sure that you get the cache for the correct architecture (that is not otherwise guaranteed) and then you can use locked as well with multiple platforms still building in parallel.
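
Applied to the apk reproduction from the original report, a per-architecture id might look like the sketch below. It assumes a Dockerfile frontend new enough to expand build args inside --mount options; the docker/dockerfile:1 syntax line is an assumption chosen to pick up a current 1.x frontend.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.11.6
ARG TARGETARCH

# The cache id includes the target architecture, so amd64 and arm64 each get
# their own cache mount. sharing=locked then only serializes concurrent
# builds of the same architecture, since locks are per cache id.
RUN --mount=type=cache,id=apk-$TARGETARCH,sharing=locked,target=/var/cache/apk \
    ln -s /var/cache/apk /etc/apk/cache && \
    apk --update add bash
```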

@pirate
Copy link

pirate commented Nov 1, 2023

Cool! Just noticed they merged support for variables in mount arguments: moby/buildkit#815

So the updated version would look like this? I added $TARGETVARIANT to the id as well because I don't think arm/v7 and arm/v6 can share compiled packages, so it may be best to separate them.

ARG TARGETPLATFORM
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT

...

RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

...

RUN --mount=type=cache,id=apt-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/apt \
    apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends \
        apt-transport-https ca-certificates apt-utils gnupg2 curl wget ... \
    && rm -rf /var/lib/apt/lists/*

Edit: confirmed this works 👍 (build speeds are now 20x faster for small changes) and is now how we do it in ArchiveBox's Dockerfile

@mrquincle
Copy link

@pirate In your latest example you are using sharing=locked again. Is that on purpose?

@pirate
Copy link

pirate commented Dec 6, 2023

Yes, following @tonistiigi's comment the additional cache keys allow us to go back to using locked for faster builds.

@sanmai-NL
Copy link

sanmai-NL commented Dec 20, 2023

@pirate If you put /var/lib/apt/lists in a cache mount too, you no longer need to delete it, other than through periodic garbage collection of the build cache.
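
A sketch combining the per-architecture ids with this suggestion, assuming a Debian-based image and a frontend that expands build args in --mount options. The base image, the two packages, and the apt-cache-* / apt-lists-* id names are hypothetical placeholders.

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim
ARG TARGETARCH
ARG TARGETVARIANT

# Keep downloaded .deb files instead of letting docker-clean delete them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# Both the package cache and the package lists live in per-architecture
# cache mounts, so the lists never land in the image and there is no
# rm -rf /var/lib/apt/lists/* step.
RUN --mount=type=cache,id=apt-cache-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/apt \
    --mount=type=cache,id=apt-lists-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/lib/apt/lists \
    apt-get update -qq && \
    apt-get install -qq -y --no-install-recommends ca-certificates curl
```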


5 participants