This issue was moved to a discussion. You can continue the conversation there.

Add documentation/example for how to do multi-arch builds on a multi-arch Kubernetes cluster without emulation #516

Closed
adamnovak opened this issue Jan 27, 2021 · 14 comments


@adamnovak

In #370, support was added for --append with the Kubernetes driver. This lets you add an amd64 builder on Kubernetes targeted to amd64 nodes, and ARM builders targeted to ARM nodes. This is only sort of hinted at in the documentation.

A bit of an example was given in the PR:

docker buildx create --use --name=buildkit --platform=linux/amd64 --node=buildkit-amd64 --driver=kubernetes --driver-opt="nodeselector=kubernetes.io/arch=amd64"
docker buildx create --append --name=buildkit --platform=linux/arm64 --node=buildkit-arm64 --driver=kubernetes --driver-opt="nodeselector=kubernetes.io/arch=arm64"

However, it would be helpful to have a fully worked example in the documentation, from builder creation through to the docker buildx build command. There should also be some more prose about how this allows you to build each image on an actual host of the appropriate architecture, if available, and push them all together to the same tag at the end.

The Right Way to handle 32-bit ARM would be nice to see here as well; can it just be another platform on the 64-bit ARM hosts?

It would also be good to show how/whether other client machines can connect to the same builder on the Kubernetes cluster, or if (some of?) the setup needs to be repeated for e.g. each CI job that wants to build a multi-arch image.
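
For concreteness, a minimal end-to-end sketch of the flow being asked for, assembled from the commands in this thread (the image tag is a placeholder):

# Create one builder with a node per architecture
docker buildx create --use --name=buildkit --platform=linux/amd64 --node=buildkit-amd64 --driver=kubernetes --driver-opt="nodeselector=kubernetes.io/arch=amd64"
docker buildx create --append --name=buildkit --platform=linux/arm64 --node=buildkit-arm64 --driver=kubernetes --driver-opt="nodeselector=kubernetes.io/arch=arm64"

# Deploy the buildkitd pods and check the advertised platforms
docker buildx inspect --bootstrap

# Build each platform on a matching node and push a single multi-arch tag
docker buildx build --platform=linux/amd64,linux/arm64 -t example.com/myimage:latest --push .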

@tonistiigi
Member

@morlay

@morlay
Collaborator

morlay commented Jan 29, 2021

@adamnovak

The builder needs to be created in each CI job before building.

With the kubernetes driver, deployments with the same name (--node) in the same namespace will be shared.

In fact, docker buildx create just creates the driver metadata; the buildkit deployment is only created if it doesn't already exist in the assigned namespace of the k8s cluster. You could even kubectl apply the deployment to the k8s cluster first (I use this approach; see the sketch after the commands below).

So in each CI job, before building, you can just run the following to connect the docker buildx client to buildkit in the k8s cluster:

docker buildx create --use --name=buildkit --platform=linux/amd64 --node=buildkit-amd64 --driver=kubernetes --driver-opt="namespace=buildkit,nodeselector=kubernetes.io/arch=amd64"
docker buildx create --append --name=buildkit --platform=linux/arm64 --node=buildkit-arm64 --driver=kubernetes --driver-opt="namespace=buildkit,nodeselector=kubernetes.io/arch=arm64"


# Unlike x86, where i386 can run on an x86_64 host,
# an arm64 host only supports arm64 without emulation like qemu.
# So you have to add an arm32 node to the k8s cluster,
# and it should be appended too:
docker buildx create --append --name=buildkit --platform=linux/arm/v7 --node=buildkit-arm --driver=kubernetes --driver-opt="namespace=buildkit,nodeselector=kubernetes.io/arch=arm"
# Not sure the nodeselector is correct; I don't have an arm32 host.
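
For reference, a pre-applied deployment along the lines mentioned above might look roughly like this. It is only a sketch: the metadata name must match the --node value, the namespace must match the namespace driver-opt, and the image tag and privileged securityContext are assumptions based on what buildx deploys by default.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: buildkit-amd64      # must match the --node value
  namespace: buildkit       # must match the namespace driver-opt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: buildkit-amd64
  template:
    metadata:
      labels:
        app: buildkit-amd64
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
        - name: buildkitd
          image: moby/buildkit:buildx-stable-1
          securityContext:
            privileged: true   # buildkitd needs this in the default (non-rootless) mode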

After the build, don't run docker buildx rm.

If you don't want to run those commands in each build, you can copy the created builder info file from ~/.docker/buildx/instances/<name> after creating the builder locally, and keep the KUBECONFIG file at the same path in CI.

$ cat ~/.docker/buildx/instances/buildkit 
{"Name":"buildkit","Driver":"kubernetes","Nodes":[{"Name":"buildkit-amd64","Endpoint":"kubernetes:///buildkit?deployment=buildkit-amd64\u0026kubeconfig=%2FUsers%2Fmorlay%2F.kube%2Fconfig--hw-dev.yaml","Platforms":[{"architecture":"amd64","os":"linux"}],"Flags":null,"ConfigFile":"","DriverOpts":{"namespace":"gitlab"}},{"Name":"buildkit-arm64","Endpoint":"kubernetes:///buildkit?deployment=buildkit-arm64\u0026kubeconfig=%2FUsers%2Fmorlay%2F.kube%2Fconfig--hw-dev.yaml","Platforms":[{"architecture":"arm64","os":"linux"}],"Flags":null,"ConfigFile":"","DriverOpts":{"namespace":"gitlab"}}],"Dynamic":false}% 

There should also be some more prose about how this allows you to build each image on an actual host of the appropriate architecture, if available, and push them all together to the same tag at the end.

This is the same for all drivers; it is about how the client / buildkit works. But I don't think I can explain this part clearly. @tonistiigi, could you help?

@adamnovak
Author

This is all good info @morlay.

That's a useful point about setting up the deployment in advance. I think if I do that I can give the pods requests and limits to work around #210. Right now my cluster assigns some very low limits to anything that doesn't provide its own, so I don't think I can successfully build anything but the smallest containers.

Any tips on how to get BuildKit on Kubernetes to pull from Docker Hub through my cluster's caching registry, to avoid the pull limits? When using the Docker Daemon I have to set up an /etc/docker/daemon.json like:

{"registry-mirrors": ["http://docker-registry.toil:5000"], "insecure-registries": ["docker-registry.toil:5000"]}

When pulling layers, does BuildKit just pull through Kubernetes's container fetch mechanism? Or does it have its own config? Or does it run the Docker Daemon in its pod, meaning I need to inject this config into the BuildKit pods before they start?

@adamnovak
Author

I've just tested dropping that /etc/docker/daemon.json into the buildkitd container, with a customized Deployment, but BuildKit doesn't seem to obey it.

@morlay
Collaborator

morlay commented Feb 1, 2021

@adamnovak

buildkit runs with containerd, not docker.

You should update /etc/buildkit/buildkitd.toml (you could use a configmap to mount it; a sketch follows the link below):

[registry."docker.io"]
mirrors = ["http://docker-registry.toil:5000"]
http = true
insecure = true

See more: https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md
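
A minimal sketch of the configmap approach (the configmap name is illustrative, and this assumes you manage the deployment yourself as described above):

kubectl -n buildkit create configmap buildkitd-config --from-file=buildkitd.toml

# then, in the buildkitd Deployment's pod spec:
containers:
  - name: buildkitd
    volumeMounts:
      - name: config
        mountPath: /etc/buildkit
volumes:
  - name: config
    configMap:
      name: buildkitd-config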

@adamnovak
Author

@morlay Thanks for the tip!

This doesn't quite work, for a couple of reasons. The docs you linked show that the right format is to leave off the protocol scheme (so more like):

[registry."docker.io"]
mirrors = ["docker-registry.toil:5000"]
http = true
insecure = true

When I do that, it still doesn't work; I think the problem is that the mirror value is just passed along as a hostname, and the port is never parsed out, so if I'm running an HTTP mirror it needs to be on port 80.

I will try moving the mirror to port 80 and seeing if that works.

@adamnovak
Author

adamnovak commented Feb 1, 2021

Changing the port to 80 and passing just the hostname didn't seem to help.
After setting the command in my deployment's buildkitd container to buildkitd --debug, I saw that it was making requests to my mirror with https, and then to Docker Hub with HTTP. According to the parsing code, if I want to mark the mirror as insecure/http, I need a separate section for it:

[registry."docker.io"]
mirrors = ["docker-registry.toil"]
[registry."docker-registry.toil"]
http = true
insecure = true

I plugged that in and it seems to be working now. Thanks!

@adamnovak
Author

To get the multi-arch builds working with emulation, I had to add tonistiigi/binfmt to my deployment as another initContainer and tell it to --install arm64. Otherwise, emulated builds would start fine, and RUN commands would even work, but as soon as a process tried to launch another process it would fail:

root@adamnovak-pod:/dind# docker buildx build --platform=linux/arm64 -f Dockerfile .
WARN[0000] No output specified for kubernetes driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load 
[+] Building 170.3s (8/14)                                                                                                                                                                                 
 => [internal] load build definition from Dockerfile                                                                                                                                                  0.0s
 => => transferring dockerfile: 1.22kB                                                                                                                                                                0.0s
 => [internal] load .dockerignore                                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                                                                                                                                       0.6s
 => [ 1/10] FROM docker.io/library/ubuntu:20.04@sha256:703218c0465075f4425e58fac086e09e1de5c340b12976ab9eb8ad26615c3715                                                                               0.0s
 => => resolve docker.io/library/ubuntu:20.04@sha256:703218c0465075f4425e58fac086e09e1de5c340b12976ab9eb8ad26615c3715                                                                                 0.0s
 => [internal] load build context                                                                                                                                                                     0.0s
 => => transferring context: 93B                                                                                                                                                                      0.0s
 => CACHED [ 2/10] RUN echo "I am running on linux/amd64, building for linux/arm64"                                                                                                                   0.0s
 => [ 3/10] RUN echo "dpkg-split: $(stat /usr/sbin/dpkg-split)"                                                                                                                                       0.2s
 => ERROR [ 4/10] RUN DEBIAN_FRONTEND=noninteractive apt-get update -qq &&     DEBIAN_FRONTEND=noninteractive apt-get install -qqy     apt-transport-https     ca-certificates     curl     lxc     169.4s
------                                                                                                                                                                                                     
 > [ 4/10] RUN DEBIAN_FRONTEND=noninteractive apt-get update -qq &&     DEBIAN_FRONTEND=noninteractive apt-get install -qqy     apt-transport-https     ca-certificates     curl     lxc     iptables     sudo     docker.io     containerd &&     apt-get clean:                                                                                                                                                     
#7 169.0 debconf: delaying package configuration, since apt-utils is not installed                                                                                                                         
#7 169.2 Error while loading /usr/sbin/dpkg-split: No such file or directory                                                                                                                               
#7 169.2 Error while loading /usr/sbin/dpkg-deb: No such file or directory                                                                                                                                 
#7 169.2 dpkg: error processing archive /var/cache/apt/archives/libssl1.1_1.1.1f-1ubuntu2.1_arm64.deb (--unpack):
#7 169.2  dpkg-deb --control subprocess returned error exit status 1
#7 169.2 Error while loading /usr/sbin/dpkg-split: No such file or directory
#7 169.2 Error while loading /usr/sbin/dpkg-deb: No such file or directory
#7 169.2 dpkg: error processing archive /var/cache/apt/archives/libpython3.8-minimal_3.8.5-1~20.04_arm64.deb (--unpack):
#7 169.2  dpkg-deb --control subprocess returned error exit status 1
#7 169.2 Error while loading /usr/sbin/dpkg-split: No such file or directory
#7 169.2 Error while loading /usr/sbin/dpkg-deb: No such file or directory
#7 169.2 dpkg: error processing archive /var/cache/apt/archives/libexpat1_2.2.9-1build1_arm64.deb (--unpack):
#7 169.2  dpkg-deb --control subprocess returned error exit status 1
#7 169.2 Error while loading /usr/sbin/dpkg-split: No such file or directory
#7 169.2 Error while loading /usr/sbin/dpkg-deb: No such file or directory
#7 169.2 dpkg: error processing archive /var/cache/apt/archives/python3.8-minimal_3.8.5-1~20.04_arm64.deb (--unpack):
#7 169.2  dpkg-deb --control subprocess returned error exit status 1
#7 169.2 Errors were encountered while processing:
#7 169.2  /var/cache/apt/archives/libssl1.1_1.1.1f-1ubuntu2.1_arm64.deb
#7 169.2  /var/cache/apt/archives/libpython3.8-minimal_3.8.5-1~20.04_arm64.deb
#7 169.2  /var/cache/apt/archives/libexpat1_2.2.9-1build1_arm64.deb
#7 169.2  /var/cache/apt/archives/python3.8-minimal_3.8.5-1~20.04_arm64.deb
#7 169.4 E: Sub-process /usr/bin/dpkg returned an error code (1)
------
Dockerfile:13
--------------------
  12 |     # should be sufficiently new to run in a container.
  13 | >>> RUN DEBIAN_FRONTEND=noninteractive apt-get update -qq && \
  14 | >>>     DEBIAN_FRONTEND=noninteractive apt-get install -qqy \
  15 | >>>     apt-transport-https \
  16 | >>>     ca-certificates \
  17 | >>>     curl \
  18 | >>>     lxc \
  19 | >>>     iptables \
  20 | >>>     sudo \
  21 | >>>     docker.io \
  22 | >>>     containerd && \
  23 | >>>     apt-get clean
  24 |     
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c DEBIAN_FRONTEND=noninteractive apt-get update -qq &&     DEBIAN_FRONTEND=noninteractive apt-get install -qqy     apt-transport-https     ca-certificates     curl     lxc     iptables     sudo     docker.io     containerd &&     apt-get clean]: exit code: 100

@morlay
Collaborator

morlay commented Feb 3, 2021

@adamnovak privileged?

initContainers:
  - name: qemu
    image: "{{ .Values.imageBinfmt.hub }}/binfmt:{{ .Values.imageBinfmt.tag }}"
    args:
      - --install
      - amd64,arm64
    securityContext:
      privileged: true

@tonistiigi
Member

@adamnovak The error above should be fixed with the qemu update in moby/buildkit#1953. Can you test with --driver-opt image=moby/buildkit:master and report back? It's still probably safer to use the tonistiigi/binfmt image if you know you will need emulators (it allows you to control the version, etc.).
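
For example, adapting the earlier create command (a sketch; the nodeselector is as before):

docker buildx create --use --name=buildkit --platform=linux/amd64 --node=buildkit-amd64 --driver=kubernetes --driver-opt="namespace=buildkit,nodeselector=kubernetes.io/arch=amd64,image=moby/buildkit:master"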

@adamnovak
Author

@morlay That looks a bit like what I put in:

      - name: binfmt
        # We need this to set up the emulators in the host kernel for running
        # ARM binaries. This image tells the kernel to go looking for qemu when
        # it finds an ARM binary.
        image: tonistiigi/binfmt
        imagePullPolicy: IfNotPresent
        args: ["--install", "arm64,arm"]
        resources:
          requests:
            cpu: 500m
            memory: "1Gi"
            ephemeral-storage: "10Gi"
          limits:
            cpu: 500m
            memory: "1Gi"
            ephemeral-storage: "10Gi"
        securityContext:
          privileged: true

@tonistiigi I'm not letting buildx set up the deployment by itself anymore. If I deploy moby/buildkit:master instead of moby/buildkit:buildx-stable-1, do you think I won't need the init container to install the emulators anymore? I could try that, although I think the cluster nodes might all have the emulators installed now.

@mylesagray

mylesagray commented Feb 18, 2021

@tonistiigi I installed buildkit on my arm64 cluster using the moby/buildkit:master image as well as created a DaemonSet to run tonistiigi/binfmt on each node in the cluster:

# Create K8s ns
kubectl create ns buildkit-emu
kubectl create ns qemu-binfmt

# Run binfmt on all nodes using a DaemonSet
kubectl apply -f https://github.com/mylesagray/home-cluster-gitops/blob/master/manifests/qemu-binfmt/ds.yaml

# Initialise buildx on K8s cluster
docker buildx create --use --name=buildkit-emu --platform=linux/amd64,linux/arm64,linux/arm --driver=kubernetes --driver-opt="namespace=buildkit-emu,replicas=3,image=moby/buildkit:master"

# Create buildx pods on K8s cluster
docker buildx inspect --bootstrap

# Inspect nodes
docker buildx inspect buildkit-emu
Name:   buildkit-emu
Driver: kubernetes

Nodes:
Name:      buildkit-emu0-78b4c5cbc8-2k6z4
Endpoint:  
Status:    running
Platforms: linux/amd64*, linux/arm64*, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

Name:      buildkit-emu0-78b4c5cbc8-mmbf8
Endpoint:  
Status:    running
Platforms: linux/amd64*, linux/arm64*, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

Name:      buildkit-emu0-78b4c5cbc8-nbj56
Endpoint:  
Status:    running
Platforms: linux/amd64*, linux/arm64*, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
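
(For reference, the binfmt DaemonSet above is roughly the following sketch; see the linked manifest for the real version. The namespace, labels, and pause image are assumptions:)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qemu-binfmt
  namespace: qemu-binfmt
spec:
  selector:
    matchLabels:
      app: qemu-binfmt
  template:
    metadata:
      labels:
        app: qemu-binfmt
    spec:
      initContainers:
        - name: binfmt
          image: tonistiigi/binfmt
          args: ["--install", "amd64,arm64,arm"]
          securityContext:
            privileged: true    # required to register binfmt_misc handlers in the host kernel
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9    # keeps the pod alive after the one-shot install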

But I am still having multi-arch builds fail (here, for amd64 on the arm64 cluster). Strangely, it gets through all the RUN and apt commands, but seems to fail either while doing go mod download or while compiling the application.

It always seems to fail with an illegal instruction or a panic, which suggests to me that something is up with qemu?

❯ docker buildx use buildkit-emu
❯ docker buildx build --platform linux/amd64,linux/arm64,linux/arm -t $IMAGEREPO/argocd-notifications:v$(cat VERSION) --push .
[+] Building 2345.7s (38/43)
 => => sha256:baf6642121709e17d1419901978da7d29b673d5f936e42ec3241b7d7157e9541 9.34MB / 9.34MB                                                                                                            12.4s

........ SNIP ........

 => [linux/amd64 builder 7/9] COPY . .                                                                                                                                                                    45.0s
 => ERROR [linux/amd64 builder 8/9] RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd                                                                            817.3s
------
 > [linux/amd64 builder 8/9] RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd:
#15 274.6 # github.com/modern-go/reflect2
#15 274.6 SIGILL: illegal instruction
#15 274.6 PC=0x0 m=4 sigcode=0
#15 274.6 instruction bytes:qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#15 318.3 # golang.org/x/net/http2
#15 318.3 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
------
Dockerfile:14
--------------------
  12 |     # Perform the build
  13 |     COPY . .
  14 | >>> RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd
  15 |     RUN ln -s /app/argocd-notifications /app/argocd-notifications-backend
  16 |
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd]: exit code: 2

Another failure (same build system, same nodes, same manifests - different failure location):

------
 > [linux/amd64 builder 6/9] RUN go mod download:
#36 725.8 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#36 744.8 Segmentation fault (core dumped)
------
Dockerfile:10
--------------------
   8 |     COPY go.sum /src/go.sum
   9 |
  10 | >>> RUN go mod download
  11 |
  12 |     # Perform the build
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c go mod download]: exit code: 139

@morlay
Collaborator

morlay commented Feb 18, 2021

@mylesagray

It is a qemu issue.

qemu x86_64 does not work well for Go compilation on an aarch64 host.

If you use pure Go, you can set GOARCH=$TARGETARCH and cross-compile instead.

Example: https://github.com/jaegertracing/jaeger-operator/blob/master/build/Dockerfile#L22

Notice line 1 of that Dockerfile too.

@mylesagray

@morlay Great call, thank you!

I adjusted the Dockerfile to include --platform=$BUILDPLATFORM on the builder FROM line and added the following ARGs:

ARG TARGETOS
ARG TARGETARCH
ARG TARGETPLATFORM
ARG BUILDPLATFORM

And adjusted my build to:

RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd

And that seems to have fixed it. It took me a while, however, to realise that when using --platform=$BUILDPLATFORM on the builder image, the builder stage only runs once (on the native build platform) rather than once per target architecture, making it both more reliable and faster.

My working Dockerfile, in full, for those who stumble across this in future:

FROM --platform=$BUILDPLATFORM golang:1.15.3 as builder

RUN apt-get update && apt-get install -y ca-certificates

WORKDIR /src

ARG TARGETOS
ARG TARGETARCH
ARG TARGETPLATFORM
ARG BUILDPLATFORM

COPY go.mod /src/go.mod
COPY go.sum /src/go.sum

RUN go mod download

# Perform the build
COPY . .
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -ldflags="-w -s" -o /app/argocd-notifications ./cmd
RUN ln -s /app/argocd-notifications /app/argocd-notifications-backend

FROM scratch

COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /app/argocd-notifications /app/argocd-notifications
COPY --from=builder /app/argocd-notifications-backend /app/argocd-notifications-backend

# Use a numeric user so that kubernetes can assert that the user id isn't root (0).
# We are also using the root group (the 0 in 1000:0); it doesn't have any
# privileges, as opposed to the root user.
USER 1000:0
