Building with cache-from and BUILDKIT_INLINE_CACHE args breaks reproducible docker builds #1876

Open
rishisv opened this issue Dec 4, 2020 · 3 comments

rishisv commented Dec 4, 2020

When we build with --cache-from and the BUILDKIT_INLINE_CACHE=1 build arg, our Docker builds are no longer reproducible: the image ID changes on every build even when there are no content changes. If we remove the BUILDKIT_INLINE_CACHE=1 build arg, the builds are reproducible again and the image ID stays constant. A reproduction script is included below. Slack thread here

✗ docker version
Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b
 Built:             Wed Mar 11 01:21:11 2020
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b
  Built:            Wed Mar 11 01:29:16 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

The reproduction bash script below takes a remote registry repository as its argument.

#!/bin/bash
set -exv

[ -z "$1" ] && echo "No remote registry and repository argument specified, exiting" && exit 1

# Strip a single trailing slash from the argument, if present
REGISTRY_REPO=${1%/}

echo "Using remote registry and repo: $REGISTRY_REPO"

# Quote the heredoc delimiter so $PATH is written literally into the Dockerfile
cat > /tmp/Dockerfile.test <<'EOF'

FROM python:3.8.2-buster

WORKDIR /opt/bin
ENV PATH "/opt/bin:$PATH"

WORKDIR /
RUN echo "deb http://deb.debian.org/debian buster-backports main" >> /etc/apt/sources.list

RUN apt-get update && \
    apt-get -t buster-backports install -y --no-install-recommends etcd-client bison flex graphviz graphviz-dev protobuf-compiler libprotobuf-dev libprotoc-dev golang-go && \
    rm -rf /var/lib/apt/lists/*

RUN go get github.com/golang/protobuf/protoc-gen-go
RUN go get google.golang.org/grpc/cmd/protoc-gen-go-grpc

RUN pip install -U pip
RUN pip install 'poetry==1.1.2'

WORKDIR /src
EOF

cd /tmp/
DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1  -t ${REGISTRY_REPO}/bug-repro-test:latest -f Dockerfile.test .

docker push ${REGISTRY_REPO}/bug-repro-test:latest

# Now delete the image we built for our test
docker rmi ${REGISTRY_REPO}/bug-repro-test:latest

# Let's build using cache-from and verify whether the builds are reproducible

DOCKER_BUILDKIT=1 docker build --cache-from=${REGISTRY_REPO}/bug-repro-test:latest  -t ${REGISTRY_REPO}/bug-repro-test:check -f Dockerfile.test .
imageid1="$(docker images --format "{{.ID}}" "${REGISTRY_REPO}/bug-repro-test:check")"

DOCKER_BUILDKIT=1 docker build --cache-from=${REGISTRY_REPO}/bug-repro-test:latest  -t ${REGISTRY_REPO}/bug-repro-test:check -f Dockerfile.test .
imageid2="$(docker images --format "{{.ID}}" "${REGISTRY_REPO}/bug-repro-test:check")"

if [[ "$imageid1" != "$imageid2" ]];
then
	echo "Image IDs don't match, builds using cache-from are not reproducible"
fi

# Let's build using cache-from with BUILDKIT_INLINE_CACHE=1 and verify whether the builds are reproducible

DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=${REGISTRY_REPO}/bug-repro-test:latest  -t ${REGISTRY_REPO}/bug-repro-test:check -f Dockerfile.test .
imageid1="$(docker images --format "{{.ID}}" "${REGISTRY_REPO}/bug-repro-test:check")"

DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from=${REGISTRY_REPO}/bug-repro-test:latest  -t ${REGISTRY_REPO}/bug-repro-test:check -f Dockerfile.test .
imageid2="$(docker images --format "{{.ID}}" "${REGISTRY_REPO}/bug-repro-test:check")"

if [[ "$imageid1" != "$imageid2" ]];
then
	echo "Image IDs don't match, builds using cache-from and BUILDKIT_INLINE_CACHE flag are not reproducible"
fi
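
The build-twice-and-compare steps in the script above could be factored into a pair of helpers, sketched below. The function names `outputs_match` and `build_and_print_id` are hypothetical (they are not part of the original script), and the sketch assumes `docker inspect --format '{{.Id}}'` is used to read the full image ID rather than the shortened ID printed by `docker images`.

```shell
#!/bin/bash

# Hypothetical helper: run the given command twice and compare what it prints.
# Returns 0 if both runs print the same output, 1 if the outputs differ,
# and 2 if either run fails.
outputs_match() {
    local first second
    first="$("$@")" || return 2
    second="$("$@")" || return 2
    [[ "$first" == "$second" ]]
}

# Hypothetical wrapper: build Dockerfile.test in the current directory with
# BuildKit, tag it, and print the full image ID on stdout.
# Any extra arguments are passed through to `docker build`.
build_and_print_id() {
    local tag="$1"
    shift
    DOCKER_BUILDKIT=1 docker build "$@" -t "$tag" -f Dockerfile.test . >&2 &&
        docker inspect --format '{{.Id}}' "$tag"
}

# Example (needs Docker and the Dockerfile.test created by the script above).
# The message is printed if the IDs differ or if a build fails:
#   outputs_match build_and_print_id "${REGISTRY_REPO}/bug-repro-test:check" \
#       --build-arg BUILDKIT_INLINE_CACHE=1 \
#       --cache-from="${REGISTRY_REPO}/bug-repro-test:latest" ||
#       echo "builds are not reproducible"
```

Keeping the comparison separate from the build command means the same check can be reused for the run with and the run without BUILDKIT_INLINE_CACHE=1.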

adamf commented Dec 4, 2020

We're seeing this as well and would love a fix!

@akshatflx

Yes, with the BUILDKIT_INLINE_CACHE=1 flag included, each build produces a different Docker image hash (even when there are no content changes).

Works as expected without this flag enabled.

@pkwarren

I did some more investigation on this and found the following. The only config change between two builds with BUILDKIT_INLINE_CACHE=1 is to the moby.buildkit.cache.v0 key.

When I base64-decode that key and format it with jq -S for both images, the only differences are small ones like this:

First image cache:

[
   {
     "digest": "sha256:807b3e8722c68572736f54d83fd741a5b1d69ccbef218fae52b4149a56d232ce",
     "inputs": [
       [
         {
           "link": 3
         }
       ]
     ]
   },
   {
     "digest": "sha256:807b3e8722c68572736f54d83fd741a5b1d69ccbef218fae52b4149a56d232ce",
     "inputs": [
       [
         {
           "link": 4
         }
       ]
     ]
   }
]

Second image cache:

[
   {
     "digest": "sha256:807b3e8722c68572736f54d83fd741a5b1d69ccbef218fae52b4149a56d232ce",
     "inputs": [
       [
         {
           "link": 4
         }
       ]
     ]
   },
   {
     "digest": "sha256:807b3e8722c68572736f54d83fd741a5b1d69ccbef218fae52b4149a56d232ce",
     "inputs": [
       [
         {
           "link": 3
         }
       ]
     ]
   }
]
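
The decoding step described above can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name `decode_inline_cache` is hypothetical, the image config JSON is assumed to have been extracted to a file already (for example from a `docker save` archive), and `jq` plus a `base64` that accepts `--decode` are assumed to be installed.

```shell
#!/bin/bash

# Hypothetical helper: print the decoded inline cache metadata stored under
# the "moby.buildkit.cache.v0" key of an image config JSON file.
# jq -S sorts object keys so that two dumps can be compared with diff.
# Note that -S does not reorder array elements, so reordered entries
# still show up as differences.
decode_inline_cache() {
    local config_json="$1"
    jq -r '."moby.buildkit.cache.v0"' "$config_json" | base64 --decode | jq -S .
}
```

Running it on the configs of two builds and comparing the results with `diff` should surface reordered entries like the ones shown above.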

I notice there is a function called sortConfig that is supposed to sort these caches deterministically (probably to avoid this type of problem):

func sortConfig(cc *CacheConfig) {

However, I don't know enough about the file format to know what the right fix is. Some entries in the files look like this:

  {
    "digest": "sha256:c963489980ecadec4c2b06eb21b9d6d981669cb00c825aaf97831a79c5a4a5b5",
    "inputs": [
      [
        {
          "link": 1
        },
        {
          "link": 2
        }
      ]
    ]
  }

Is it valid to normalize those entries into a single object?

Attaching both buildkit cache files in their entirety in case this helps.

build-cache-v0-1.txt
build-cache-v0-2.txt
