docker --cache-from with BUILDKIT_INLINE_CACHE does not work every second time #1981

Closed
Bi0max opened this issue Feb 16, 2021 · 25 comments

@Bi0max

Bi0max commented Feb 16, 2021

I am trying to take advantage of BuildKit's caching/pulling system for Docker in my CI/CD process, but it does not work as expected.
I created a dummy local example (the same thing also happens in my CI system - AWS CodePipeline - and with both Docker Hub and AWS ECR). You need a Dockerfile, run_test.py (with any contents) and requirements.txt (with any contents) in a folder.
The Dockerfile:

# base image
FROM python:3.7-slim

# set working directory
WORKDIR /usr/src/app

# add and install requirements
RUN pip install --upgrade pip
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirements.txt

RUN echo 123
# add app
COPY ./run_test.py /usr/src/app/run_test.py

# run server
CMD ["python", "run_test.py"]

run_test.py is actually not interesting, but here is the code just in case:

import requests
import time

while True:
    time.sleep(1)
    print(requests)
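
The requirements.txt contains a single pinned dependency (inferred from the pip output in the build logs below):

requests==2.22.0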

In advance, I export two environment variables:

export DOCKER_BUILDKIT=1  # to activate buildkit
export DUMMY_IMAGE_URL=bi0max/test_docker

Then, to test, I run the following commands. The first two remove the local cache to resemble the CI environment; then I build and push.
BE CAREFUL, THE CODE BELOW REMOVES THE LOCAL BUILD CACHE:

docker builder prune -a -f && \
(docker image rm $DUMMY_IMAGE_URL:latest || true) && \
docker build \
--cache-from $DUMMY_IMAGE_URL:latest \
--build-arg BUILDKIT_INLINE_CACHE=1 \
--tag $DUMMY_IMAGE_URL:latest "." && \
docker push $DUMMY_IMAGE_URL:latest

As expected, the first run just builds everything from scratch:

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 434B done
#2 DONE 0.0s

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.1s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#12 [1/7] FROM docker.io/library/python:3.7-slim
#12 DONE 0.0s

#7 [internal] load build context
#7 DONE 0.0s

#4 importing cache manifest from bi0max/test_docker:latest
#4 ERROR: docker.io/bi0max/test_docker:latest not found

#12 [1/7] FROM docker.io/library/python:3.7-slim
#12 resolve docker.io/library/python:3.7-slim done
#12 DONE 0.0s

#7 [internal] load build context
#7 transferring context: 204B done
#7 DONE 0.1s

#5 [2/7] WORKDIR /usr/src/app
#5 DONE 0.0s

#6 [3/7] RUN pip install --upgrade pip
#6 1.951 Requirement already up-to-date: pip in /usr/local/lib/python3.7/site-packages (20.1.1)
#6 DONE 2.3s

#8 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#8 DONE 0.0s

#9 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#9 0.750 Collecting requests==2.22.0
#9 0.848   Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
#9 0.932 Collecting idna<2.9,>=2.5
#9 0.948   Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
#9 0.995 Collecting chardet<3.1.0,>=3.0.2
#9 1.011   Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
#9 1.135 Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
#9 1.153   Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
#9 1.264 Collecting certifi>=2017.4.17
#9 1.282   Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
#9 1.378 Installing collected packages: idna, chardet, urllib3, certifi, requests
#9 1.916 Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.9
#9 DONE 2.2s

#10 [6/7] RUN echo 123
#10 0.265 123
#10 DONE 0.3s

#11 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#11 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f98327afae246096725f7e54742fe9b25079f1b779699b099e66c8def1e19052 done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

Then I slightly adjust the run_test.py file, and the result is again as expected. All layers up to the last step ([7/7] COPY) are downloaded from the repository and reused.

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 434B done
#1 DONE 0.1s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#8 [internal] load build context
#8 DONE 0.0s

#4 [1/7] FROM docker.io/library/python:3.7-slim
#4 DONE 0.0s

#5 importing cache manifest from bi0max/test_docker:latest
#5 DONE 1.2s

#8 [internal] load build context
#8 transferring context: 193B done
#8 DONE 0.0s

#6 [2/7] WORKDIR /usr/src/app
#6 CACHED

#7 [3/7] RUN pip install --upgrade pip
#7 CACHED

#9 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#9 CACHED

#10 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#10 CACHED

#11 [6/7] RUN echo 123
#11 pulling sha256:79fc69c08b391d082b4d2617faed489d220444fa0cf06953cdff55c667866bed
#11 pulling sha256:071624272167ab4e35a30eb1640cb3f15ced19c6cd10fa1c9d49763372e81c23
#11 pulling sha256:04ed4ecd76e1a110f468eb1a3173bbfa578c6b4c85a6dc82bf4a489ed8b8c54d
#11 pulling sha256:79fc69c08b391d082b4d2617faed489d220444fa0cf06953cdff55c667866bed 0.2s done
#11 pulling sha256:d6406c1ce2dc5e841233ebce164ee469388102cb98f1473adaeca15455d6d797
#11 pulling sha256:071624272167ab4e35a30eb1640cb3f15ced19c6cd10fa1c9d49763372e81c23 0.5s done
#11 pulling sha256:04ed4ecd76e1a110f468eb1a3173bbfa578c6b4c85a6dc82bf4a489ed8b8c54d 0.5s done
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
#11 pulling sha256:d6406c1ce2dc5e841233ebce164ee469388102cb98f1473adaeca15455d6d797 0.3s done
#11 pulling sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 0.2s done
#11 CACHED

#12 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#12 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f37692114f10b9a3646203569a0849af20774651f4aa0f5dc8d6f133fb7ff062 done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

Now, I change run_test.py again and I would expect Docker to do the same thing as last time. But I get the following result, where it builds everything from scratch:

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 434B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/python:3.7-slim
#3 DONE 0.0s

#5 [1/7] FROM docker.io/library/python:3.7-slim
#5 DONE 0.0s

#8 [internal] load build context
#8 DONE 0.0s

#4 importing cache manifest from bi0max/test_docker:latest
#4 DONE 1.7s

#8 [internal] load build context
#8 transferring context: 182B done
#8 DONE 0.0s

#5 [1/7] FROM docker.io/library/python:3.7-slim
#5 resolve docker.io/library/python:3.7-slim done
#5 DONE 0.1s

#6 [2/7] WORKDIR /usr/src/app
#6 DONE 0.0s

#7 [3/7] RUN pip install --upgrade pip
#7 1.774 Requirement already up-to-date: pip in /usr/local/lib/python3.7/site-packages (20.1.1)
#7 DONE 2.1s

#9 [4/7] COPY ./requirements.txt /usr/src/app/requirements.txt
#9 DONE 0.0s

#10 [5/7] RUN pip $PIP_PROXY install --no-cache-dir --compile -r requirement...
#10 0.805 Collecting requests==2.22.0
#10 0.905   Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB)
#10 1.079 Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
#10 1.109   Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
#10 1.242 Collecting certifi>=2017.4.17
#10 1.259   Downloading certifi-2020.4.5.1-py2.py3-none-any.whl (157 kB)
#10 1.336 Collecting idna<2.9,>=2.5
#10 1.353   Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
#10 1.410 Collecting chardet<3.1.0,>=3.0.2
#10 1.428   Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
#10 1.545 Installing collected packages: urllib3, certifi, idna, chardet, requests
#10 2.102 Successfully installed certifi-2020.4.5.1 chardet-3.0.4 idna-2.8 requests-2.22.0 urllib3-1.25.9
#10 DONE 2.4s

#11 [6/7] RUN echo 123
#11 0.259 123
#11 DONE 0.3s

#12 [7/7] COPY ./run_test.py /usr/src/app/run_test.py
#12 DONE 0.0s

#13 exporting to image
#13 exporting layers done
#13 writing image sha256:f4ffb0e84e334b4b35fe2504de11012e5dc1ca5978eace055932e9bbbe83c93e done
#13 naming to docker.io/bi0max/test_docker:latest done
#13 DONE 0.0s

#14 exporting cache
#14 preparing build cache for export done
#14 DONE 0.0s

But the strangest thing for me is that when I change run_test.py for the third time, it uses cached layers again. And it continues in the same pattern: fourth time - doesn't use the cache, fifth time - uses it, etc.

Am I missing something here?

If I pull the image each time before building, then it always uses the cache, but that also works the same way without BuildKit.
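
For reference, this is the pull-first variant I mean (a minimal sketch of that workaround, reusing the same $DUMMY_IMAGE_URL setup as above; the pull is allowed to fail on the very first build):

# Workaround: warm the local cache by pulling the previously pushed image first
(docker pull $DUMMY_IMAGE_URL:latest || true) && \
docker build \
--cache-from $DUMMY_IMAGE_URL:latest \
--build-arg BUILDKIT_INLINE_CACHE=1 \
--tag $DUMMY_IMAGE_URL:latest "." && \
docker push $DUMMY_IMAGE_URL:latest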

@immenz

immenz commented Feb 17, 2021

I am experiencing quite similar behaviour. In my case, I am using 3 multi-stage Dockerfiles. The strange thing is that the first stages are always cached, and in one of the Dockerfiles the second stage is also always cached. In the other 2, this cached/not-cached alternation occurred. The only difference I could spot is that in the Dockerfiles with alternating cache usage, both stages use the same base image (FROM image:tag).

@cgreening

cgreening commented Feb 18, 2021

Seeing the same problem.

Asked on stack overflow - https://stackoverflow.com/questions/66224121/docker-cache-from-not-all-layers-being-run-if-no-files-have-changed

We have a very simple Dockerfile.

To speed up our builds we're using the --cache-from directive and using a previous build as a cache.

We're seeing some weird behaviour where, if the files have not changed, the lines after the COPY line are not being run.

RUN yarn && yarn build

This line does not seem to get executed, so node_modules is missing when the application tries to start.

FROM node

RUN mkdir /app
COPY . /app

WORKDIR /app
RUN yarn && yarn build

ENTRYPOINT ["yarn", "start"]

We're deploying to Kubernetes from GitHub Actions, but I can recreate the problem locally.

Initial build:

DOCKER_BUILDKIT=1 docker build -t gcr.io/XXX/test:a .  --build-arg BUILDKIT_INLINE_CACHE=1
docker push gcr.io/XXX/test:a

Everything works - node_modules and the build folder are there.

Clean up Docker as if we were starting from scratch, like on the build system:

docker system prune -a

Do another build:

DOCKER_BUILDKIT=1 docker build -t gcr.io/XXX/test:b . --cache-from gcr.io/XXX/test:a --build-arg BUILDKIT_INLINE_CACHE=1
docker push gcr.io/XXX/test:b

Everything is still fine.

Clean up Docker as if we were starting from scratch, like on the build system:

docker system prune -a

Do a third build:

DOCKER_BUILDKIT=1 docker build -t gcr.io/XXX/test:c . --cache-from gcr.io/XXX/test:b --build-arg BUILDKIT_INLINE_CACHE=1

Files are missing!

docker run -it --entrypoint /bin/bash gcr.io/topo-wme-dev-d725ec6e/test:c
root@d07f6f1d3b12:/app# ls
DEVELOPING.md  Dockerfile  Makefile  README.md  admin-tools  app.dev.yaml  coverage  jest.config.js  package.json  src  tailwind.config.js  tools  tsconfig.json  tslint.json  yarn.lock

No node_modules or build folder.

After inspecting the image I can see that a layer is missing when compared to the working images.

@tonistiigi
Member

@cgreening Please provide runnable reproduction steps.

@cgreening

@tonistiigi Here you go:

https://github.com/cgreening/docker-cache-problem

Let me know if I can do anything to help.

@agoose77

agoose77 commented Feb 19, 2021

I have a very similar problem in my CI using DOCKER_BUILDKIT=1, where a COPY instruction misses the cache. The permissions and SHA of the file are the same.

Here are two layers that have the exact same file contents. It seems that only the timestamps are different. I wasn't aware that this should change the layer hash?

d19e1c61b0cbe787e9d58d9ea54e2660ab6ae0c6d1fd3b11a410f60154dbe525.tar.gz.txt

2a3db49c74cd0666b6e4d2729cabafa22ea4270a5a0b6a41b92f87ec6f0f1301.tar.gz.txt

@tonistiigi
Member

@cgreening Thanks. I can confirm I can repro this in 20.10.2. BuildKit master (buildx) and 19.03 do not seem to have the issue.

@agoose77

I see this issue on 19.03.11 in docker-in-docker, so I assume this means the issue is in the daemon?

@rhyek

rhyek commented Feb 24, 2021

I can reproduce this locally on 20.10.3, build 48d30b5, and in my CI environment (GitHub Actions) using 20.10.3+azure, build 48d30b5b32e99c932b4ea3edca74353feddd83ff.
What is the preferred workaround for this currently? I imagine it is basically to disable the use of BuildKit?

BuildKit master (buildx) and 19.03 do not seem to have the issue.

Just to confirm you mean buildx using the 19.03 daemon, correct?

@tonistiigi
Member

Just to confirm you mean buildx using the 19.03 daemon, correct?

I mean buildx using a BuildKit release directly, with the container/k8s driver.

@tonistiigi
Member

tonistiigi commented Feb 24, 2021

The issue reported by @cgreening #1981 (comment) should be fixed with #1993, once vendored into the moby codebase. I'm not sure if the other comments in here describe the same problem, as this was the only reproducer given.

@rhyek

rhyek commented Feb 24, 2021

@tonistiigi thanks. My issue is exactly the same as reported by @cgreening.

I'm not familiar with the BuildKit release cycle, especially how it syncs up with Docker Engine releases. What would you say is the best way to get the fix sooner?

@tonistiigi
Member

tonistiigi commented Feb 25, 2021

It's included in the PRs linked in ^, including the backport to 20.10.

Going to close this. If you find that the fix does not apply to your use case, open a new ticket with reproduction steps.

The best way to avoid this issue altogether is to use docker buildx with the container driver via docker buildx create. It then does not depend on the system Docker version at all, and you can choose any BuildKit version (for this case, all recent BuildKit versions should be fine).
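
For illustration, a minimal sketch of that setup (the image name myname/myapp and the builder name are placeholders; inline cache export is just one of the cache options buildx supports):

# create a builder backed by a standalone BuildKit container and make it the default
docker buildx create --name mybuilder --driver docker-container --use

# build using the previously pushed image as cache source, embed inline cache metadata, and push
docker buildx build \
  --cache-from=type=registry,ref=myname/myapp:latest \
  --cache-to=type=inline \
  --tag myname/myapp:latest \
  --push .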

@Bi0max
Author

Bi0max commented Aug 2, 2021

It's included in the PRs linked in ^, including the backport to 20.10.

Going to close this. If you find that the fix does not apply to your use case, open a new ticket with reproduction steps.

The best way to avoid this issue altogether is to use docker buildx with the container driver via docker buildx create. It then does not depend on the system Docker version at all, and you can choose any BuildKit version (for this case, all recent BuildKit versions should be fine).

Actually, it still doesn't work. I have the latest Docker Engine, v20.10.7, but my reproducible example still fails.

sergioprado pushed a commit to torizon/torizoncore-builder that referenced this issue Aug 25, 2021
The build of the container sometimes fails with the following error:

failed to compute cache key: failed to walk
/var/lib/docker/overlay2/df6ea.../merged/root/aktualizr/build: lstat
/var/lib/docker/overlay2/df6ea.../merged/root/aktualizr/build: no such file or directory

It seems to be a cache issue with the sota-builder build stage. For some
reason, using --cache-from with BUILDKIT_INLINE_CACHE does not work
every time.

Indeed, there is a report of this issue in [1] and it is fixed in [2],
but it seems our CI is not running a BuildKit version that has this
fix.

To work around this issue, let's disable cached builds for now.

[1] moby/buildkit#1981
[2] moby/buildkit#1993

Related-to: TOR-1671

Signed-off-by: Sergio Prado <[email protected]>
@robgonnella

To expand on @tonistiigi's suggestion of using the buildx plugin directly, this is what worked for me in GitLab CI:

build_container:
  image: docker:stable
  services:
    - docker:stable-dind
  variables:
    BUILDX_VERSION: v0.6.3
    DOCKER_BUILDKIT: 1
  before_script:
    - mkdir -p "${HOME}/.docker/cli-plugins/"
    - curl -sLo "${HOME}/.docker/cli-plugins/docker-buildx" https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64
    - chmod a+x "${HOME}/.docker/cli-plugins/docker-buildx"
    - docker buildx create --use
  script:
    - >
      docker buildx build
      --tag <image_base>:<image_tag>
      --cache-from=type=registry,ref=<image_base>:<image_tag>
      --cache-to=type=registry,ref=<image_base>:<image_tag>
      --progress plain
      --push .

@alex-treebeard

Hi @Bi0max, I am experiencing this issue using Docker 20.10 on GitLab. Is there a specific version I should be pinning to?

@Bi0max
Author

Bi0max commented Dec 20, 2021

Hi @Bi0max, I am experiencing this issue using Docker 20.10 on GitLab. Is there a specific version I should be pinning to?

Hi @alex-treebeard, last time I checked I had Docker Engine v20.10.7 (back then it still did not work). I haven't checked since then.

@petarmaric

petarmaric commented Jan 8, 2022

My team has been affected by this issue as well, and the original poster's "pull the image each time before building" workaround (as mentioned right at the end of the issue description) seems to be working for us - 10 out of 10 cache hits so far...

And I know the docs clearly state that an explicit docker pull beforehand shouldn't be required (relevant text bolded for emphasis):

The following example builds an image with inline-cache metadata and pushes it to a registry, then uses the image as a cache source on another machine:

 docker build -t myname/myapp --build-arg BUILDKIT_INLINE_CACHE=1 .
 docker push myname/myapp

After pushing the image, the image is used as cache source on another machine. BuildKit automatically pulls the image from the registry if needed.

-- https://docs.docker.com/engine/reference/commandline/build/

But without it, we experience the same "cache doesn't work every second time" bug as the original poster :/

For reference, we're using GitLab CI with their shared runners and docker:dind / docker:20.10.

@sherifabdlnaby

Same issue here, using the latest docker:dind image, 20.10.11.

@sherifabdlnaby

I believe this issue should be reopened.

@thomasfrederikhoeck

I can still reproduce this in 20.10.11+azure in my CI environment. I think it should be reopened :-) @tonistiigi

@shalev123d

We're experiencing the same issue as well; please reopen.

@n1ngu

n1ngu commented Jul 20, 2022

Docker version 20.10.15, build fd82621 on Bitbucket Pipelines. Totally reproduces.

@n1ngu

n1ngu commented Jul 21, 2022

Ok everybody, you probably want to move the discussion to #2274

Sorry I missed it.

@apoeteo

apoeteo commented Apr 20, 2023

I solved this by building the image with Buildah.

It works with the cache reliably and predictably.

My code for use in GitLab CI looks like:

before_script:
    - buildah login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
script:
    - buildah build
        --tag $CI_REGISTRY_IMAGE/app:latest
        --cache-from $CI_REGISTRY_IMAGE/app # No tag here
        --layers
        --cache-to $CI_REGISTRY_IMAGE/app # No tag here
        .
    - buildah push $CI_REGISTRY_IMAGE/app:latest

The cache is stored in the GitLab registry separate from the app image.

I spent many hours researching and testing this.

It's also recommended to have a .dockerignore file containing .git*.
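
For example (a minimal sketch; the exact entries depend on your project):

# .dockerignore - keep Git metadata out of the build context
.git*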

Using Kaniko is another good solution, but Buildah is closer to the default Docker commands (a 1:1 replacement) and can easily be added to a default DinD image.

@alexwilson1

I solved this by building the image with Buildah. It works with the cache reliably and predictably. [...]

Works well, thanks for suggesting. Adding --storage-driver overlay2 to 'buildah build' also halved the (cache miss) build time for us based on initial tests.
