Caching problem with multistage build #1706

Closed
mhubig opened this issue Jul 28, 2021 · 4 comments · Fixed by #1735

Comments

mhubig commented Jul 28, 2021

Actual behavior
I have a problem with a Docker image containing a Python CLI script. While tracking down the problem, I found that kaniko incorrectly uses a cached version of the RUN command that installs the CLI tool, which is built in a prior stage of the multi-stage Dockerfile.

```dockerfile
FROM python:3.9-slim as base

ENV PYTHONFAULTHANDLER=1 \
  PYTHONHASHSEED=random \
  PYTHONUNBUFFERED=1

RUN set -ex; \
  apt-get update -y; \
  apt-get install -y --no-install-recommends gnupg curl parallel; \
  \
  echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list ;\
  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - ;\
  apt-get update -y; \
  apt-get install -y google-cloud-sdk; \
  \
  apt-get clean; \
  rm -rf /var/lib/apt/lists;

WORKDIR /app

FROM base as builder

ENV PIP_DEFAULT_TIMEOUT=100 \
  PIP_DISABLE_PIP_VERSION_CHECK=1 \
  PIP_NO_CACHE_DIR=1 \
  POETRY_VERSION=1.1.7

COPY pyproject.toml poetry.lock ./
RUN set -ex; \
  pip install "poetry==$POETRY_VERSION"; \
  poetry config virtualenvs.create false; \
  poetry install --no-interaction --no-ansi

COPY . .
RUN poetry build

FROM base as final

COPY --from=builder /app/dist/*.whl ./
RUN pip install *.whl
```

The last line of this Dockerfile is cached by kaniko, so the resulting image always contains the old version of the CLI tool. Here is the GitLab CI job I use to build the image with kaniko (version gcr.io/kaniko-project/executor:v1.6.0-debug):

```yaml
Build Image:
  stage: build
  image: registry.gitlab.dm-drogeriemarkt.com/mythos/docker/kaniko:latest
  script:
    - /kaniko/executor
      --cache=true
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHA}"
```

Here is a snippet of the kaniko log output, which also shows that a cached version is used:

```
INFO[0051] Pushed image to 1 destinations
INFO[0051] Saving file app/dist/cleanup-0.1.0-py3-none-any.whl for later use
INFO[0051] Deleting filesystem...
INFO[0053] Base image from previous stage 0 found, using saved tar at path /kaniko/stages/0
INFO[0053] Executing 0 build triggers
INFO[0054] Checking for cached layer registry.gitlab.site.com/mythos/hybris/hybris-backup/cache:49cdb016213ce38e45199ab5f02246092fabe2659381c707c26ac923de44253f...
INFO[0054] Using caching version of cmd: RUN pip install *.whl
INFO[0054] Unpacking rootfs as cmd COPY --from=builder /app/dist/*.whl ./ requires it.
INFO[0070] COPY --from=builder /app/dist/*.whl ./
INFO[0070] Resolving srcs [/app/dist/*.whl]...
INFO[0070] Taking snapshot of files...
INFO[0070] RUN pip install *.whl
INFO[0070] Found cached layer, extracting to filesystem
INFO[0071] Pushing image to registry.gitlab.site.com/mythos/hybris/hybris-backup:5fe242f8b8a38d2ebcb82b4052005b9a972401d2
INFO[0072] Pushed image to 1 destinations
```

Triage Notes for the Maintainers

| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in Docker but not in kaniko | |
| Please check if this error is seen when you use the `--cache` flag | |
| Please check if your Dockerfile is a multistage Dockerfile | |
cbartz commented Jul 29, 2021

Possibly related to #589 (comment); try kaniko v1.3.0, as it appears that a regression has happened.

@montyben

I was having a similar issue building Docker images with GitLab CI. Downgrading kaniko to v1.3.0 solved the problem.

@hoang-innomize

I also faced this issue, and downgrading to v1.3.0 worked.

gilbsgilbs added a commit to gilbsgilbs/kaniko that referenced this issue Sep 9, 2021
…Tools#1706)

PR GoogleContainerTools#1518 reintroduced COPY layers caching using the `--cache-copy-layers`
flag. Unfortunately, this PR also introduced a bug by not including the
stage digest into the caching key of the COPY command when the
`--cache-copy-layers` flag was not set. As a result, kaniko would use
any previous (possibly stale) layer from the cache because the digest
of the "COPY --from" command would never change.
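To make the effect concrete, here is a minimal Go sketch under a simplified model, assuming each layer's cache key is a SHA-256 over the previous key plus the instruction text and, for `COPY --from`, optionally the source stage's digest. `compositeKey`, `finalStageKeys`, and the `sha256:...` strings are hypothetical placeholders, not kaniko's API; kaniko's real key includes more inputs. Because the `RUN` key is chained onto the `COPY` key, leaving the digest out means a rebuilt `builder` stage never changes the key of `RUN pip install *.whl`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// compositeKey is a hypothetical, simplified layer cache key: it hashes the
// previous key together with extra inputs for the current instruction.
func compositeKey(prev string, parts ...string) string {
	h := sha256.New()
	h.Write([]byte(prev))
	for _, p := range parts {
		// Separator so that ("ab", "c") and ("a", "bc") hash differently.
		h.Write([]byte{0})
		h.Write([]byte(p))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// finalStageKeys returns the keys for the final stage's COPY --from and RUN
// steps. includeDigest toggles whether the builder stage digest is part of
// the COPY key (fixed behavior) or omitted (buggy behavior).
func finalStageKeys(baseKey, builderDigest string, includeDigest bool) (copyKey, runKey string) {
	copyCmd := "COPY --from=builder /app/dist/*.whl ./"
	if includeDigest {
		copyKey = compositeKey(baseKey, copyCmd, builderDigest)
	} else {
		copyKey = compositeKey(baseKey, copyCmd)
	}
	// The RUN key is chained onto the COPY key.
	runKey = compositeKey(copyKey, "RUN pip install *.whl")
	return copyKey, runKey
}

func main() {
	base := "sha256:base-stage" // made-up placeholder digest
	// Two builds whose builder stage produced different wheels.
	_, runOld := finalStageKeys(base, "sha256:builder-v1", false)
	_, runNew := finalStageKeys(base, "sha256:builder-v2", false)
	fmt.Println("without digest, RUN key changed:", runOld != runNew)

	_, runOld = finalStageKeys(base, "sha256:builder-v1", true)
	_, runNew = finalStageKeys(base, "sha256:builder-v2", true)
	fmt.Println("with digest, RUN key changed:", runOld != runNew)
}
```

Run as written, this prints `false` for the first line and `true` for the second: without the digest, both builds look up the same `RUN` key and reuse the stale layer.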

The PR author probably expected Go to fall through in the switch just like C
does. However, this is not the case. Go does not fall through in
switch statements by default and requires the fallthrough keyword to be
used. Note that this keyword is not available in type switches, though,
because the variable bound by the switch has a different type in each case.
@gilbsgilbs (Contributor)

FYI, I opened #1735, which should fix this issue, if anybody wants to take a look.

tejal29 pushed a commit to gilbsgilbs/kaniko that referenced this issue Oct 19, 2021
tejal29 added a commit that referenced this issue Oct 19, 2021
* chore: add workflows for pr tests

* fix unit tests

* fix formatting

* chore: fix gobuild

* change minikube script

* chore: fix lint install script

* chore: ignore and fix tests

* fix lint and run gofmt

* lint fixes

* k8s executor image only

* fix Makefile

* fix travis env variables

* more info on k8s tests

* fix travis run

* fix

* fix

* fix

* fix log

* some more changes

* increase timeout

* delete travis.yml and fix multiple copy tests

* fix registry mirror

* fix lint

* add concurrency

* last attempt to fix k8s integrations

* diff id for diff workflows

* Fix composite cache key for multi-stage copy command (#1706)

* refactor: add an abstract copy command interface to avoid code duplication

* fix typo in error message

Co-authored-by: Tejal Desai

5 participants