The image is 425 MB larger than it should be #11396

Closed
mattlorimor opened this issue Nov 26, 2019 · 6 comments

Is your feature request related to a problem? Please describe.
The pulled image, mcr.microsoft.com/azure-cli, at 1 GB is 425 MB larger than it should be due to this line not behaving as whoever wrote it expected.

# Remove CLI source code from the final image and normalize line endings.
RUN rm -rf ./azure-cli && \
    dos2unix /root/.bashrc /usr/local/bin/az

While running rm -rf ./azure-cli does, in fact, hide the directory from the subsequent layers, it does not remove it and its bloat from the built image. Proof of this can be seen by using a tool such as dive to inspect the image.

image
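
To make the layering behavior concrete, here is a minimal sketch that simulates two layers as plain tar archives. It is not Docker itself: the 1 MiB placeholder file, the directory names under the temp dir, and the variable names are all assumptions made for illustration. The one real detail it borrows is that image layers record a deletion as a whiteout marker (a file named `.wh.<name>`) rather than by rewriting the earlier layer.

```shell
#!/bin/sh
# Simulate two image layers as tar archives (a sketch, not Docker itself).
set -eu
work=$(mktemp -d)

# Layer 1: stand-in for `COPY . /azure-cli`, using 1 MiB of placeholder data.
mkdir -p "$work/layer1/azure-cli"
dd if=/dev/zero of="$work/layer1/azure-cli/source.bin" bs=1024 count=1024 2>/dev/null
tar -C "$work/layer1" -cf "$work/layer1.tar" .

# Layer 2: stand-in for `RUN rm -rf ./azure-cli`. The layer only records a
# whiteout marker that hides the directory; layer 1 is left untouched.
mkdir -p "$work/layer2"
: > "$work/layer2/.wh.azure-cli"
tar -C "$work/layer2" -cf "$work/layer2.tar" .

layer1_bytes=$(wc -c < "$work/layer1.tar" | tr -d ' ')
layer2_bytes=$(wc -c < "$work/layer2.tar" | tr -d ' ')
# The shipped image is the sum of its layers, so the delete adds bytes
# instead of reclaiming them.
total_bytes=$((layer1_bytes + layer2_bytes))

echo "layer 1 (copy):   $layer1_bytes bytes"
echo "layer 2 (delete): $layer2_bytes bytes"
echo "image total:      $total_bytes bytes"

rm -rf "$work"
```

The total is never smaller than layer 1, which mirrors what dive shows for the real image: the source is hidden from the final filesystem but still shipped.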

Describe the solution you'd like
Find a way to actually remove the no-longer-needed source code from the finished container.

The trickiness of this is compounded by the fact that this is Python and pip is involved. It's not like there is a single binary being generated that could simply be built in a separate stage and copied to the final stage.

I found a solution for this locally, but I'm not sure how feasible it would be on your end considering it requires the Docker daemon to be running in experimental mode. The new Dockerfile would look something like this:

# syntax = docker/dockerfile:experimental
#---------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
#---------------------------------------------------------------------------------------------

ARG PYTHON_VERSION="3.6.9"

FROM alpine:3.10 as azure-cli-source

WORKDIR azure-cli
COPY . /azure-cli

FROM python:${PYTHON_VERSION}-alpine3.10

ARG CLI_VERSION

# Metadata as defined at http://label-schema.org
ARG BUILD_DATE

LABEL maintainer="Microsoft" \
      org.label-schema.schema-version="1.0" \
      org.label-schema.vendor="Microsoft" \
      org.label-schema.name="Azure CLI" \
      org.label-schema.version=$CLI_VERSION \
      org.label-schema.license="MIT" \
      org.label-schema.description="The Azure CLI is used for all Resource Manager deployments in Azure." \
      org.label-schema.url="https://docs.microsoft.com/cli/azure/overview" \
      org.label-schema.usage="https://docs.microsoft.com/cli/azure/install-az-cli2#docker" \
      org.label-schema.build-date=$BUILD_DATE \
      org.label-schema.vcs-url="https://github.com/Azure/azure-cli.git" \
      org.label-schema.docker.cmd="docker run -v \${HOME}/.azure:/root/.azure -it microsoft/azure-cli:$CLI_VERSION"

# bash gcc make openssl-dev libffi-dev musl-dev - dependencies required for CLI
# openssh - included for ssh-keygen
# ca-certificates

# curl - required for installing jp
# jq - we include jq as a useful tool
# pip wheel - required for CLI packaging
# jmespath-terminal - we include jpterm as a useful tool
# libintl and icu-libs - required by azure devops artifact (az extension add --name azure-devops)
RUN apk add --no-cache bash openssh ca-certificates jq curl openssl git zip \
 && apk add --no-cache --virtual .build-deps gcc make openssl-dev libffi-dev musl-dev linux-headers \
 && apk add --no-cache libintl icu-libs \
 && update-ca-certificates

ARG JP_VERSION="0.1.3"

RUN curl -L https://github.com/jmespath/jp/releases/download/${JP_VERSION}/jp-linux-amd64 -o /usr/local/bin/jp \
 && chmod +x /usr/local/bin/jp \
 && pip install --no-cache-dir --upgrade jmespath-terminal

WORKDIR azure-cli
# COPY . /azure-cli

# 1. Build packages and store in tmp dir
# 2. Install the cli and the other command modules that weren't included
# 3. Temporary fix - install azure-nspkg to remove import of pkg_resources in azure/__init__.py (to improve performance)
# RUN --mount=type=cache,from=azure-cli-source,target=/azure-cli /azure-cli/scripts/install_full.sh \
RUN --mount=type=cache,from=azure-cli-source,target=/azure-cli ./azure-cli/scripts/install_full.sh \
 && cat azure-cli/az.completion > ~/.bashrc \
 && runDeps="$( \
    scanelf --needed --nobanner --recursive /usr/local \
        | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
        | sort -u \
        | xargs -r apk info --installed \
        | sort -u \
    )" \
 && apk add --virtual .rundeps $runDeps \
 && rm -rf azure-cli/

WORKDIR /

# Remove CLI source code from the final image and normalize line endings.
RUN rm -rf ./azure-cli && \
    dos2unix /root/.bashrc /usr/local/bin/az

CMD bash

The big changes are:

# syntax = docker/dockerfile:experimental

at the top of the Dockerfile and then using the --mount flag in the run command:

RUN --mount=type=cache,from=azure-cli-source,target=/azure-cli ./azure-cli/scripts/install_full.sh \
 && cat azure-cli/az.completion > ~/.bashrc \
 && runDeps="$( \
    scanelf --needed --nobanner --recursive /usr/local \
        | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
        | sort -u \
        | xargs -r apk info --installed \
        | sort -u \
    )" \
 && apk add --virtual .rundeps $runDeps \
 && rm -rf azure-cli/

Doing this allows RUN to use the source code without a COPY and prevents it from being forever committed to an intermediate layer.

Building this version resulted in a functional image 586 MB in size.
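
For anyone who wants to reproduce the comparison, the following is a small sketch of a helper that builds the Dockerfile above and lists the size of each resulting layer. The function name and the default tag `azure-cli:mount` are hypothetical. It assumes BuildKit is switched on through the `DOCKER_BUILDKIT=1` environment variable so that `RUN --mount` is understood; depending on the Docker version, the daemon settings mentioned earlier may be needed as well.

```shell
#!/bin/sh
# Hypothetical helper: build with BuildKit enabled, then list layer sizes.
# The default tag is an illustrative name, not one used by the project.
build_with_buildkit() {
    tag="${1:-azure-cli:mount}"

    # Assumption: BuildKit is enabled per-invocation via this variable.
    DOCKER_BUILDKIT=1 docker build -t "$tag" . || return 1

    # One row per layer: its size and the instruction that created it.
    docker history --format 'table {{.Size}}\t{{.CreatedBy}}' "$tag"
}
```

Running `build_with_buildkit` from the repository root and scanning the size column should show whether any layer still carries the source tree.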

Describe alternatives you've considered
If the source could be pulled down with a RUN instead of a COPY, it could be fetched, installed, and rm -rf'd in the same layer as the current install_full.sh command.
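
The reason this alternative would work is that a layer is captured only once its RUN step has finished, so anything created and deleted within the same step never reaches the image. Below is a rough simulation of that idea using a temp directory as the filesystem and a tar archive as the layer. The placeholder source file and the stub `az` script are assumptions standing in for the real download and for install_full.sh.

```shell
#!/bin/sh
# Simulate ONE build step that fetches, installs, and cleans up before the
# layer is captured (a sketch, not Docker itself).
set -eu
work=$(mktemp -d)
rootfs="$work/rootfs"
mkdir -p "$rootfs/usr/local/bin"

# --- start of the simulated RUN step ---
# "Fetch" the source: 1 MiB of placeholder data.
mkdir -p "$rootfs/azure-cli"
dd if=/dev/zero of="$rootfs/azure-cli/source.bin" bs=1024 count=1024 2>/dev/null
# "Install": a stub script standing in for what install_full.sh would produce.
printf '#!/bin/sh\necho "az placeholder"\n' > "$rootfs/usr/local/bin/az"
chmod +x "$rootfs/usr/local/bin/az"
# Clean up inside the same step.
rm -rf "$rootfs/azure-cli"
# --- end of the simulated RUN step ---

# Capture the layer only after the step has finished.
tar -C "$rootfs" -cf "$work/layer.tar" .
single_layer_bytes=$(wc -c < "$work/layer.tar" | tr -d ' ')

if tar -tf "$work/layer.tar" | grep -q 'azure-cli'; then
    source_in_layer=yes
else
    source_in_layer=no
fi

echo "layer size:      $single_layer_bytes bytes"
echo "source in layer: $source_in_layer"

rm -rf "$work"
```

The captured layer contains the installed stub but no trace of the source directory, which is the effect the same-layer approach is after.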

Additional context
It looks like the exact same thing is happening, at the very least, in Dockerfile.spot.


mattlorimor commented Nov 26, 2019

"The pulled image, mcr.microsoft.com/azure-cli, at 1 GB is 425 MB larger than it should be due to this line not behaving as whoever wrote it expected"

Holy crap. I just looked at the blame for that line. I tangentially know the person: @marstr. 😃

yonzhan added this to the S163 milestone Dec 1, 2019

yonzhan commented Dec 1, 2019

add in S163.

fengzhou-msft commented

Multi-stage builds are a potential solution, and we already use them in the RPM build. I will first figure out how many directories we need to copy from the first stage to the second one and see whether it's feasible.


mattlorimor commented Dec 3, 2019 via email

yonzhan modified the milestones: S163, S164 Jan 2, 2020
fengzhou-msft modified the milestones: S164, S166 Feb 4, 2020

fengzhou-msft commented Feb 17, 2020

PR #12208 is a quick fix that saves 400+ MB in the image. After ignoring test files, the source code only wastes about 12 MB. I think we can live with that for now without using experimental mode.

I also found another solution that requires experimental mode: passing --squash to docker build.

fengzhou-msft commented

@mattlorimor Thanks for bringing up this issue and providing a solution. We definitely learned something valuable from what you shared, and we welcome any further feedback.
