Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools/docker/syzbot: follow best practices #5375

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tarasmadan
Copy link
Collaborator

@tarasmadan tarasmadan commented Oct 8, 2024

  1. Remove duplicates.
  2. Prepend apt-get install by apt-get update. (https://docs.docker.com/build/building/best-practices/#apt-get)
  3. Less RUN calls - less layers.

@tarasmadan tarasmadan marked this pull request as ready for review October 9, 2024 06:30
RUN sudo update-alternatives --install /usr/bin/llvm-objcopy llvm-objcopy /usr/bin/llvm-objcopy-15 100
RUN sudo update-alternatives --install /usr/bin/llvm-objdump llvm-objdump /usr/bin/llvm-objdump-15 100
RUN sudo update-alternatives --install /usr/bin/llvm-addr2line llvm-addr2line /usr/bin/llvm-addr2line-15 100
RUN apt-get update --allow-releaseinfo-change && \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my experience building containers, it's actually better to have each install on a separate RUN line b/c docker build caching.
If we have them all in one command, and if one package in the middle gets rotten, or we need to add a new one, all caching is invalidating.
I don't see any downsides of separate RUN lines (process start overhead is negligible in this case).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.docker.com/build/building/best-practices/#apt-get :
"Always combine RUN apt-get update with apt-get install in the same RUN statement."

The motivation - every time we modify apt-get install lines, we want to run apt-get update.
Caching the apt-get update result we later use old versions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am reading the best practice correctly it says: this allows you to install latest packages when you re-build containers.
But what they suggest both makes things worse and does not achieve what it claims to achieve.
If we are re-building the containers after a year, and change only the command line prompt at the end (or something else besides apt-get), we also want latest packages, but this trick won't work, caching will kick in.
The way it makes things significantly worse: if I have to build the container 10 times within few hours to figure out, which packages got broken and which I want to install, I do want caching, and, no, I don't want getting latest versions each time.
What will work much better, and will actually achieve installing latest packages is docker build --no-cache.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel some inconsistency here:

  1. "apt-get install -y -q --no-install-recommends llvm-15 clang-15 clang-format-15 clang-tidy-15 lld-15" installs 5 packages in one RUN. All seems to be connected (llvm+clang).
  2. Lines 37-46 are connected the same way (llvm+clang) but were splitted in 10 RUN calls.

I understand your concern. But in this case we're installing the strongly connected components. I don't think it makes sense to make a 10 levels deep cache here.

docker build --no-cache. Yes. And its quite sad we're not using github actions to autobuild it...
I hope it'll be the syz-ci refactoring scope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between installing several packages with one apt-get command and multiple commands?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From chatGPT:

# Copyright 2021 syzkaller project authors. All rights reserved.
# Use of this source code is governed by Apache 2 LICENSE that can be found in the LICENSE file.

# See /tools/docker/README.md for details.

FROM debian:bookworm

# Set non-interactive mode for apt-get
ARG DEBIAN_FRONTEND=noninteractive

# Update and install essential packages in one layer to reduce image size
RUN apt-get update && apt-get install -y -q --no-install-recommends \
    # Build essentials:
    gcc g++ binutils make ccache \
    # Common utilities:
    unzip curl sudo procps psmisc nano vim git bzip2 dh-autoreconf software-properties-common \
    # Kernel build dependencies:
    flex bison bc gawk dwarves cpio texinfo texi2html lzop lbzip2 \
    zlib1g-dev libelf-dev libncurses-dev libmpc-dev libssl-dev \
    # Cuttlefish and gVisor dependencies:
    rsync libarchive-tools \
    # Cross-compilation tools:
    crossbuild-essential-amd64 crossbuild-essential-arm64 libbpf-dev \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Conditional installation for cross-compilation
RUN if [ "$(uname -m)" = "x86_64" ]; then \
        apt-get update && apt-get install -y -q --no-install-recommends \
        libc6-dev-i386 libc6-dev-i386-amd64-cross lib32gcc-12-dev lib32stdc++-12-dev \
        g++-arm-linux-gnueabi g++-aarch64-linux-gnu g++-powerpc64le-linux-gnu \
        g++-mips64el-linux-gnuabi64 g++-s390x-linux-gnu g++-riscv64-linux-gnu; \
    fi && apt-get clean && rm -rf /var/lib/apt/lists/*

# Install Go
RUN curl -sSL "https://dl.google.com/go/go1.22.7.linux-$(uname -m | sed 's/aarch64/arm64/;s/x86_64/amd64/').tar.gz" | tar -C /usr/local -xz \
    && echo "export PATH=/usr/local/go/bin:\$PATH" >> /etc/profile

# Install LLVM and Clang
RUN curl -sSL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - \
    && add-apt-repository "deb http://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-15 main" \
    && apt-get update --allow-releaseinfo-change && apt-get install -y -q --no-install-recommends \
    llvm-15 clang-15 clang-format-15 clang-tidy-15 lld-15 \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Set up alternatives for clang
RUN update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 100 \
    && update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 100 \
    && update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-15 100 \
    && update-alternatives --install /usr/bin/clang-format clang-format /usr/bin/clang-format-15 100 \
    && update-alternatives --install /usr/bin/ld.lld ld.lld /usr/bin/lld-15 100 \
    && update-alternatives --install /usr/bin/llvm-nm llvm-nm /usr/bin/llvm-nm-15 100 \
    && update-alternatives --install /usr/bin/llvm-ar llvm-ar /usr/bin/llvm-ar-15 100 \
    && update-alternatives --install /usr/bin/llvm-objcopy llvm-objcopy /usr/bin/llvm-objcopy-15 100 \
    && update-alternatives --install /usr/bin/llvm-objdump llvm-objdump /usr/bin/llvm-objdump-15 100 \
    && update-alternatives --install /usr/bin/llvm-addr2line llvm-addr2line /usr/bin/llvm-addr2line-15 100

# Create a symlink for Python
RUN mkdir -p /usr/grte/v5/bin && ln -s /usr/bin/python3 /usr/grte/v5/bin/python2.7

# Install Bazel
RUN curl -o /usr/local/bin/bazel -sSL "https://releases.bazel.build/7.1.2/release/bazel-7.1.2-linux-$(uname -m | sed 's/aarch64/arm64/')" \
    && chmod ugo+x /usr/local/bin/bazel

# Install QEMU from backports
RUN add-apt-repository "deb http://deb.debian.org/debian bookworm-backports main" \
    && apt-get update && apt-get install -t bookworm-backports -y -q --no-install-recommends \
    qemu-user \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

# Conditional installation for QEMU system packages
RUN if [ "$(uname -m)" = "x86_64" ]; then \
        apt-get install -t bookworm-backports -y -q --no-install-recommends \
        qemu-utils qemu-system-misc qemu-system-x86 qemu-system-arm qemu-system-aarch64 \
        qemu-system-s390x qemu-system-mips qemu-system-ppc; \
    fi && apt-get clean && rm -rf /var/lib/apt/lists/*

# Create syzkaller user
RUN useradd --create-home syzkaller

# Set custom prompt for root
RUN echo "export PS1='\n\W🤖 '" >> /root/.bashrc

# Copy the command script
COPY run-syz-command.sh /run-syz-command.sh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the docker layers but perhaps, we could've used stable LLVM versions that kernel doc suggests? I've found that it's the most stable way rather than installing from the distro's packages.

From https://docs.kernel.org/kbuild/llvm.html:

We provide prebuilt stable versions of LLVM on kernel.org. These have been optimized with profile data for building Linux kernels, which should improve kernel build times relative to other distributions of LLVM.

So we could've done something like this:
RUN wget -O- https://mirrors.edge.kernel.org/pub/tools/llvm/files/llvm-15.0.7-x86_64.tar.gz | tar -xzf - -C /usr/bin/

and the most of llvm, clang dpkg packages will not be required in Dockerfile, thus less layers/caching.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to create issue and invite @ramosian-glider + @a-nogikh to discuss it!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this archive contains the most of llvm utils, also llvm-readelf, llvm-strip mentioned in this PR #5367:

user@work:~/opt/llvm-14.0.6-x86_64$ ls -l
total 12
drwxr-xr-x 2 user user 4096 Oct 10 21:34 bin
drwxr-xr-x 3 user user 4096 Apr 26 23:24 include
drwxr-xr-x 3 user user 4096 Apr 26 23:24 lib
user@work:~/opt/llvm-14.0.6-x86_64$ cd bin
user@work:~/opt/llvm-14.0.6-x86_64/bin$ ls -l
total 251664
lrwxrwxrwx 1 user user         8 Apr 26 23:24 clang -> clang-14
lrwxrwxrwx 1 user user         5 Apr 26 23:24 clang++ -> clang
-rwxr-xr-x 1 user user 100523848 Apr 26 23:20 clang-14
lrwxrwxrwx 1 user user         5 Apr 26 23:24 clang-cl -> clang
lrwxrwxrwx 1 user user         5 Apr 26 23:24 clang-cpp -> clang
lrwxrwxrwx 1 user user         3 Apr 26 23:24 ld64.lld -> lld
lrwxrwxrwx 1 user user         3 Apr 26 23:24 ld.lld -> lld
-rwxr-xr-x 1 user user  78376208 Apr 26 23:20 lld
lrwxrwxrwx 1 user user         3 Apr 26 23:24 lld-link -> lld
lrwxrwxrwx 1 user user        15 Apr 26 23:24 llvm-addr2line -> llvm-symbolizer
-rwxr-xr-x 1 user user  15696696 Apr 26 23:20 llvm-ar
-rwxr-xr-x 1 user user  13322072 Apr 26 23:20 llvm-dwarfdump
-rwxr-xr-x 1 user user  16128872 Apr 26 23:20 llvm-nm
-rwxr-xr-x 1 user user   4450536 Apr 26 23:20 llvm-objcopy
-rwxr-xr-x 1 user user  16973168 Apr 26 23:20 llvm-objdump
lrwxrwxrwx 1 user user         7 Apr 26 23:24 llvm-ranlib -> llvm-ar
lrwxrwxrwx 1 user user        12 Apr 26 23:24 llvm-readelf -> llvm-readobj
-rwxr-xr-x 1 user user   6598288 Apr 26 23:21 llvm-readobj
-rwxr-xr-x 1 user user    319464 Apr 26 23:20 llvm-strings
lrwxrwxrwx 1 user user        12 Apr 26 23:24 llvm-strip -> llvm-objcopy
-rwxr-xr-x 1 user user   5293248 Apr 26 23:20 llvm-symbolizer
lrwxrwxrwx 1 user user         3 Apr 26 23:24 wasm-ld -> lld

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You messages will be lost here. Please copy your proposal to the new issue.
It is better to make changes more focused to increase the landing probability.
Adding more ideas here we postpone this PR landing...

1. Remove duplicates.
2. Prepend apt-get install by apt-get update. (https://docs.docker.com/build/building/best-practices/#apt-get)
3. Less RUN calls - less layers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants