Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Speedup dependencies installation via UV #8631

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 25 additions & 56 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,7 +1,11 @@
ARG PIP_VERSION=24.0
ARG BASE_IMAGE=ubuntu:22.04

FROM ${BASE_IMAGE} AS build-image-base
FROM ${BASE_IMAGE} AS env
ENV UV_PROJECT_ENVIRONMENT=/opt/venv
ENV PATH="${UV_PROJECT_ENVIRONMENT}/bin:${PATH}"

FROM env AS build-image-base

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get --no-install-recommends install -yq \
Expand All @@ -25,12 +29,10 @@

ARG PIP_VERSION
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
RUN --mount=type=cache,target=/root/.cache/pip/http \
python3 -m pip install -U pip==${PIP_VERSION}

# We build OpenH264, FFmpeg and PyAV in a separate build stage,
# because this way Docker can do it in parallel to all the other packages.
FROM build-image-base AS build-image-av
FROM build-image-base AS build-image

# Compile Openh264 and FFmpeg
ARG PREFIX=/opt/ffmpeg
Expand All @@ -51,49 +53,34 @@
--enable-shared --disable-static --disable-doc --disable-programs --prefix="${PREFIX}" && \
make -j5 && make install && make clean

COPY utils/dataset_manifest/requirements.txt /tmp/utils/dataset_manifest/requirements.txt

# Since we're using pip-compile-multi, each dependency can only be listed in
# one requirements file. In the case of PyAV, that should be
# `dataset_manifest/requirements.txt`. Make sure it's actually there,
# and then remove everything else.
RUN grep -q '^av==' /tmp/utils/dataset_manifest/requirements.txt
RUN sed -i '/^av==/!d' /tmp/utils/dataset_manifest/requirements.txt

# Work around https://github.com/PyAV-Org/PyAV/issues/1140
RUN pip install setuptools wheel 'cython<3'

RUN --mount=type=cache,target=/root/.cache/pip/http-v2 \
python3 -m pip wheel --no-binary=av --no-build-isolation \
-r /tmp/utils/dataset_manifest/requirements.txt \
-w /tmp/wheelhouse

# This stage builds wheels for all dependencies (except PyAV)
FROM build-image-base AS build-image

COPY cvat/requirements/ /tmp/cvat/requirements/
COPY utils/dataset_manifest/requirements.txt /tmp/utils/dataset_manifest/requirements.txt
WORKDIR /tmp/venv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
Comment on lines +56 to +57
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the Dockerfile analysis, I can now provide the final response:

The uv tool copy command needs to be fixed

The current COPY command COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ appears incorrect as it's trying to copy two source paths (/uv and /uvx) to a single destination (/bin/). Additionally, the subsequent commands show that uv is used in the build process, indicating it needs to be properly accessible.

The command should be modified to:

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
🔗 Analysis chain

Ensure the uv tool is correctly copied and executable.

Copying uv from ghcr.io/astral-sh/uv:latest should be verified to ensure it is properly integrated into the build environment and has the necessary permissions.

Expected output: uv tool version information.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that uv is executable

docker run --rm build-image /bin/uv --version

Length of output: 234


Script:

#!/bin/bash
# Let's check the Dockerfile content to understand the build stages and verify the uv copy command
cat Dockerfile

Length of output: 5288


# Exclude av from the requirements file
RUN sed -i '/^av==/d' /tmp/utils/dataset_manifest/requirements.txt
COPY pyproject.toml uv.lock ./
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing uv.lock file could cause build issues.

Ensure that both pyproject.toml and uv.lock are present in the context and correctly copied. The absence of uv.lock might lead to inconsistent dependency versions.

If uv.lock is not generated yet, consider adding a step to create it before the build process or include it in the repository.


ARG CVAT_CONFIGURATION="production"

RUN --mount=type=cache,target=/root/.cache/pip/http-v2 \
DATUMARO_HEADLESS=1 python3 -m pip wheel --no-deps --no-binary lxml,xmlsec \
-r /tmp/cvat/requirements/${CVAT_CONFIGURATION}.txt \
-w /tmp/wheelhouse
ENV UV_LINK_MODE=copy
RUN uv venv
RUN --mount=type=cache,target=/root/.cache/uv \
DATUMARO_HEADLESS=1 \
uv sync \
--frozen \
--no-install-project \
--extra ${CVAT_CONFIGURATION} \
$(if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then echo "--extra debug"; fi) \
--no-binary-package lxml \
--no-binary-package xmlsec
Comment on lines +63 to +72
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct the formatting of the multi-line RUN command

There is a formatting issue with the RUN command starting at line 62. The RUN instructions should be properly concatenated using backslashes (\) to ensure that the multi-line command executes correctly. Additionally, environment variables should be set within the same RUN command if they're only needed temporarily.

Apply this diff to fix the formatting:

-RUN uv venv
-
-RUN --mount=type=cache,target=/root/.cache/uv
-    DATUMARO_HEADLESS=1
-    uv sync
-    --frozen
-    --no-install-project
-    --extra ${CVAT_CONFIGURATION}
-    $(if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then echo "--extra debug"; fi)
-    --no-binary-package lxml
-    --no-binary-package xmlsec
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv venv && \
+    DATUMARO_HEADLESS=1 uv sync \
+        --frozen \
+        --no-install-project \
+        --extra ${CVAT_CONFIGURATION} \
+        $(if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then echo "--extra debug"; fi) \
+        --no-binary-package lxml \
+        --no-binary-package xmlsec

This adjustment ensures that the entire sequence is treated as a single RUN command and that all the arguments are correctly associated with the uv sync command.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN uv venv
RUN --mount=type=cache,target=/root/.cache/uv \
DATUMARO_HEADLESS=1 \
uv sync \
--frozen \
--no-install-project \
--extra ${CVAT_CONFIGURATION} \
$(if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then echo "--extra debug"; fi) \
--no-binary-package lxml \
--no-binary-package xmlsec
RUN --mount=type=cache,target=/root/.cache/uv \
uv venv && \
DATUMARO_HEADLESS=1 uv sync \
--frozen \
--no-install-project \
--extra ${CVAT_CONFIGURATION} \
$(if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then echo "--extra debug"; fi) \
--no-binary-package lxml \
--no-binary-package xmlsec


FROM golang:1.23.0 AS build-smokescreen

RUN git clone --filter=blob:none --no-checkout https://github.com/stripe/smokescreen.git
RUN cd smokescreen && git checkout eb1ac09 && go build -o /tmp/smokescreen

FROM ${BASE_IMAGE}
FROM env

ARG http_proxy
ARG https_proxy
ARG no_proxy="nuclio,${no_proxy}"

Check warning on line 83 in Dockerfile

View workflow job for this annotation

GitHub Actions / build

Variables should be defined before their use

UndefinedVar: Usage of undefined variable '$no_proxy' More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/
ARG socks_proxy
ARG TZ="Etc/UTC"

Expand Down Expand Up @@ -126,6 +113,7 @@
libxml2 \
libxmlsec1 \
libxmlsec1-openssl \
libglib2.0-0 \
nginx \
p7zip-full \
poppler-utils \
Expand All @@ -139,13 +127,14 @@
rm -rf /var/lib/apt/lists/* && \
echo 'application/wasm wasm' >> /etc/mime.types


# Install smokescreen
COPY --from=build-smokescreen /tmp/smokescreen /usr/local/bin/smokescreen

# Add a non-root user
ENV USER=${USER}
ENV HOME /home/${USER}
RUN adduser --shell /bin/bash --disabled-password --gecos "" ${USER}

Check warning on line 137 in Dockerfile

View workflow job for this annotation

GitHub Actions / build

Legacy key/value format with whitespace separator should not be used

LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format More info: https://docs.docker.com/go/dockerfile/rule/legacy-key-value-format/

ARG CLAM_AV="no"
RUN if [ "$CLAM_AV" = "yes" ]; then \
Expand All @@ -159,34 +148,14 @@
rm -rf /var/lib/apt/lists/*; \
fi

# Install wheels from the build image
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
# setuptools should be uninstalled after updating google-cloud-storage
# https://github.com/googleapis/python-storage/issues/740
RUN python -m pip install --upgrade setuptools
ARG PIP_VERSION
ARG PIP_DISABLE_PIP_VERSION_CHECK=1

RUN python -m pip install -U pip==${PIP_VERSION}
RUN --mount=type=bind,from=build-image,source=/tmp/wheelhouse,target=/mnt/wheelhouse \
--mount=type=bind,from=build-image-av,source=/tmp/wheelhouse,target=/mnt/wheelhouse-av \
python -m pip install --no-index /mnt/wheelhouse/*.whl /mnt/wheelhouse-av/*.whl

ENV NUMPROCS=1
COPY --from=build-image-av /opt/ffmpeg/lib /usr/lib

# These variables are required for supervisord substitutions in files
# This library allows remote python debugging with VS Code
ARG CVAT_DEBUG_ENABLED
RUN if [ "${CVAT_DEBUG_ENABLED}" = 'yes' ]; then \
python3 -m pip install --no-cache-dir debugpy; \
fi

# Removing pip due to security reasons. See: https://scout.docker.com/vulnerabilities/id/CVE-2018-20225
# The vulnerability is dubious and we don't use pip at runtime, but some vulnerability scanners mark it as a high vulnerability,
# and it was decided to remove pip from the final image
RUN python -m pip uninstall -y pip
COPY --from=build-image /opt/ffmpeg/lib /usr/lib
COPY --from=build-image ${UV_PROJECT_ENVIRONMENT} ${UV_PROJECT_ENVIRONMENT}
Comment on lines +157 to +158
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider cleaning up unnecessary files to reduce image size.

After copying the necessary files, remove any temporary files or caches that are not needed at runtime to optimize the image size.

Add the following commands:

RUN rm -rf /tmp/* /var/cache/apt/* /var/lib/apt/lists/*

Ensure this doesn't remove required files.


# Install and initialize CVAT, copy all necessary files
COPY cvat/nginx.conf /etc/nginx/nginx.conf
Expand Down
2 changes: 2 additions & 0 deletions cvat/requirements/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
These files are here for backwards compatibility only and not used during actual build.
See ../../pyproject.toml
6 changes: 0 additions & 6 deletions cvat/requirements/README.txt

This file was deleted.

3 changes: 0 additions & 3 deletions cvat/requirements/all.in

This file was deleted.

9 changes: 1 addition & 8 deletions cvat/requirements/all.txt
Original file line number Diff line number Diff line change
@@ -1,12 +1,5 @@
# SHA1:06f1cef9f12e8fbb0059de4b040a5f3a4fee3469
#
# This file is autogenerated by pip-compile-multi
# To update, run:
#
# pip-compile-multi
#
# This file was autogenerated via dev/make_requirements.py
-r development.txt
-r production.txt
-r testing.txt

# The following packages are considered to be unsafe in a requirements file:
59 changes: 0 additions & 59 deletions cvat/requirements/base.in

This file was deleted.

Loading
Loading