
Add dockerfile #1350

Merged: 13 commits, Oct 31, 2023
64 changes: 64 additions & 0 deletions Dockerfile
@@ -0,0 +1,64 @@
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev
Collaborator comment:
Suggested change:
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev
ARG MAX_JOBS=4
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev

Forward-declare the argument and set a sensible default.
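For context on the suggestion: an ARG declared before the first FROM lives in a global scope that only FROM instructions can see, so any stage that needs the value must re-declare the ARG after its own FROM. A minimal sketch of the rule (the demo stage and echo step are purely illustrative):

    ARG MAX_JOBS=4

    FROM ubuntu:22.04 AS demo
    # Re-declare without a value to import the global default into this stage.
    ARG MAX_JOBS
    RUN echo "building with ${MAX_JOBS} jobs"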


RUN apt-get update -y \
&& apt-get install -y python3-pip python3-venv

WORKDIR /workspace
COPY requirements.txt requirements.txt
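# BuildKit cache mount below persists pip's download cache across builds (requires DOCKER_BUILDKIT=1).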
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt

COPY requirements-dev.txt requirements-dev.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements-dev.txt

FROM dev AS build_wheel

ARG max_jobs=4

COPY csrc csrc
COPY vllm vllm
COPY pyproject.toml pyproject.toml
COPY README.md README.md
COPY MANIFEST.in MANIFEST.in
Contributor comment:
Ideally we would copy only the csrc folder here and build only the C++ code. If the vllm folder is copied, any change to the Python code triggers a slow rebuild of the C++ code, which is usually unnecessary.

Also, README.md is more or less required during the C++ build, but we could copy an empty README.md while building the C++ code.
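A sketch of the layering this comment suggests, assuming setup.py can compile the extensions from csrc plus the build metadata and tolerates an empty README.md (the build_ext stage name is illustrative):

    FROM dev AS build_ext
    COPY csrc csrc
    COPY setup.py setup.py
    COPY pyproject.toml pyproject.toml
    COPY requirements.txt requirements.txt
    COPY vllm/__init__.py vllm/__init__.py
    # Stub README so edits to the real one do not invalidate the cache.
    RUN touch README.md
    RUN python3 setup.py build_ext --inplace
    # Python sources are copied only in later stages, so editing them
    # leaves the compiled-extension layers above untouched.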

Contributor Author reply:

Makes sense. Do you think it would be possible to build the extensions separately from the wheel, so that the wheel-building step only bundles everything without having to rebuild?

Contributor reply:
I'm not sure how to build the wheel with already-built extensions, but it might be possible. I don't think we need to build the wheel at all. If you like, we can add another container path that builds the wheel, in case you want this to be used to publish wheel files to pip?

Contributor reply:

OK, I just saw that you added another stage to build the wheel. I think this is fine.

Contributor Author reply:

I think there's no need to build the wheel in the Dockerfile for now; we can do that later if we decide to consolidate Docker/CI.

COPY setup.py setup.py

RUN --mount=type=cache,target=/root/.cache/pip \
MAX_JOBS=$max_jobs python3 -m build

FROM dev AS build

COPY csrc csrc
COPY setup.py setup.py
COPY README.md README.md
COPY requirements.txt requirements.txt
COPY pyproject.toml pyproject.toml
COPY vllm/__init__.py vllm/__init__.py

ENV MAX_JOBS=$max_jobs
Collaborator comment:
Suggested change:
ENV MAX_JOBS=$max_jobs
ENV MAX_JOBS=$MAX_JOBS
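Note that this substitution only resolves if the stage also re-declares the ARG after its FROM, matching the forward declaration suggested at the top of the file; a minimal sketch:

    FROM dev AS build
    ARG MAX_JOBS
    ENV MAX_JOBS=$MAX_JOBS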

RUN python3 setup.py build_ext --inplace

FROM dev AS test

COPY --from=build /workspace/vllm/*.so /workspace/vllm/
COPY tests tests
COPY vllm vllm

ENTRYPOINT ["python3", "-m", "pytest", "tests"]

FROM nvidia/cuda:11.8.0-base-ubuntu22.04 AS api_server

RUN apt-get update -y \
&& apt-get install -y python3-pip libnccl2
WORKDIR /workspace

COPY requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt

COPY --from=build /workspace/vllm/*.so /workspace/vllm/
COPY vllm vllm

EXPOSE 8000
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.api_server"]

1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -65,6 +65,7 @@ Documentation
serving/distributed_serving
serving/run_on_sky
serving/deploying_with_triton
serving/deploying_with_docker

.. toctree::
:maxdepth: 1
21 changes: 21 additions & 0 deletions docs/source/serving/deploying_with_docker.rst
@@ -0,0 +1,21 @@
.. _deploying_with_docker:

Deploying with Docker
============================

You can build and run vLLM from source via the provided Dockerfile. To build vLLM:

.. code-block:: console

$ DOCKER_BUILDKIT=1 docker build . --target prod --tag vllm --build-arg max_jobs=8
Contributor comment:
Suggested change:
$ DOCKER_BUILDKIT=1 docker build . --target prod --tag vllm --build-arg max_jobs=8
$ DOCKER_BUILDKIT=1 docker build . --target api_server --tag vllm --build-arg max_jobs=8
$ DOCKER_BUILDKIT=1 docker build . --target openai_api_server --tag vllm-openai --build-arg max_jobs=8


To run vLLM:

.. code-block:: console

$ docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
vllm <args...>

3 changes: 3 additions & 0 deletions requirements-dev.txt
@@ -12,3 +12,6 @@ types-setuptools
pytest
pytest-forked
pytest-asyncio

# distribution
build