Add dockerfile #1350

Merged
merged 13 commits into vllm-project:main on Oct 31, 2023

Conversation

@skrider (Contributor) commented Oct 14, 2023

Implements #1293

Adds a Dockerfile to run vLLM in a containerized environment. This has been tested to work with facebook/opt-13b on 2 g5.2xlarge AWS instances, using Ray for communication. Output of benchmarks/benchmark_serving.py:

Total time: 16.32 s
Throughput: 61.28 requests/s
Average latency: 10.59 s
Average latency per token: 0.88 s
Average latency per output token: 2.12 s

The image is quite large, however, at 6.93 GB.

Is this testing setup sufficient?

Additionally, using cuda-base reduces the image size significantly but prevents installation of development dependencies, which among other things prevents unit tests from being run - is this a concern?

Should we also consider pruning the image and removing unused dependencies from the pip install to further reduce image size?

@skrider marked this pull request as draft October 14, 2023 01:57
@simon-mo (Collaborator)

Hi @skrider

Great work so far. This is a good start. A few comments:

  • We should absolutely try to prune the image as much as possible.
  • Can we add an ENTRYPOINT so users don't need to type python3 -m vllm.entrypoints.api_server each time?
  • Consider adding EXPOSE for the default port as well?
  • Ideally, we should have a "dev" container that contains all build dependencies so we can run CI and help contributors set up everything, and another "prod" container with the smallest size possible to run models or build upon (a rough sketch of such a layout is below).
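
A rough sketch of what such a dev/prod split with EXPOSE and an ENTRYPOINT could look like (illustrative only - the stage contents, package paths, and the requirements-dev.txt file are assumptions, not the Dockerfile from this PR):

    # "dev" stage: CUDA devel image with build and test dependencies
    FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev
    RUN apt-get update && apt-get install -y python3 python3-pip
    COPY requirements.txt requirements-dev.txt ./
    RUN pip install -r requirements.txt -r requirements-dev.txt

    # "prod" stage: smaller runtime base; naively reuses the dev site-packages,
    # so real pruning would install only runtime dependencies here instead
    FROM nvidia/cuda:11.8.0-base-ubuntu22.04 AS prod
    RUN apt-get update && apt-get install -y python3 python3-pip
    COPY --from=dev /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
    EXPOSE 8000
    ENTRYPOINT ["python3", "-m", "vllm.entrypoints.api_server"]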

@skrider (Contributor, Author) commented Oct 18, 2023

@simon-mo - thanks!

We should absolutely try to prune the image as much as possible.

Noted, will do. I am working on some tooling to help prune images automatically here: https://github.com/skrider/pip-prune. So far I have been able to reduce the pip install footprint from 6 GB to 3 GB.

Regarding the entrypoint - I considered adding this, but thought that sometimes users might want to start the container with Ray and then exec vllm.entrypoints.api_server later. Perhaps it would be good to provide an entrypoint.sh script that checks for an environment variable and optionally starts up/waits for Ray (a rough sketch of the wiring is below)? It could get a bit clunky. Or should we just ignore Ray?
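
A minimal sketch of how such a wrapper could be wired into the Dockerfile, assuming a hypothetical entrypoint.sh next to it (neither the script, the START_RAY variable, nor these lines are part of this PR):

    # entrypoint.sh would check e.g. $START_RAY and run `ray start` (or wait for
    # a head node) before exec'ing whatever command the container was given
    COPY entrypoint.sh /usr/local/bin/vllm-entrypoint.sh
    RUN chmod +x /usr/local/bin/vllm-entrypoint.sh
    ENTRYPOINT ["/usr/local/bin/vllm-entrypoint.sh"]
    CMD ["python3", "-m", "vllm.entrypoints.api_server"]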

Regarding the dev/prod containers - should these install vLLM from PyPI or from the repo directly?

Apologies for the late reply - I need to fix my GitHub notifications.

@simon-mo (Collaborator)

  • Users can override the entrypoint using the --entrypoint flag of docker run, as well as similar means in K8s.
  • It would be great to allow customizing the version using a Docker ARG, defaulting to the latest version (a rough sketch is below). As long as it is easy enough to build it for any version, it should be fine.
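
A minimal sketch of what a version ARG might look like if the image installed vLLM from PyPI (hypothetical - this PR builds from the repository checkout instead):

    # hypothetical build argument; override with --build-arg VLLM_VERSION=0.2.0
    ARG VLLM_VERSION=latest
    # install the pinned release, or the newest published one by default
    RUN if [ "$VLLM_VERSION" = "latest" ]; then \
            pip install vllm; \
        else \
            pip install "vllm==${VLLM_VERSION}"; \
        fi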

@simon-mo mentioned this pull request Oct 23, 2023

@NikolaBorisov (Contributor) left a comment:

Left a few comments. Trying to reconcile this with #1415.

Dockerfile Outdated
Comment on lines 20 to 23
COPY vllm vllm
COPY pyproject.toml pyproject.toml
COPY README.md README.md
COPY MANIFEST.in MANIFEST.in

Contributor:

Ideally here we would copy only the csrc folder and build only the C++ code. If you copy the vllm folder, any change to the Python code causes a slow rebuild of the C++ code, which most of the time is not needed.

Also, the README.md file is more or less required during the build of the C++ code, but we can copy an empty README.md while building the C++ code.
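
A minimal sketch of the layer ordering being suggested (illustrative - the exact file list and build command are assumptions):

    # copy only what the C++/CUDA extension build needs, so this expensive layer
    # stays cached as long as csrc/ and the build configuration are unchanged
    COPY csrc csrc
    COPY setup.py pyproject.toml ./
    COPY vllm/__init__.py vllm/__init__.py
    RUN python3 setup.py build_ext --inplace

    # Python sources come afterwards; editing them no longer invalidates
    # the extension-build layer above
    COPY vllm vllm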

Contributor Author:

Makes sense. Do you think it would be possible to build the extensions separately from the wheel, so that the wheel-building step only bundles everything without having to rebuild?

Contributor:

I'm not sure how to build the wheel with already-built extensions, but it might be possible. I don't think we need to build the wheel. If you want, we can add another container build path for the wheel, in case you want this to be used to publish the wheel files to PyPI.

Contributor:

OK, I just saw you did another stage to build the wheel. This is OK, I think.

Contributor Author:

I think for now there's no need to build the wheel in the Dockerfile; we can do that later if we decide to consolidate Docker/CI.

@NikolaBorisov (Contributor)

* It would be great to allow customizing version using docker `ARG` and default to the latest version. As long as it is easy enough to build it for any version, it should be fine.

I don't think we need this. Users who want a particular version will be able to just docker pull vllm:0.2.0. If they want to build from scratch, they can check out the right branch/tag and run docker build there.

@skrider (Contributor, Author) commented Oct 25, 2023

@NikolaBorisov I took your comments into account - this should be ready to merge. I am still working on pruning the image, but that is not critical.

@NikolaBorisov (Contributor) left a comment:

Looks good to me. Just left 2 more suggestions.

Dockerfile Outdated
Comment on lines 50 to 51
FROM api_server AS openai_api_server
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

Contributor:

Suggested change:
- FROM api_server AS openai_api_server
- ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
+ FROM api_server AS openai_api_server
+ # extra dependencies for openai server
+ RUN pip install accelerate fschat
+ ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]


.. code-block:: console

$ DOCKER_BUILDKIT=1 docker build . --target prod --tag vllm --build-arg max_jobs=8

Contributor:

Suggested change:
- $ DOCKER_BUILDKIT=1 docker build . --target prod --tag vllm --build-arg max_jobs=8
+ $ DOCKER_BUILDKIT=1 docker build . --target api_server --tag vllm --build-arg max_jobs=8
+ $ DOCKER_BUILDKIT=1 docker build . --target openai_api_server --tag vllm-openai --build-arg max_jobs=8

@NikolaBorisov (Contributor)

@simon-mo I think this is ready to merge after @skrider fixes 2 small things. I'm going to close #1415 in favor of this one.

The only other thing we might want to fix is the names of the last two stages in the file. Instead of 'api_server' and 'openai_api_server', I think 'vllm' and 'vllm-openai' are better names.

@simon-mo (Collaborator)

@NikolaBorisov thank you for your feedback! @skrider it would be great to address these. I also created the vllm Docker user, and I'm planning to publish official images along with releases. Before we merge, @skrider, can you add comments to the Dockerfile and group the different stages together so they are more maintainable in the future?

@skrider (Contributor, Author) commented Oct 27, 2023

@simon-mo Yes, will do - should be done by EOD.

@skrider marked this pull request as ready for review October 27, 2023 23:09
@Extremys

(quoting the PR description above)

Did you check that it also works for multi-GPU inference (a model loaded on at least two GPUs)? Are you building the container from the target host with the NVIDIA runtime enabled?

@@ -0,0 +1,72 @@
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev

Collaborator:

Suggested change:
- FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev
+ ARG MAX_JOBS=4
+ FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev

Forward-declare the argument and set a sensible default.
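
One general Docker detail worth noting here (not part of the review itself): an ARG declared before the first FROM is global-scoped and must be redeclared inside a stage before it can be referenced there, e.g.:

    # global default, usable in FROM lines; redeclared below to enter the stage
    ARG MAX_JOBS=4
    FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS dev
    # redeclare (no default needed) so the value is visible inside this stage
    ARG MAX_JOBS
    # now the extension build can see it, e.g. to limit Ninja parallelism
    ENV MAX_JOBS=$MAX_JOBS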

COPY vllm/__init__.py vllm/__init__.py

# max jobs used by Ninja to build extensions
ENV MAX_JOBS=$max_jobs

Collaborator:

Suggested change:
- ENV MAX_JOBS=$max_jobs
+ ENV MAX_JOBS=$MAX_JOBS


.. code-block:: console

$ DOCKER_BUILDKIT=1 docker build . --target vllm --tag vllm --build-arg max_jobs=8

Collaborator:

Suggested change:
- $ DOCKER_BUILDKIT=1 docker build . --target vllm --tag vllm --build-arg max_jobs=8
+ $ DOCKER_BUILDKIT=1 docker build . --target vllm --tag vllm --build-arg MAX_JOBS=8

@simon-mo merged commit 9cabcb7 into vllm-project:main Oct 31, 2023
2 checks passed
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024