
Docker image #1293

Closed

sureshbhusare opened this issue Oct 8, 2023 · 12 comments

Comments

@sureshbhusare

Any Dockerfile? Or any official Docker image?

@MaxZabarka

+1

@casper-hansen
Contributor

@agrogov

agrogov commented Oct 8, 2023

The CUDA-based image is too fat and unnecessary; just use the slim Python image.
I'm using this Dockerfile to run Mistral on 2 GPUs:

FROM python:3.11-slim
ENV DEBIAN_FRONTEND=noninteractive

RUN pip install --upgrade pip && \
    pip install --upgrade ray && \
    pip install --upgrade pyarrow && \
    pip install pandas fschat==0.2.23 && \
    pip install --upgrade vllm
RUN apt-get update && apt-get install git -y
RUN pip install git+https://github.com/huggingface/transformers.git

EXPOSE 8080 6379

CMD echo "Y" | ray start --head && sleep 5 && ray status && python -m vllm.entrypoints.openai.api_server \
        --served-model $MODEL_ID \
        --model $MODEL_ID \
        --tensor-parallel-size 2 \
        --worker-use-ray \
        --host 0.0.0.0 \
        --port 8080 \
        --gpu-memory-utilization 0.45 \
        --max-num-batched-tokens 32768

docker run -d --gpus all -it --ipc=host --shm-size 10g -e MODEL_ID=$model -p 8080:8080 -p 6379:6379 -v $volume:/root/.cache/huggingface/hub/ morgulio/vllm:0.2.0
Before starting, set the model and volume variables as needed.
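
For example (the model ID and cache path here are just placeholders, substitute your own):

model=mistralai/Mistral-7B-Instruct-v0.1
volume=$HOME/.cache/huggingface/hub
docker run -d --gpus all -it --ipc=host --shm-size 10g -e MODEL_ID=$model -p 8080:8080 -p 6379:6379 -v $volume:/root/.cache/huggingface/hub/ morgulio/vllm:0.2.0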

@sureshbhusare
Author

sureshbhusare commented Oct 8, 2023 via email

@MaxZabarka

Is there a Dockerfile anywhere that successfully builds vLLM?

@agrogov

agrogov commented Oct 8, 2023

@sureshbhusare RTX 3090 FE

@sureshbhusare
Author

sureshbhusare commented Oct 8, 2023

@agrogov This does not work. Nvidia driver error.

@agrogov

agrogov commented Oct 9, 2023

@sureshbhusare do you have CUDA & Nvidia Docker Toolkit installed on the host?
It's a mandatory requirement.
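
A quick sanity check (a sketch; any CUDA base image tag you already have locally will do) is to run nvidia-smi from inside a throwaway container:

# should print the driver version and GPU table if the NVIDIA Container Toolkit is set up
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

If that command errors out, fix the host driver/toolkit setup before debugging the vLLM image.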

@olihough86

olihough86 commented Oct 9, 2023

I had a working Dockerfile, but it's now broken: a recent commit is causing a CUDA mismatch. This image uses a CUDA 11.8 base.

The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.

So something is causing PyTorch compiled against CUDA 12.1 to be installed.

FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel
RUN git clone https://github.com/vllm-project/vllm.git && cd vllm && pip install .

The Dockerfile is only two lines. vLLM is supposed to require CUDA 11.8 per the docs; has this changed?
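
One workaround sketch (not an official image; the base tag below is an assumption, use whichever CUDA 12.1 devel image you trust) is to build on a CUDA 12.1 base so nvcc matches the PyTorch wheel that current vLLM pulls in:

FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
# git is needed for the source checkout; the CUDA toolkit in this image matches torch built for cu121
RUN apt-get update && apt-get install -y git && \
    git clone https://github.com/vllm-project/vllm.git && \
    cd vllm && pip install .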

@djmaze

djmaze commented Oct 14, 2023

@olihough86 Just built your Dockerfile and it works for me.

UPDATE: It fails at the v0.2.0 tag but works on main.
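
If anyone wants to reproduce that, a minimal variation on the two-line Dockerfile above pins the checkout to an explicit ref (the ref name is whatever you verified works):

FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel
# build from a specific ref instead of whatever HEAD happens to be at build time
RUN git clone https://github.com/vllm-project/vllm.git && \
    cd vllm && git checkout main && pip install .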

@agt
Contributor

agt commented Oct 17, 2023

My site prefers Nvidia's conda channel for CUDA over the NVCR images; our vLLM Dockerfile is available at https://github.com/ucsd-ets/traip-vllm if anybody's interested in that approach.

@thearchitectxy

thearchitectxy commented Oct 18, 2023

My site prefers Nvidia's conda channel for CUDA over the NVCR images; our vLLM Dockerfile is available at https://github.com/ucsd-ets/traip-vllm if anybody's interested in that approach.

Can this Dockerfile be built on any PC (I am on a MacBook) and pushed to a registry? @agt I'm having an issue building locally; the idea is to push to ECR and then run it via a Kubernetes deployment, but I'm getting this error:


300.3 terminate called after throwing an instance of 'std::length_error'
300.3   what():  basic_string::_M_create
300.4 Aborted
------
Dockerfile:32
--------------------
  31 |     
  32 | >>> RUN . /opt/conda/bin/activate && \
  33 | >>>     mamba env create -p /opt/vllm -f /root/vllm-environment.yml
  34 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c . /opt/conda/bin/activate &&     mamba env create -p /opt/vllm -f /root/vllm-environment.yml" did not complete successfully: exit code: 134

Is it also possible for you to put it on Docker Hub, to avoid having to build locally? @agt
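
For what it's worth, a sketch of cross-building for amd64 from a Mac and pushing straight to a registry (the ECR URI is a placeholder; heavy native steps such as the mamba solve can still crash under QEMU emulation, so an amd64 CI runner or remote builder may be more reliable):

docker buildx create --use   # one-time: create a builder that supports --platform
docker buildx build --platform linux/amd64 \
    -t <aws_account_id>.dkr.ecr.<region>.amazonaws.com/vllm:latest \
    --push .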

esmeetu closed this as completed Mar 5, 2024