Docker image #1293
Comments
+1

Mistral created a Dockerfile here:

CUDA-based image is too fat and useless; just use a slim Python image.
What are your GPUs?
On Sun, Oct 8, 2023, Alexey Rogov wrote:

> CUDA-based image is too fat and useless; just use a slim Python image.
>
> I'm using this Dockerfile to run Mistral on 2 GPUs:
```dockerfile
FROM python:3.11-slim

ENV DEBIAN_FRONTEND=noninteractive

RUN pip install --upgrade pip && \
    pip install --upgrade ray && \
    pip install --upgrade pyarrow && \
    pip install pandas fschat==0.2.23 && \
    pip install --upgrade vllm

RUN apt-get update && apt-get install git -y

RUN pip install git+https://github.com/huggingface/transformers.git

EXPOSE 8080 6379

CMD echo "Y" | ray start --head && sleep 5 && ray status && \
    python -m vllm.entrypoints.openai.api_server \
        --served-model $MODEL_ID \
        --model $MODEL_ID \
        --tensor-parallel-size 2 \
        --worker-use-ray \
        --host 0.0.0.0 \
        --port 8080 \
        --gpu-memory-utilization 0.45 \
        --max-num-batched-tokens 32768
```
```bash
docker run -d --gpus all -it --ipc=host --shm-size 10g \
    -e MODEL_ID=$model \
    -p 8080:8080 -p 6379:6379 \
    -v $volume:/root/.cache/huggingface/hub/ \
    morgulio/vllm:0.2.0
```
Before starting, set the shell variables `model` and `volume` as needed.
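Once the container is up, the OpenAI-compatible endpoint can be smoke-tested with a request like the one below (a minimal sketch; the `model` field must match whatever you passed as `MODEL_ID`, shown here with a hypothetical Mistral ID):

```bash
# Hypothetical smoke test against the server started above.
# "mistralai/Mistral-7B-v0.1" is a placeholder for your $MODEL_ID.
curl http://localhost:8080/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Mistral-7B-v0.1",
          "prompt": "Hello, my name is",
          "max_tokens": 32
        }'
```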
Is there a Dockerfile anywhere that successfully builds vLLM?
@sureshbhusare RTX 3090 FE
This does not work; I get an Nvidia driver error.
@sureshbhusare do you have CUDA and the NVIDIA Container Toolkit installed on the host?
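For anyone hitting the same driver error: a quick way to confirm the host can expose GPUs to containers at all (assuming the NVIDIA Container Toolkit is installed) is to run `nvidia-smi` inside a bare CUDA image:

```bash
# Should print the same GPU table as nvidia-smi on the host.
# A failure here points at the driver or container-toolkit setup,
# not at the vLLM image itself.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```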
I had a working Dockerfile but it's now broken; a recent commit is causing a CUDA mismatch. My image uses a CUDA 11.8 base, and the build now fails with `The detected CUDA version (11.8) mismatches the version that was used to compile PyTorch`, so something is causing PyTorch compiled against 12.1 to be installed. The Dockerfile is only two lines, starting from `runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel`. vLLM is supposed to require CUDA 11.8 as per the docs; has this changed?
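One possible workaround for this kind of mismatch (a sketch only, not a confirmed fix for this report) is to pin PyTorch to the cu118 wheel index before installing vLLM, so pip cannot swap in a build compiled against CUDA 12.1:

```dockerfile
FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel

# Pin torch to the CUDA 11.8 wheels so a later dependency
# cannot silently replace it with a CUDA 12.1 build.
RUN pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118 && \
    pip install vllm
```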
@olihough86 Just built your Dockerfile and it works for me. UPDATE: it fails at the …
My site prefers Nvidia's conda channel for CUDA over the NVCR images; our vLLM Dockerfile is available at https://github.com/ucsd-ets/traip-vllm if anybody's interested in that approach.
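For context, the conda-channel approach looks roughly like this (a sketch with assumed package names and versions; see the linked repo for the exact pins):

```dockerfile
FROM condaforge/miniforge3:latest

# Install the CUDA toolkit from NVIDIA's conda channel instead of
# starting from an NVCR (nvcr.io) base image.
RUN conda install -y -c nvidia cuda-toolkit=11.8 && \
    pip install vllm
```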
Can this Dockerfile be built on any PC (I am on a MacBook) and pushed to a registry? @agt I'm having issues building locally; the idea is to push to ECR and then run it via a Kubernetes deployment, but I'm getting this error:
Is it also possible for you to put it on Docker Hub, to avoid building locally? @agt
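On the cross-building question: an image built on an Apple Silicon MacBook defaults to linux/arm64, which won't run on a typical x86 Kubernetes node. A minimal sketch of an explicit cross-platform build and push (the ECR path is a placeholder):

```bash
# Cross-build for linux/amd64 from a Mac and push straight to ECR.
# <account>.dkr.ecr.<region>.amazonaws.com/vllm is a placeholder path.
docker buildx build --platform linux/amd64 \
    -t <account>.dkr.ecr.<region>.amazonaws.com/vllm:latest \
    --push .
```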
Any Dockerfile? Or any official Docker image?