vLLM Distributed Inference stuck when using multi-GPU #2466
Comments
this is happening to me too, on 2 * 3090
try these parameters
Hello, I have tried the method you provided, but it has no effect.
No effect here either.
Did you find a solution? I have the same issue.
@BilalKHA95 try this: export NCCL_P2P_DISABLE=1. This worked for me.
Thanks!!! It's working now with this env variable plus updating the CUDA toolkit to 12.3.
This also solved the issue for me.
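For anyone who wants to apply the same workaround from Python instead of the shell, here is a minimal sketch (assuming the offline LLM API; the model name and tensor_parallel_size are placeholders, not taken from this thread). The variable has to be set before vLLM spawns its worker processes so that NCCL picks it up.

```python
import os

# Disable NCCL peer-to-peer transfers before vLLM/NCCL initializes.
# Equivalent to `export NCCL_P2P_DISABLE=1` in the shell.
os.environ["NCCL_P2P_DISABLE"] = "1"

from vllm import LLM, SamplingParams

# Placeholder model and parallel size; adjust to your hardware.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```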
Hi!
This didn't work for me:
Are there any solutions? Thank you all very much in advance! Best regards, Shuyue
We have added documentation for this situation in #5430. Please take a look.
I am trying to run an inference server on multiple GPUs (4 * NVIDIA GeForce RTX 3090) using this command:
python -u -m vllm.entrypoints.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 4
This works fine with --tensor-parallel-size 1, but with --tensor-parallel-size > 1 it gets stuck on startup.
Thanks
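If the server hangs at startup like this, a generic way to check whether NCCL itself can initialize across the GPUs, independently of vLLM, is a small torch.distributed all-reduce test. This is only a sketch (not vLLM's own tooling); launch it with torchrun.

```python
# nccl_check.py - minimal multi-GPU NCCL sanity check (sketch, not vLLM code).
# Launch with:  torchrun --nproc_per_node=4 nccl_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Each rank contributes a tensor of ones; after all_reduce every rank
# should print the world size. If this also hangs, the problem is in the
# NCCL/GPU interconnect setup rather than in vLLM itself.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)
print(f"rank {rank}: all_reduce result = {x.item()} (expected {dist.get_world_size()})")

dist.destroy_process_group()
```

If this test also hangs, re-running it with NCCL_P2P_DISABLE=1 set is a quick way to confirm whether the peer-to-peer path is the culprit.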