
vLLM Distributed Inference stuck when using multi-GPU #2466

Closed
RathoreShubh opened this issue Jan 17, 2024 · 11 comments

Comments

@RathoreShubh

I am trying to run an inference server on multiple GPUs (4× NVIDIA GeForce RTX 3090) with this command:

python -u -m vllm.entrypoints.api_server --host 0.0.0.0 --model mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 4

This works fine with --tensor-parallel-size=1, but with --tensor-parallel-size > 1 it gets stuck on startup.

Thanks
(Screenshot attached: startup hang, 2024-01-17)

@RhizoNymph

This is happening to me too, on 2× 3090.

@s-natsubori

Try these parameters:
--gpu-memory-utilization 0.7–0.9
--max-model-len 8192
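As a rough sketch of why capping --max-model-len matters: vLLM reserves KV-cache memory that grows with the context length, so a smaller limit leaves more headroom per GPU. The estimate below assumes a Mistral-7B-style configuration (32 layers, 8 KV heads, head dim 128, fp16); these numbers are illustrative back-of-envelope figures, not vLLM's internal memory accounting.

```python
# Illustrative KV-cache size estimate for one sequence.
# All parameter values are assumptions for a Mistral-7B-like model.
def kv_cache_bytes(max_model_len, num_layers=32, num_kv_heads=8,
                   head_dim=128, dtype_bytes=2):
    # Factor of 2 accounts for both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * max_model_len

gb = kv_cache_bytes(8192) / 1024**3
print(f"{gb:.2f} GiB")  # → 1.00 GiB per full-length sequence
```

Halving --max-model-len roughly halves this reservation, which can be the difference between fitting and OOM-ing (or stalling) on a 24 GB card.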

@Double-bear

Try these parameters: --gpu-memory-utilization 0.7–0.9 --max-model-len 8192

Hello, I have tried the method you provided, but it has no effect.

@RhizoNymph

No effect here either

@BilalKHA95

Did you find a solution? I have the same issue.

@shubham-bnxt

@BilalKHA95 try this

export NCCL_P2P_DISABLE=1

This worked for me.
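Why this can help, as a sketch: consumer GPUs such as the RTX 3090 reportedly lack P2P support over PCIe, and NCCL can hang at startup while trying to establish peer-to-peer transport between the tensor-parallel workers. Exporting NCCL_P2P_DISABLE=1 before launching works because child processes inherit the variable; the snippet below only demonstrates that inheritance mechanism and is not vLLM code.

```python
import os
import subprocess
import sys

# Setting the variable in the parent has the same effect as
# `export NCCL_P2P_DISABLE=1` in the shell: children (e.g. vLLM's
# tensor-parallel workers) inherit it, and NCCL then skips the
# peer-to-peer transport that can hang on GPUs without P2P support.
os.environ["NCCL_P2P_DISABLE"] = "1"

# Verify that a child process sees the variable.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['NCCL_P2P_DISABLE'])"],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # → 1
```

The equivalent shell form is simply to run the `export` line above in the same shell session before starting the api_server.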

@BilalKHA95

@BilalKHA95 try this

export NCCL_P2P_DISABLE=1

This worked for me.

Thanks!!! It's working now: this env variable plus updating the CUDA toolkit to 12.3.

@Palmik

Palmik commented Mar 16, 2024

export NCCL_P2P_DISABLE=1

This also solved this issue for me.

@emersonium

@BilalKHA95 try this
export NCCL_P2P_DISABLE=1
This worked for me.

Thanks!!! It's working now: this env variable plus updating the CUDA toolkit to 12.3.

Hi!
Does this result in higher tokens/second for you (for a small model like mistralai/Mistral-7B-Instruct-v0.2 with --tensor-parallel-size 4)? Thanks!

@SuperBruceJia

This didn't work for me:

export NCCL_P2P_DISABLE=1

Are there any solutions?

Thank you guys very much in advance!

Best regards,

Shuyue
June 9th, 2024

@DarkLight1337
Member

We have added documentation for this situation in #5430. Please take a look.

10 participants