[Bug]: with worker_use_ray = true, and tensor_parallel_size > 1, the process is pending forever #4639
Comments
Does it work without your change?
I guess it could work, but I cannot test it since I only have 2 GPUs and assign the driver 0.1.
Yes, each NCCL process needs to own one GPU.
Fine, so to run vLLM on a Ray cluster, I have to waste some GPU. That's not expected; any suggestion? @youkaichao
Actually, it's not a problem just for the Ray cluster scenario. I mean, on a node with 2 GPUs, I guess it cannot serve tensor_parallel_size = 2 with vLLM, since the driver process will occupy some GPU.
I don't know your setup with Ray. Our CI works fine on a 2-GPU machine for tensor_parallel_size = 2.
You are right, on a node it works.
Got the same issue. I solved it by updating accelerate from 0.26.0 to 0.30.0.
So you also rewrote the original code to set ...?
I guess we hit the same scenario, and it's really a rare case. But with vLLM 4.x, it requires ... I really expect someone to give a clarification...
In case it helps, you can now use tensor parallel without Ray, see #4539.
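For illustration, a minimal sketch of what that looks like from the Python API, assuming a vLLM version that ships the multiprocessing executor. The distributed_executor_backend argument and the "mp" value are taken from later vLLM releases and may not match the exact version introduced by that PR, so check your installed version:

```python
from vllm import LLM, SamplingParams

# Tensor parallelism across 2 GPUs on one node without a Ray cluster.
# "mp" selects the multiprocessing-based executor; the argument name is
# assumed from later vLLM releases and may differ in older versions.
llm = LLM(
    model="facebook/opt-125m",
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```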
Actually, in my case I adopt Ray as a unified workload platform and want to run various LLM workloads in a single Ray cluster, but when I try to integrate it with vLLM, I hit this problem.
I also encountered this problem. I solved it by compiling the NCCL source code and then modifying the path of libnccl.so.2 in the vLLM source code.
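If editing the source is undesirable, a possible alternative is to point vLLM at the custom NCCL build via an environment variable before importing it. This is only a sketch: it assumes your vLLM build honors VLLM_NCCL_SO_PATH, and the library path below is hypothetical.

```python
import os

# Hypothetical path to a custom-built NCCL; adjust to your build output.
# VLLM_NCCL_SO_PATH is assumed to be read by the installed vLLM version
# when it loads libnccl.so.2; verify this for your release.
os.environ["VLLM_NCCL_SO_PATH"] = "/opt/nccl/build/lib/libnccl.so.2"

from vllm import LLM  # import vLLM only after setting the variable
```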
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
My case is a little complex. I try to launch vLLM in a Ray cluster. The latest vLLM requires the driver process to have GPU capability, but I do not want to waste a GPU on the driver (it could not be used by a worker), so I changed the vLLM source code:
vllm/vllm/executor/ray_gpu_executor.py, lines 66 to 71 in a98187c
to always use num_gpus = self.cache_config.gpu_memory_utilization, no matter what tensor_parallel_size is. That means I can make the worker and driver share one GPU. Unfortunately, the process is pending forever; to be specific, it hangs in nccl.ncclCommInitRank.
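For readers who cannot follow the line reference above, here is a paraphrase of the relevant logic inside RayGPUExecutor._init_workers_ray and the change described in this issue; the exact code at commit a98187c may differ slightly.

```python
# Original logic (paraphrased): only the single-GPU case uses a fractional
# GPU allocation for the Ray worker.
if self.parallel_config.tensor_parallel_size == 1:
    # Single-GPU case: the Ray worker only reserves a fraction of a GPU.
    num_gpus = self.cache_config.gpu_memory_utilization
else:
    # Multi-GPU case: each Ray worker reserves a full GPU.
    num_gpus = 1

# Change described in this issue: always use the fractional allocation so
# that the driver process and a worker can be scheduled on the same GPU.
num_gpus = self.cache_config.gpu_memory_utilization
```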
Any suggestion for this, or any suggestion on how to launch vLLM on a Ray cluster?