[Bug]: vllm 0.4.1 crashing after checking P2P status on single GPU #4587
Comments
I'm trying to set up llama3, so I'm hoping not to have to downgrade vllm versions if possible.
Any luck? I have the same issue on a single GPU machine.
Have you tried newer versions? I believe this code was introduced in #4159, and we have made some updates to that part of the code since then.
My educated guess is that this is some hardware problem. I see the GPU model is
@khoj-pez Which GPU model are you using? Also, does the error remain if you set the environment variable NCCL_P2P_DISABLE=1?
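NCCL reads this variable when it initializes its communicators, so it has to be in the environment before vLLM sets up its distributed state. A minimal sketch of the suggested check, assuming vLLM is launched from the same Python process (the model name is just a placeholder, not taken from this report):

```python
# Sketch of the suggestion above: disable NCCL peer-to-peer transport before
# anything in this process initializes NCCL. Model name is a placeholder.
import os

os.environ["NCCL_P2P_DISABLE"] = "1"  # must be set before the first NCCL call

from vllm import LLM  # import/construct only after the variable is set

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
```

The same effect can be had from the shell by prefixing the launch command with `NCCL_P2P_DISABLE=1`.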
After further investigation, I think this is an NCCL issue: it fails inside nccl calls. I suggest you report this to the nccl team at https://github.com/NVIDIA/nccl .
I talked with the NCCL team, and they confirmed they don't support vGPU. I created an issue to track this problem.
I guess this problem is solved by #4591. Can you confirm? @youkaichao
I don't have a vGPU to test. It would be great if @alexandergagliano could test the latest code and see if this issue is resolved.
I can confirm that the error is still present on a vGPU in 0.4.3, but it is now triggered by the warmup at this line: vllm/vllm/distributed/parallel_state.py, line 122 (commit 246598a).
Commenting out that one line seems to make at least the OpenAI-compatible server work for me, though!
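For anyone trying to narrow this down outside of vLLM, a warmup-style collective can be reproduced with plain PyTorch. This is a hedged sketch, not the vLLM code path; it assumes one CUDA-visible GPU and a PyTorch build with NCCL support:

```python
# Standalone NCCL sanity check (not vLLM code): create a single-rank "nccl"
# process group and run one all_reduce, similar in spirit to the warmup
# collective referenced above. If this crashes on the vGPU host, the failure
# is in NCCL or the driver rather than in vLLM itself.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="nccl", rank=0, world_size=1)
x = torch.ones(1, device="cuda")
dist.all_reduce(x)  # single-rank collective; still exercises NCCL initialization
torch.cuda.synchronize()
print("NCCL all_reduce completed:", x.item())
dist.destroy_process_group()
```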
/assign
Glad to hear it works. Do you know how to use vGPU in k8s so that our CI can test this case?
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
Thanks for the great code!
I'm getting a strange NCCL issue in the latest version of vllm (0.4.1). I had no problems with earlier releases (I just confirmed that v0.3.0 runs without issue). From what I can tell of the error message, the code is attempting a peer-to-peer connection, but I'm only running on a single GPU. Running the minimal example above, I get:
Any ideas? Thanks!
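(The minimal example and the error output referenced above are not preserved in this extract. For context only, a single-GPU run of the kind described might look like the sketch below; the model name is a placeholder rather than the one actually used.)

```python
# Illustrative single-GPU vLLM usage of the kind described in the report.
# The original reproduction script is not preserved here; the model name is
# a placeholder, not the one from the report.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # tensor_parallel_size defaults to 1
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```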