[Bug]: `test_skip_speculation` fails in distributed execution #5814
BTW, my suspicion is that this will go away once we have SPMD workers in vLLM spec decode. That will happen after #5408.
That's great. I'll re-check this bug after that.
#5408 has been merged, but the test still fails.
After some testing, I found that the driver_worker stalls during ModelRunner.execute() when there are no drafts to score.
I think a simple solution would be to switch back to non-speculative mode for any step where no drafts are available.
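The fallback suggested above can be sketched as a per-step decision. This is a minimal illustration with hypothetical names (`SpecStepPlan`, `plan_step` are not vLLM APIs), assuming the goal is that every worker takes the same code path when a step has no drafts, so no rank stalls waiting on a collective:

```python
# Hypothetical sketch: fall back to plain (non-speculative) decoding for
# steps where the draft model produced no proposals, so the scoring path
# and its collectives are skipped uniformly across workers.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecStepPlan:
    """Per-step decision: score drafts, or run a plain decode step."""
    use_speculation: bool
    draft_token_ids: List[int]


def plan_step(draft_token_ids: Optional[List[int]]) -> SpecStepPlan:
    # If there are no drafts to score (e.g. every sequence exceeded the
    # draft model's max model len), skip the scoring path entirely.
    if not draft_token_ids:
        return SpecStepPlan(use_speculation=False, draft_token_ids=[])
    return SpecStepPlan(use_speculation=True, draft_token_ids=draft_token_ids)
```

The key property is that the decision depends only on inputs available to all workers, so every rank agrees on which path to take.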
BTW, I hit a similar problem in #5799 for the non-spec-decode case with TP>1, which hangs at the sampler in the driver worker. The root cause in my case seems to be that I changed the model runner to return a list, so non-driver workers need to return an empty list instead of None. I fixed it in the latest commit of the PR. Hopefully that helps your case too.
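The return-type mismatch described in that comment can be illustrated in isolation. This is a hedged sketch, not vLLM's actual model-runner signature (`execute_model` and its parameters here are hypothetical): when the driver worker starts returning a list, the non-driver ranks must return an empty list of the same type rather than None, so downstream code that iterates over the result behaves consistently on every rank:

```python
# Hypothetical sketch of the return-type fix: keep the return type
# identical across driver and non-driver tensor-parallel workers.
from typing import List, Optional


def execute_model(is_driver_worker: bool,
                  sampled: Optional[List[int]]) -> List[int]:
    if not is_driver_worker:
        # Returning [] keeps the contract consistent across ranks;
        # returning None here was the source of the hang described above.
        return []
    return sampled if sampled is not None else []
```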
@comaniac Great, thanks for sharing!
Command to reproduce the problem: it does not fail when we add the `spec-draft-tp 1` option.
Your current environment
🐛 Describe the bug
test_skip_speculation verifies that vLLM works seamlessly by skipping speculation when the sequence length grows larger than the draft model's max model len.
Since it lives in test_multistep_correctness.py, it has only been tested in the single-GPU setup, not the multi-GPU setup.
I checked, and the test fails in the multi-GPU setup. CI failure result link
The test fails because the skipping feature does not account for having multiple draft workers.
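The condition the test exercises can be sketched as follows. This is a simplified illustration with assumed names (`should_skip_speculation` and its parameters are not the actual vLLM code), based on the description above: speculation is skipped once a sequence would exceed the draft model's max model len, and in the multi-GPU case every draft worker must reach the same decision:

```python
# Hypothetical sketch of the skip-speculation condition: a draft model
# cannot propose tokens for a sequence that would exceed its max model
# len, so such steps must fall back to non-speculative decoding, and the
# check must yield the same answer on every draft worker.
def should_skip_speculation(seq_len: int, num_spec_tokens: int,
                            draft_max_model_len: int) -> bool:
    # The draft model would need room for the current sequence plus the
    # tokens it is asked to propose.
    return seq_len + num_spec_tokens > draft_max_model_len
```

Because the check is a pure function of the sequence state, divergence can only come from the execution path taken after the decision, which is where the multi-draft-worker case reportedly breaks.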
Related comment and code to check (thanks to @cadedaniel @comaniac).
This bug was found during #5414