beam search bug #644
Comments
Thanks.
I tried the implementation in #646; the output matches HF exactly.
Hi @thisissum, I think the concurrency problem is fixed by now. I've tried up to 5 concurrent queries and everything worked fine in my case.
Hi @hsm1997, thanks a lot for fixing the concurrency problem mentioned above, but I ran into another problem:
I did more experiments and found that if I print self.block_manager.get_num_free_gpu_blocks() in the scheduler, the number of free GPU blocks keeps decreasing. The llm_engine stops working once the number of free GPU blocks drops to a very low value.
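For anyone who wants to check for the same symptom, here is a minimal sketch of the idea, not vLLM code. looks_like_block_leak is a hypothetical helper, and it assumes the samples are values of self.block_manager.get_num_free_gpu_blocks() recorded by hand each time the engine is idle again (no running requests), when the count would be expected to return to its starting level.

```python
from typing import Sequence


def looks_like_block_leak(free_block_samples: Sequence[int], min_samples: int = 5) -> bool:
    """Heuristic leak check on free-block counts sampled while the engine is idle.

    Hypothetical helper for illustration only. Returns True when the count
    never recovers between samples and ends lower than it started.
    """
    if len(free_block_samples) < min_samples:
        # Too few samples to say anything meaningful.
        return False
    pairs = zip(free_block_samples, free_block_samples[1:])
    never_increases = all(later <= earlier for earlier, later in pairs)
    return never_increases and free_block_samples[-1] < free_block_samples[0]


# Example: counts recorded after each finished request.
print(looks_like_block_leak([100, 96, 92, 88, 84]))
print(looks_like_block_leak([100, 60, 100, 55, 100]))
```

A count that dips while requests are running and then recovers is normal; only a count that keeps shrinking across idle points suggests that blocks are not being freed.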
Hi @thisissum, I think the memory leak problem is fixed now (by simply removing one redundant
Hi @hsm1997, I tested again and I'm sure that the problem is fixed. Thanks a lot.
Fixed by #857. Please raise a new issue if there is any new error.
There is a problem with vLLM's beam search implementation. It seems that the previously generated search results stay fixed and are never updated. Can you provide some help?