beam search bug #644

Closed
lijianxing123 opened this issue Aug 2, 2023 · 8 comments
Labels: bug (Something isn't working)

Comments

@lijianxing123

There is a problem with the implementation of vLLM beam search: it seems that the previously generated beam candidates stay fixed and unchanged instead of being updated as the search continues. Can you provide some help?
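For anyone trying to reproduce this, a minimal comparison sketch like the one below can show whether vLLM's beams diverge from HF's. The model name, prompt, and beam width are placeholders (not from this issue), and the `SamplingParams(use_beam_search=...)` field assumes the vLLM API of that era:

```python
# Sketch: compare vLLM beam search against HF transformers beam search.
# Assumptions: vLLM ~0.1.x API with SamplingParams(use_beam_search=...);
# "facebook/opt-125m" is only a placeholder model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"   # placeholder (assumption)
PROMPT = "The capital of France is"
BEAMS = 4

# vLLM beam search: best_of acts as the beam width, temperature must be 0.
llm = LLM(model=MODEL)
params = SamplingParams(n=BEAMS, best_of=BEAMS, use_beam_search=True,
                        temperature=0.0, max_tokens=32)
vllm_texts = [o.text for o in llm.generate([PROMPT], params)[0].outputs]

# HF beam search for reference.
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tok(PROMPT, return_tensors="pt")
out = model.generate(**inputs, num_beams=BEAMS, num_return_sequences=BEAMS,
                     do_sample=False, max_new_tokens=32)
hf_texts = [tok.decode(seq[inputs.input_ids.shape[1]:], skip_special_tokens=True)
            for seq in out]

# Beam ordering may differ between the two libraries; inspect both lists.
for v, h in zip(vllm_texts, hf_texts):
    print("vLLM:", repr(v))
    print("HF:  ", repr(h))
```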

@lijianxing123
Author

thanks

@thisissum

I tried the implementation from #646, and the output is aligned with HF perfectly.
But there is a concurrency problem in this implementation: the API server throws "IndexError: list index out of range" when I request it with 2 threads, and then it gets stuck.

@hsm1997

hsm1997 commented Aug 7, 2023

hi @thisissum, I think the concurrency problem is fixed now. I've tried up to 5 concurrent queries and everything worked fine in my case.
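For reference, a stress-test sketch along these lines can reproduce the concurrent-request behaviour discussed here. The URL, payload fields, and the thread/request counts are assumptions based on the old vllm.entrypoints.api_server /generate demo endpoint and the numbers mentioned in this thread, not the actual scripts used:

```python
# Sketch: hammer the demo api_server with concurrent beam-search requests.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8000/generate"   # assumed local deployment
PAYLOAD = {
    "prompt": "The capital of France is",
    "n": 4,
    "best_of": 4,
    "use_beam_search": True,
    "temperature": 0.0,
    "max_tokens": 32,
}

def one_request(_: int) -> int:
    try:
        resp = requests.post(URL, json=PAYLOAD, timeout=120)
        return resp.status_code
    except requests.RequestException:
        return 0  # count network errors / hangs that time out as failures

# 5 threads, ~200 requests each (roughly the load discussed in this thread).
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(one_request, range(5 * 200)))
print("completed:", len(codes), "failures:", sum(c != 200 for c in codes))
```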

@zhuohan123 added the bug (Something isn't working) label Aug 7, 2023
@thisissum

hi @hsm1997, thanks a lot for fixing the concurrency problem mentioned above, but I ran into another problem:
I used 5 concurrent queries to send 1000 requests to the API server, and the llm_engine stops working after 500-600 requests, while the FastAPI app can still receive requests. This happens randomly, and it still happens after I decrease the concurrency to 1.

@thisissum

I did more experiments and found that if I print self.block_manager.get_num_free_gpu_blocks() in the scheduler, the number of free GPU blocks keeps decreasing. The llm_engine stops working after the number of free GPU blocks decreases to a very low number.
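For anyone who wants to reproduce this check without patching the scheduler, a sketch like the following watches the same counter from the offline API. The attribute path (llm_engine.scheduler.block_manager) and the model name are assumptions based on the vLLM codebase of that time:

```python
# Sketch: generate repeatedly and watch the free-GPU-block count for leaks.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # placeholder model (assumption)
params = SamplingParams(n=4, best_of=4, use_beam_search=True,
                        temperature=0.0, max_tokens=32)
block_manager = llm.llm_engine.scheduler.block_manager

baseline = block_manager.get_num_free_gpu_blocks()
for i in range(100):
    llm.generate(["The capital of France is"], params)
    free = block_manager.get_num_free_gpu_blocks()
    # With the leak described above, `free` keeps dropping; after the fix it
    # should return to roughly `baseline` once each request finishes.
    print(f"request {i}: free gpu blocks = {free} (baseline {baseline})")
```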

@hsm1997

hsm1997 commented Aug 9, 2023

hi @thisissum, I think the memory leak problem is fixed now (by simply removing one redundant block_manager.fork statement). I tested with 3 concurrent threads, each querying the API server about 200 times; all queries succeeded and the num_free_gpu_blocks no longer keeps decreasing.

@thisissum

hi @hsm1997, I tested again and I'm sure that the problem is fixed. Thanks a lot!

@zhuohan123
Member

Fixed by #857. Please raise a new issue if there is any new error.
