beam search bug #644

Closed
lijianxing123 opened this issue Aug 2, 2023 · 8 comments
Labels: bug (Something isn't working)

Comments

@lijianxing123

There is a problem with the implementation of vLLM beam search: it seems that the previously generated beam candidates stay fixed and unchanged instead of being updated as the search continues. Can you provide some help?
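For anyone trying to reproduce this, a minimal comparison sketch like the one below can show whether vLLM's beams diverge from HF's. The model name, prompt, and beam width are placeholders (not from this issue), and the `SamplingParams(use_beam_search=...)` field assumes the vLLM API of that era:

```python
# Sketch: compare vLLM beam search against HF transformers beam search.
# Assumptions: vLLM ~0.1.x API with SamplingParams(use_beam_search=...);
# "facebook/opt-125m" is only a placeholder model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"   # placeholder (assumption)
PROMPT = "The capital of France is"
BEAMS = 4

# vLLM beam search: best_of acts as the beam width, temperature must be 0.
llm = LLM(model=MODEL)
params = SamplingParams(n=BEAMS, best_of=BEAMS, use_beam_search=True,
                        temperature=0.0, max_tokens=32)
vllm_texts = [o.text for o in llm.generate([PROMPT], params)[0].outputs]

# HF beam search for reference.
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tok(PROMPT, return_tensors="pt")
out = model.generate(**inputs, num_beams=BEAMS, num_return_sequences=BEAMS,
                     do_sample=False, max_new_tokens=32)
hf_texts = [tok.decode(seq[inputs.input_ids.shape[1]:], skip_special_tokens=True)
            for seq in out]

# Beam ordering may differ between the two libraries; inspect both lists.
for v, h in zip(vllm_texts, hf_texts):
    print("vLLM:", repr(v))
    print("HF:  ", repr(h))
```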

@lijianxing123
Author

thanks

@thisissum

I tried the implementation from #646, and the output is aligned with HF perfectly.
But there is a concurrency problem in this implementation: the API server throws "IndexError: list index out of range" when I request it with 2 threads, and then it gets stuck.

@hsm1997

hsm1997 commented Aug 7, 2023

hi @thisissum, I think the concurrency problem is fixed now. I've tried up to 5 concurrent queries and everything worked fine in my case.
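For reference, a stress-test sketch along these lines can reproduce the concurrent-request behaviour discussed here. The URL, payload fields, and the thread/request counts are assumptions based on the old vllm.entrypoints.api_server /generate demo endpoint and the numbers mentioned in this thread, not the actual scripts used:

```python
# Sketch: hammer the demo api_server with concurrent beam-search requests.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8000/generate"   # assumed local deployment
PAYLOAD = {
    "prompt": "The capital of France is",
    "n": 4,
    "best_of": 4,
    "use_beam_search": True,
    "temperature": 0.0,
    "max_tokens": 32,
}

def one_request(_: int) -> int:
    try:
        resp = requests.post(URL, json=PAYLOAD, timeout=120)
        return resp.status_code
    except requests.RequestException:
        return 0  # count network errors / hangs that time out as failures

# 5 threads, ~200 requests each (roughly the load discussed in this thread).
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(one_request, range(5 * 200)))
print("completed:", len(codes), "failures:", sum(c != 200 for c in codes))
```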

@zhuohan123 added the bug (Something isn't working) label Aug 7, 2023
@thisissum

hi @hsm1997, thanks a lot for fixing the concurrency problem mentioned above, but I ran into another problem:
I used 5 concurrent queries to send 1000 requests to the API server, and the llm_engine stops working after 500-600 requests, while the FastAPI app can still receive requests. This happens randomly, and it still happens after I decrease the concurrency to 1.

@thisissum

I did more experiments and found that if I print self.block_manager.get_num_free_gpu_blocks() in the scheduler, the number of free GPU blocks keeps decreasing. The llm_engine stops working after the number of free GPU blocks decreases to a very low number.
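For anyone who wants to reproduce this check without patching the scheduler, a sketch like the following watches the same counter from the offline API. The attribute path (llm_engine.scheduler.block_manager) and the model name are assumptions based on the vLLM codebase of that time:

```python
# Sketch: generate repeatedly and watch the free-GPU-block count for leaks.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")   # placeholder model (assumption)
params = SamplingParams(n=4, best_of=4, use_beam_search=True,
                        temperature=0.0, max_tokens=32)
block_manager = llm.llm_engine.scheduler.block_manager

baseline = block_manager.get_num_free_gpu_blocks()
for i in range(100):
    llm.generate(["The capital of France is"], params)
    free = block_manager.get_num_free_gpu_blocks()
    # With the leak described above, `free` keeps dropping; after the fix it
    # should return to roughly `baseline` once each request finishes.
    print(f"request {i}: free gpu blocks = {free} (baseline {baseline})")
```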

@hsm1997

hsm1997 commented Aug 9, 2023

hi @thisissum, I think the memory leak problem is fixed now (by simply removing one redundant block_manager.fork statement). I tested with 3 concurrent threads, each querying the API server about 200 times; all queries succeeded and the num_free_gpu_blocks no longer keeps decreasing.

@thisissum

hi @hsm1997, I tested again and I'm sure that the problem is fixed. Thanks a lot!

@zhuohan123
Member

Fixed by #857. Please raise a new issue if there is any new error.
