Align vLLM's beam search implementation with HF generate #857
Conversation
@zhuohan123 Awesome! Thanks for the amazing work! 🚀
As we discussed offline, I think the PR only needs small fixes for further clarification. I like the changes in the system design. Thanks again for the hard work.
Looks very good to me! Many thanks for the hard work. I believe this will make vLLM a unique inference engine that effectively supports beam search. Nice work!
@zhuohan123 BTW, please close the issues fixed by this PR.
This PR refactors the changes in #646.

The goal of this PR is to align the beam search with `hf_model.generate()`, which is itself aligned with many older frameworks, including `tensor2tensor` and `fairseq`. When meeting a finished beam candidate, our old beam search algorithm always keeps the finished beam and reduces the beam width of the remaining search by 1. In HF, however, the beam width is always a fixed number, and the top-"beam width" running candidates are selected for the next iteration (see the sketch below).

This change breaks the assumption that every sequence group in vLLM always contains a fixed number of sequences (which previously always equaled `best_of`). Therefore, we need to grow the number of sequences in a sequence group dynamically. After this PR, every request starts with a single sequence (for prompt computation) and later grows into multiple sequences during decoding, based on its sampling algorithm.

Should be merged after #867.
TODOs:
cc @hsm1997