
[Feature]: Beam search: top_p, min_p and logit processors #10754

Open

denadai2 opened this issue Nov 28, 2024 · 4 comments

denadai2 commented Nov 28, 2024

🚀 The feature, motivation and pitch

Dear vLLM community, we recently deprecated beam search from the core library in favour of a new method called beam_search. However, this new method is far less powerful than before, and it restricts how beam search can be applied in many use cases, for example controlling the generation (top_p, etc.) or doing constrained beam search (e.g. https://huggingface.co/blog/constrained-beam-search).

We at Spotify use 0.6.1 for this reason, and I am sure many others are doing the same. However, we would like to move to PyTorch 2.5 to fully use our H100s, FSDP2, etc. Moreover, we would like to stay up to date with vLLM.

Could we consider supporting these parameters in the new method as well? Thaaaaank you!

ref #6226
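For context, "controlling the generation" here means things like nucleus/min-p filtering and custom logits processors. Below is a minimal, hypothetical example of the kind of processor used for constrained decoding; the (token_ids, logits) -> logits callable shape matches what SamplingParams.logits_processors accepts, but the helper name and masking scheme are purely illustrative:

import torch

# Hypothetical helper: restrict generation to an allowed token-id set,
# the basic building block of constrained (beam) search.
def allow_only(allowed_ids: list[int]):
    def processor(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        mask = torch.full_like(logits, float("-inf"))
        mask[torch.as_tensor(allowed_ids, device=logits.device)] = 0.0
        return logits + mask

    return processor

Today there is no way to hand such a processor (or top_p / min_p) to the new beam_search path, which is the gap this issue is about.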

Alternatives

Huggingface

Additional context

No response

mgoin (Member) commented Dec 2, 2024

Hey @denadai2, thanks for reporting. I think this is a resource-priority problem, since in theory you could pipe through anything to the internal SamplingParams created in the beam_search methods:

beam_search_params = SamplingParams(
    logprobs=2 * beam_width,
    max_tokens=1,
    temperature=temperature,
)

Could you please narrow the list of potential parameters so someone can prioritize the meaningful ones? Contributions are welcome as well!
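As a rough illustration of that "pipe through" idea (a sketch only, not the current vLLM code; any field beyond logprobs / max_tokens / temperature is a proposed addition from this issue, not an existing beam search option):

from vllm import SamplingParams

# Sketch: the per-step SamplingParams built inside beam_search could simply
# forward caller-supplied knobs. beam_width, temperature, top_p, min_p and
# logits_processors would come from the (hypothetically extended) beam search
# parameters.
beam_search_params = SamplingParams(
    logprobs=2 * beam_width,
    max_tokens=1,
    temperature=temperature,
    top_p=top_p,                          # proposed addition
    min_p=min_p,                          # proposed addition
    logits_processors=logits_processors,  # proposed addition
)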

youkaichao (Member) commented:

Can you join the Slack (https://slack.vllm.ai) for collaboration?

I want to re-implement beam search in the flavor of parallel sampling, but I do not have the bandwidth.

See:

class ParallelSampleSequenceGroup(SequenceGroupBase):

njhill (Member) commented Dec 3, 2024

Yes, IMO we should deprecate/remove the new beam_search method. We can make the original API work with the "external" implementation. There is no need for a separate BeamSearchParams; most SamplingParams can be used as-is, we just need to adjust those that are themselves used by the beam search logic (like the number of logprobs) ... #9427 (comment)
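A hedged sketch of that direction (illustrative only, not the actual vLLM implementation; the helper name is made up): reuse the caller's SamplingParams and override only the fields the beam search loop itself depends on, so top_p, min_p, logits_processors, etc. pass through untouched.

import copy

# Sketch: derive per-step params from the user's own SamplingParams instead of
# a separate BeamSearchParams, overriding only what the beam search loop needs.
def per_step_params(user_params, beam_width: int):
    step_params = copy.deepcopy(user_params)
    step_params.logprobs = 2 * beam_width  # enough candidates to expand each beam
    step_params.max_tokens = 1             # expand one token per step
    return step_params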

denadai2 (Author) commented:

> Can you join the Slack (https://slack.vllm.ai) for collaboration?
>
> I want to re-implement beam search in the flavor of parallel sampling, but I do not have the bandwidth.
>
> See:
>
> class ParallelSampleSequenceGroup(SequenceGroupBase):

Done! Is there a middle ground where we do not have to do a complete refactor?
