
[Feature]: Beam search: top_p, min_p and logit processors #10754

Open

denadai2 opened this issue Nov 28, 2024 · 4 comments

denadai2 commented Nov 28, 2024

🚀 The feature, motivation and pitch

Dear vLLM community, we recently deprecated beam search from the core library in favour of a new method called beam_search. However, this new method is far less powerful than before, and it restricts how beam search can be applied in many use cases, for example controlling the generation (top_p, etc.) or doing constrained beam search (e.g. https://huggingface.co/blog/constrained-beam-search).

We at Spotify use 0.6.1 for this reason, and I am sure many others are doing the same. However, we would like to move to PyTorch 2.5 to fully use our H100s, FSDP2, etc. Moreover, we would like to stay up to date with vLLM.

Could we consider supporting these parameters in the new method as well? Thaaaaank you!

ref #6226
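For context, "controlling the generation" here means things like nucleus/min-p filtering and custom logits processors. Below is a minimal, hypothetical example of the kind of processor used for constrained decoding; the (token_ids, logits) -> logits callable shape matches what SamplingParams.logits_processors accepts, but the helper name and masking scheme are purely illustrative:

import torch

# Hypothetical helper: restrict generation to an allowed token-id set,
# the basic building block of constrained (beam) search.
def allow_only(allowed_ids: list[int]):
    def processor(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        mask = torch.full_like(logits, float("-inf"))
        mask[torch.as_tensor(allowed_ids, device=logits.device)] = 0.0
        return logits + mask

    return processor

Today there is no way to hand such a processor (or top_p / min_p) to the new beam_search path, which is the gap this issue is about.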

Alternatives

Huggingface

Additional context

No response

mgoin (Member) commented Dec 2, 2024

Hey @denadai2, thanks for reporting. I think this is a resource-priority problem, since in theory you could pipe through anything to the internal SamplingParams created in the beam_search methods:

beam_search_params = SamplingParams(
    logprobs=2 * beam_width,
    max_tokens=1,
    temperature=temperature,
)

Could you please narrow the list of potential parameters so someone can prioritize the meaningful ones? Contributions are welcome as well!
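As a rough illustration of that "pipe through" idea (a sketch only, not the current vLLM code; any field beyond logprobs / max_tokens / temperature is a proposed addition from this issue, not an existing beam search option):

from vllm import SamplingParams

# Sketch: the per-step SamplingParams built inside beam_search could simply
# forward caller-supplied knobs. beam_width, temperature, top_p, min_p and
# logits_processors would come from the (hypothetically extended) beam search
# parameters.
beam_search_params = SamplingParams(
    logprobs=2 * beam_width,
    max_tokens=1,
    temperature=temperature,
    top_p=top_p,                          # proposed addition
    min_p=min_p,                          # proposed addition
    logits_processors=logits_processors,  # proposed addition
)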

youkaichao (Member) commented:

Can you join the Slack (https://slack.vllm.ai) for collaboration?

I want to re-implement beam search in the flavor of parallel sampling, but I do not have the bandwidth.

See:

class ParallelSampleSequenceGroup(SequenceGroupBase):

njhill (Member) commented Dec 3, 2024

Yes, IMO we should deprecate/remove the new beam_search method. We can make the original API work with the "external" implementation. There is no need for a separate BeamSearchParams; most SamplingParams can be used as-is, we just need to adjust those that are themselves used by the beam search logic (like the number of logprobs) ... #9427 (comment)
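A hedged sketch of that direction (illustrative only, not the actual vLLM implementation; the helper name is made up): reuse the caller's SamplingParams and override only the fields the beam search loop itself depends on, so top_p, min_p, logits_processors, etc. pass through untouched.

import copy

# Sketch: derive per-step params from the user's own SamplingParams instead of
# a separate BeamSearchParams, overriding only what the beam search loop needs.
def per_step_params(user_params, beam_width: int):
    step_params = copy.deepcopy(user_params)
    step_params.logprobs = 2 * beam_width  # enough candidates to expand each beam
    step_params.max_tokens = 1             # expand one token per step
    return step_params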

denadai2 (Author) commented:

> Can you join the Slack (https://slack.vllm.ai) for collaboration?
>
> I want to re-implement beam search in the flavor of parallel sampling, but I do not have the bandwidth.
>
> See:
>
> class ParallelSampleSequenceGroup(SequenceGroupBase):

Done! Is there a middle ground where we do not have to do a complete refactor?
