beam search support #722
Comments
Hi @leiwen83. Indeed, beam search is not implemented. However, we have a different algorithm that seems to work just as well or even better.
Is that option what you are looking for?
I vote for beam search. With PagedAttention, all beams can share a single prefill operation, which saves computation on long prompts.
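For readers unfamiliar with the algorithm, here is a minimal sketch of beam search in plain Python. It is not TGI code: `toy_next_logprobs` and its probability table are hypothetical stand-ins for a real model, and the sketch does not model the KV cache. It only shows that every beam extends the same prompt prefix, which is why the prompt needs to be processed once rather than once per beam.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def toy_next_logprobs(tokens: Sequence) -> Dict[str, float]:
    """Hypothetical toy model: next-token log-probs depend only on the last token."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"a": 0.55, "b": 0.45},
        "b": {"a": 0.9, "b": 0.1},
    }
    return {tok: math.log(p) for tok, p in table[tokens[-1]].items()}


def beam_search(
    prompt: Sequence,
    next_logprobs: Callable[[Sequence], Dict[str, float]],
    num_beams: int,
    max_new_tokens: int,
) -> List[Tuple[float, Sequence]]:
    """Return the num_beams highest-scoring continuations, best first."""
    # Every beam starts from the same prompt prefix with log-probability 0.
    beams: List[Tuple[float, Sequence]] = [(0.0, tuple(prompt))]
    for _ in range(max_new_tokens):
        candidates: List[Tuple[float, Sequence]] = []
        for score, seq in beams:
            # Expand each surviving beam by every possible next token.
            for tok, logprob in next_logprobs(seq).items():
                candidates.append((score + logprob, seq + (tok,)))
        # Keep only the num_beams candidates with the highest cumulative score.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    return beams


greedy = beam_search(("<s>",), toy_next_logprobs, num_beams=1, max_new_tokens=2)
beamed = beam_search(("<s>",), toy_next_logprobs, num_beams=2, max_new_tokens=2)
print(greedy[0][1], math.exp(greedy[0][0]))
print(beamed[0][1], math.exp(beamed[0][0]))
```

In this toy table, a single beam (greedy decoding) commits to the locally best first token, while two beams keep the alternative alive long enough to find a sequence with higher overall probability.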
@jiguanglizipao I agree with you. It seems that the "best_of" argument does not provide good results. Moreover, with my model, using "do_sample" leads to unwanted results.
Would be great to have. best_of is great but way too slow.
Beam search is much worse than best_of performance-wise. The timings you show here are surprisingly different. How did you measure them?
@Narsil Thanks for your response. You are probably right; I am just reporting my observations so far. The timing comes from the Docker container itself, which prints it after generating text. I start the container like this:
Testing with that:
Oh, I see. bnb-nf4 is just super slow on anything above batch_size=1. It has nothing to do with best_of.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Feature request
Beam search is a useful feature provided by the transformers library, but it seems to be missing in TGI.
Will it be supported?
Motivation
Beam search would help improve response quality.
Your contribution
I'd give it a try if this feature is implemented.