
[Usage]: Specify the number of tokens to be generated #3518

Closed
oximi123 opened this issue Mar 20, 2024 · 1 comment
Labels
usage How to use vllm

Comments

@oximi123

Your current environment

vllm

How would you like to use vllm

I want to specify the exact number of generated tokens for each request when using vLLM. Is there a way to do so?

@simon-mo
Collaborator

You can use a combination of max_tokens and ignore_eos in sampling params to achieve this. https://docs.vllm.ai/en/latest/dev/sampling_params.html
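As a minimal sketch of the combination above (the model name and prompt are illustrative; this builds a request body for vLLM's OpenAI-compatible completions endpoint, where `ignore_eos` is a vLLM extension):

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/completions endpoint.
# ignore_eos is a vLLM-specific extra parameter: generation does not stop
# at the EOS token, so the model keeps sampling until max_tokens is hit,
# yielding exactly max_tokens generated tokens.
payload = {
    "model": "meta-llama/Llama-2-7b-hf",  # illustrative model name
    "prompt": "San Francisco is a",
    "max_tokens": 128,   # upper bound on generated tokens
    "ignore_eos": True,  # never stop early -> exactly max_tokens tokens
}

# Send with any HTTP client, e.g.:
#   curl -X POST http://localhost:8000/v1/completions \
#        -H "Content-Type: application/json" -d "$PAYLOAD"
print(json.dumps(payload))
```

With the offline `LLM` API, the equivalent is `SamplingParams(max_tokens=128, ignore_eos=True)` passed to `llm.generate(...)`.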

There's also active work in #3124 to support a minimum number of generated tokens.
