[Usage]: Specify the number of tokens to be generated #3518

oximi123 · 2024-03-20T02:00:47Z

vllm

I want to specify the number of tokens generated for each request in vLLM. Is there a way to do so?

simon-mo · 2024-03-20T02:08:39Z

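You can achieve this by combining max_tokens and ignore_eos in the sampling params: set max_tokens to the number of tokens you want, and set ignore_eos to True so generation doesn't stop early at the end-of-sequence token. https://docs.vllm.ai/en/latest/dev/sampling_params.html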
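For reference, here is a rough sketch of how the max_tokens plus ignore_eos combination could look with the offline LLM API. The helper names fixed_length_kwargs and generate_fixed_length are made up for illustration and are not part of vLLM. It also assumes that no stop strings are set and that the prompt plus the requested tokens fit within the model's context length, since either could still cut generation short.

```python
from typing import Any


def fixed_length_kwargs(num_tokens: int) -> dict[str, Any]:
    """Build SamplingParams keyword arguments that pin the output length.

    Hypothetical helper, not part of vLLM.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return {
        "max_tokens": num_tokens,  # upper bound on the number of generated tokens
        "ignore_eos": True,        # keep generating even after the EOS token appears
    }


def generate_fixed_length(model: str, prompts: list[str], num_tokens: int) -> list[str]:
    """Generate completions of num_tokens tokens each (hypothetical wrapper)."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    params = SamplingParams(**fixed_length_kwargs(num_tokens))
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [out.outputs[0].text for out in outputs]
```

Calling something like generate_fixed_length("facebook/opt-125m", ["Hello, my name is"], 128) should then give completions that run for 128 tokens under the assumptions above. The model name here is only an example.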

There's also active work in #3124 on supporting a minimum number of generated tokens.

oximi123 added the "usage" (How to use vllm) label Mar 20, 2024

oximi123 closed this as completed Mar 20, 2024
