[Usage]: Generate specified number of tokens for each request individually #3650
Comments
This is possible by setting the ignore_eos and max_tokens sampling parameters. If you don't want EOS tokens appearing in the output, the min_tokens parameter was added recently and will be in an upcoming release.
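As a rough illustration of that suggestion, here is a minimal offline sketch that forces a fixed output length by combining max_tokens with ignore_eos (with min_tokens as an alternative on versions that already ship it). The model name and token count are placeholders, not values from this issue.

```python
from vllm import LLM, SamplingParams

# Placeholder model; swap in whichever model you are actually serving.
llm = LLM(model="facebook/opt-125m")

# max_tokens caps the output length; ignore_eos keeps generation going even
# if the model emits an EOS token, so exactly max_tokens tokens are produced.
params = SamplingParams(max_tokens=100, ignore_eos=True)

# On versions that already include min_tokens, you could instead let EOS end
# generation normally but guarantee a minimum length:
# params = SamplingParams(max_tokens=100, min_tokens=100)

outputs = llm.generate(["Write a short story about a robot."], params)
for out in outputs:
    print(len(out.outputs[0].token_ids), out.outputs[0].text)
```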
Thanks for your reply. I've tried the sampling parameters, but they seem to act as a global setting that applies to all requests (e.g., generate 100 tokens for every request). Is there a way to specify the number of generated tokens for each request individually?
In the meantime, the online API supports multiple independent requests with different parameters; vLLM performs batching under the hood.
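To make that concrete, here is a hedged sketch against vLLM's OpenAI-compatible server, where each request carries its own max_tokens and concurrent requests are batched server-side. The base URL, model name, token counts, and the extra_body pass-through for ignore_eos are assumptions for illustration, not details taken from this thread.

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is running locally, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


def complete(prompt: str, n_tokens: int) -> str:
    # Each request carries its own max_tokens value.
    resp = client.completions.create(
        model="facebook/opt-125m",
        prompt=prompt,
        max_tokens=n_tokens,
        # ignore_eos is a vLLM-specific extension; passing it via extra_body
        # is an assumption to check against your server version.
        extra_body={"ignore_eos": True},
    )
    return resp.choices[0].text


# Send the requests concurrently; the server batches them under the hood.
jobs = [("Prompt one", 100), ("Prompt two", 200), ("Prompt three", 300)]
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    for text in pool.map(lambda args: complete(*args), jobs):
        print(text)
```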
Hello @oximi123, is there a build where we can test min_tokens? The version on PyPI does not have it yet. I wanted to try it and see if it works. Is there any way we can test it earlier?
Your current environment
vLLM with Python 3.9, Ubuntu 20
How would you like to use vllm
How can I specify the number of generated tokens for each request individually, in both online serving mode and offline batching mode? For example, with three requests, generate 100 tokens for request 1, 200 for request 2, and 300 for request 3. In both offline and online mode, the three requests should be processed in one batch, and each should return its specified number of tokens.
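For the offline batching half of this question, the sketch below passes one SamplingParams object per prompt to LLM.generate, which processes the prompts in a single batched call. The list-of-SamplingParams form and the model name are assumptions to verify against your installed vLLM version.

```python
from vllm import LLM, SamplingParams

# Placeholder model for illustration.
llm = LLM(model="facebook/opt-125m")

prompts = [
    "Prompt for request 1",
    "Prompt for request 2",
    "Prompt for request 3",
]

# One SamplingParams per prompt: 100, 200, and 300 tokens respectively.
# ignore_eos forces generation to continue all the way to max_tokens.
per_request_params = [
    SamplingParams(max_tokens=n, ignore_eos=True) for n in (100, 200, 300)
]

# Recent vLLM versions accept a list of SamplingParams aligned with the
# prompt list; if yours does not, calling generate once per prompt (or using
# the engine's add_request) gives the same per-request control.
outputs = llm.generate(prompts, per_request_params)
for out in outputs:
    print(len(out.outputs[0].token_ids), out.outputs[0].text[:60])
```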