[Usage]: Generate specified number of tokens for each request individually #3650

Closed
oximi123 opened this issue Mar 27, 2024 · 7 comments
Labels: usage How to use vllm

Comments

@oximi123

Your current environment

vLLM with Python 3.9, Ubuntu 20

How would you like to use vllm

How can I specify the number of generated tokens for each request individually, in both online serving mode and offline batching mode? For example, three requests with 100 tokens generated for request 1, 200 for request 2, and 300 for request 3. In both offline and online mode, the three requests should be processed as a batch and each should return its specified number of tokens.

oximi123 added the usage (How to use vllm) label on Mar 27, 2024
@simon-mo
Collaborator

This is possible by setting the ignore_eos and max_tokens sampling parameters.

If you don't want an EOS token in the output, the min_tokens parameter was added recently and will be in an upcoming release.
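For reference, a minimal sketch of these parameters with the offline API (the model name is just an example, and min_tokens is left out since it only exists in versions that include the change mentioned above):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model

# ignore_eos=True + max_tokens=N forces exactly N generated tokens,
# since generation never stops early at the EOS token.
params = SamplingParams(max_tokens=100, ignore_eos=True)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(len(out.outputs[0].token_ids))  # expected: 100
```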

@oximi123
Author

> This is possible by setting the ignore_eos and max_tokens sampling parameters.
>
> If you don't want an EOS token in the output, the min_tokens parameter was added recently and will be in an upcoming release.

Thanks for your reply. I've tried the sampling parameters, but they seem to act as a global setting that applies to all requests (e.g., generating 100 tokens for every request). Is there a way to specify the number of generated tokens for each request individually?

@njhill
Member

njhill commented Mar 27, 2024

@oximi123 it sounds like #3570 is what you’re looking for.

Collaborator

In the meantime, the online API supports multiple independent requests with different parameters; vLLM performs batching under the hood.
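To illustrate, a rough sketch of independent requests with different per-request limits against the OpenAI-compatible server. The model name, server address, and the ignore_eos extra field are assumptions (based on vLLM's documented defaults), not something stated in this thread:

```python
# Assumes the server was started with something like:
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Each request carries its own max_tokens; the server batches concurrent
# requests internally (shown sequentially here for brevity).
for prompt, n_tokens in [("Request one:", 100),
                         ("Request two:", 200),
                         ("Request three:", 300)]:
    resp = client.completions.create(
        model="facebook/opt-125m",        # whatever model the server loaded
        prompt=prompt,
        max_tokens=n_tokens,
        extra_body={"ignore_eos": True},  # assumption: vLLM-specific extra parameter
    )
    print(resp.usage.completion_tokens)
```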

@oximi123
Author

> @oximi123 it sounds like #3570 is what you're looking for.

Thanks, I'll try it.
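If #3570 is the per-prompt sampling-params support for the offline API (an assumption on my part), usage would look roughly like the sketch below, with one SamplingParams per prompt passed to LLM.generate:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model

prompts = ["Request one:", "Request two:", "Request three:"]
per_request_params = [
    SamplingParams(max_tokens=100, ignore_eos=True),
    SamplingParams(max_tokens=200, ignore_eos=True),
    SamplingParams(max_tokens=300, ignore_eos=True),
]

# The three requests are processed in one batch, each honoring its own limit.
outputs = llm.generate(prompts, per_request_params)
for out in outputs:
    print(len(out.outputs[0].token_ids))  # expected: 100, 200, 300
```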

@Sriharsha-hatwar

Hello @oximi123, is there any build where we can test min_tokens? The version on PyPI doesn't have it yet. I wanted to use it and see if it works. Is there any way to test it earlier?

@DarkLight1337
Member

DarkLight1337 commented Jun 1, 2024

Closing as #3124 has been released in v0.4.0, and #3570 has been released in v0.4.1.
