
[Feature]: return Usage info for streaming request for each chunk in ChatCompletion #6540

Closed
yecohn opened this issue Jul 18, 2024 · 2 comments

@yecohn (Contributor) commented Jul 18, 2024

🚀 The feature, motivation and pitch

In entrypoints/openai/serving_completions.py, I see that OpenAIServingCompletion has a completion_stream_generator method that can return usage info for each chunk when stream_options.continuous_usage_stats is set (around line 297):

```python
if (request.stream_options
        and request.stream_options.include_usage):
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
```

However, this is not the case in entrypoints/openai/serving_chat.py. I propose adding the same feature to OpenAIServingChat (see the sketch below).

What do you think?
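For concreteness, here is a minimal sketch of what the analogous block inside OpenAIServingChat's chat_completion_stream_generator might look like. This is only an illustration: the variable names (request, res, output, chunk) are assumed to mirror the completions path, and the actual integration point in serving_chat.py may differ.

```python
# Hypothetical sketch, mirroring the completions logic quoted above
# inside the chat streaming generator. Variable names (request, res,
# output, chunk) are assumed to match the completions path.
if (request.stream_options
        and request.stream_options.include_usage):
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        chunk.usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
    else:
        # include_usage without continuous_usage_stats: usage is only
        # reported on the final chunk; intermediate chunks carry None
        chunk.usage = None
```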

Alternatives

No response

Additional context

No response

@tdoublep (Member) commented:

@yecohn I guess it makes sense. We added it to the completions API because that is the API we are primarily using for benchmarking. However, if it's helpful to have it in the chat API too, I don't see any reason not to add it. Shouldn't be a big change.
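For reference, a sketch of how a client could consume this once it lands, using the openai Python client against a local vLLM server. The base URL and model name are placeholders, and continuous_usage_stats is a vLLM-specific extension rather than a standard OpenAI parameter, so it is passed through extra_body:

```python
# Sketch of a streaming chat request that asks for per-chunk usage.
# Assumes a vLLM OpenAI-compatible server that honors
# stream_options.continuous_usage_stats (placeholder URL/model below).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # continuous_usage_stats is a vLLM extension, so pass the whole
    # stream_options dict via extra_body
    extra_body={"stream_options": {"include_usage": True,
                                   "continuous_usage_stats": True}},
)

for chunk in stream:
    if chunk.usage is not None:
        print(chunk.usage)  # prompt/completion/total token counts so far
```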

@yecohn (Contributor, Author) commented Jul 20, 2024

Perfect, then I'll open a PR.

@hmellor closed this as completed Jul 26, 2024