[Feature]: return Usage info for streaming request for each chunk in ChatCompletion #6540
Comments
@yecohn I guess it makes sense. We added it to the completions API because that is the API we are primarily using for benchmarking. However, if it's helpful to have it in the chat API too, I don't see any reason not to add it. Shouldn't be a big change.
Perfect, then I'll open a PR.
🚀 The feature, motivation and pitch
In entrypoints/openai/serving_completions.py, I see that OpenAIServingCompletion has a completion_stream_generator method that can return usage info for each chunk when the StreamOptions field continuous_usage_stats is set (line 297).
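For context, here is a minimal client-side sketch (not part of the issue) of how a streaming completions request can opt into per-chunk usage today. The server address and model name are placeholders, and continuous_usage_stats is a vLLM-specific extension of StreamOptions, so it is passed through extra_body rather than as a typed SDK argument:

```python
# Hypothetical example: stream a completion from a local vLLM server and
# read usage stats from every chunk. base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.completions.create(
    model="my-model",          # placeholder model name
    prompt="Hello, world",
    stream=True,
    # continuous_usage_stats is a vLLM extension of stream_options, so it
    # is sent via extra_body instead of a typed SDK parameter.
    extra_body={"stream_options": {"include_usage": True,
                                   "continuous_usage_stats": True}},
)

for chunk in stream:
    text = chunk.choices[0].text if chunk.choices else ""
    # With continuous usage stats enabled, chunk.usage is populated on
    # every chunk instead of only the final one.
    print(text, chunk.usage)
```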
However, this is not the case in entrypoints/openai/serving_chat.py. I propose adding the same feature to OpenAIServingChat. What do you think?
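To illustrate the proposal, here is a hedged sketch of what the equivalent chat-side request and consumption loop might look like if OpenAIServingChat gained the same option. None of this is implemented at the time of this issue; the model name and server address are placeholders:

```python
# Hypothetical example of the requested behaviour for the chat endpoint:
# every streamed chunk would carry a usage object, mirroring the
# completions endpoint's continuous_usage_stats behaviour.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="my-model",          # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    extra_body={"stream_options": {"include_usage": True,
                                   "continuous_usage_stats": True}},
)

for chunk in stream:
    delta = (chunk.choices[0].delta.content or "") if chunk.choices else ""
    # If the feature is added, chunk.usage would be present on each chunk,
    # not just on the final one.
    print(delta, chunk.usage)
```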
Alternatives
No response
Additional context
No response