
[Feature]: return Usage info for streaming request for each chunk in ChatCompletion #6540

Closed
yecohn opened this issue Jul 18, 2024 · 2 comments

@yecohn (Contributor) commented Jul 18, 2024

🚀 The feature, motivation and pitch

In entrypoints/openai/serving_completions.py, I see that OpenAIServingCompletion has a completion_stream_generator method that can return usage info for each chunk when stream_options.continuous_usage_stats is set (around line 297):

```python
if (request.stream_options
        and request.stream_options.include_usage):
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
```

However, this is not the case in entrypoints/openai/serving_chat.py. I propose adding the same feature to OpenAIServingChat (see the sketch below).

What do you think?
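For concreteness, here is a minimal sketch of what the analogous block inside OpenAIServingChat's chat_completion_stream_generator might look like. This is only an illustration: the variable names (request, res, output, chunk) are assumed to mirror the completions path, and the actual integration point in serving_chat.py may differ.

```python
# Hypothetical sketch, mirroring the completions logic quoted above
# inside the chat streaming generator. Variable names (request, res,
# output, chunk) are assumed to match the completions path.
if (request.stream_options
        and request.stream_options.include_usage):
    if (request.stream_options.continuous_usage_stats
            or output.finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(output.token_ids)
        chunk.usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
    else:
        # include_usage without continuous_usage_stats: usage is only
        # reported on the final chunk; intermediate chunks carry None
        chunk.usage = None
```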

Alternatives

No response

Additional context

No response

@tdoublep (Member) commented:

@yecohn I guess it makes sense. We added it to the completions API because that is the API we are primarily using for benchmarking. However, if it's helpful to have it in the chat API too, I don't see any reason not to add it. Shouldn't be a big change.
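For reference, a sketch of how a client could consume this once it lands, using the openai Python client against a local vLLM server. The base URL and model name are placeholders, and continuous_usage_stats is a vLLM-specific extension rather than a standard OpenAI parameter, so it is passed through extra_body:

```python
# Sketch of a streaming chat request that asks for per-chunk usage.
# Assumes a vLLM OpenAI-compatible server that honors
# stream_options.continuous_usage_stats (placeholder URL/model below).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # continuous_usage_stats is a vLLM extension, so pass the whole
    # stream_options dict via extra_body
    extra_body={"stream_options": {"include_usage": True,
                                   "continuous_usage_stats": True}},
)

for chunk in stream:
    if chunk.usage is not None:
        print(chunk.usage)  # prompt/completion/total token counts so far
```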

@yecohn (Contributor, Author) commented Jul 20, 2024

Perfect, then I'll open a PR.

@hmellor closed this as completed Jul 26, 2024