
[Frontend] Add Usage data in each chunk for chat_serving. #6540 #6652

Merged
merged 8 commits into vllm-project:main on Jul 23, 2024

Conversation

@yecohn (Contributor) commented Jul 22, 2024

PR for issue #6540

I added some logic to return usage data in each chunk when the flag stream_options.continuous_usage_stats is set.
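
For example, with the new flag a streaming request could look roughly like this (a minimal sketch; the endpoint URL and model name are placeholders, and continuous_usage_stats is the vLLM-specific extension added in this PR, not part of the upstream OpenAI API):

import json
import requests

# Hypothetical local vLLM server and model; adjust to your deployment.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
        "stream_options": {
            "include_usage": True,
            # New in this PR: report usage on every streamed chunk.
            "continuous_usage_stats": True,
        },
    },
    stream=True,
)

# Each SSE data line should now carry a "usage" object with token counts.
for line in response.iter_lines():
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[len(b"data: "):])
        print(chunk.get("usage"))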

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@tdoublep (Member) left a comment

Looks generally good - I suggested some small simplifications to the logic. I haven't reviewed the new tests yet due to the large diff.

Comment on lines 226 to 227

    or res.outputs[i].finish_reason is not None):
Member:

I think this response contains only the role (i.e., no tokens have been generated yet), so I don't think it is necessary to check the stopping criterion at this stage.

Contributor Author:

Good point, I'll change it.

Comment on lines 223 to 239

if (request.stream_options
        and request.stream_options.include_usage):
    chunk.usage = None
    if (request.stream_options.continuous_usage_stats
            or res.outputs[i].finish_reason is not None):
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = 0
        usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
        if request.stream_options.continuous_usage_stats:
            chunk.usage = usage
        else:
            chunk.usage = None
Member:

Perhaps this could just be simplified to:

if (request.stream_options
        and request.stream_options.include_usage):
    if request.stream_options.continuous_usage_stats:
        prompt_tokens=len(res.prompt_token_ids)
        usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=0,
            total_tokens=prompt_tokens,
        )
    else:
        chunk=None

Comment on lines 270 to 289

if (request.stream_options.continuous_usage_stats
        or res.outputs[i].finish_reason is not None):
    prompt_tokens = len(res.prompt_token_ids)
    completion_tokens = len(res.outputs[i].token_ids)
    usage = UsageInfo(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
    )
    if (request.stream_options.continuous_usage_stats):
        chunk.usage = usage
    else:
        chunk.usage = None

Member:

Same comments apply here as above: (a) I don't think we need to check the finish reason since we are just echoing the prompt, and (b) we can simplify the if/else logic in the same way as above.
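
For illustration, applying both suggestions to this branch might look roughly like the following (a sketch only, reusing the names request, res, chunk, i, and UsageInfo from the diff above; not necessarily the exact merged code):

if (request.stream_options
        and request.stream_options.include_usage):
    if request.stream_options.continuous_usage_stats:
        # Token counts as in the snippet above; no finish_reason check needed here.
        prompt_tokens = len(res.prompt_token_ids)
        completion_tokens = len(res.outputs[i].token_ids)
        chunk.usage = UsageInfo(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )
    else:
        chunk.usage = None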

Comment on lines 348 to 361

if (request.stream_options.continuous_usage_stats):
    prompt_tokens = len(res.prompt_token_ids)
    completion_tokens = len(output.token_ids)
    usage = UsageInfo(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
    )
if request.stream_options.continuous_usage_stats:
    chunk.usage = usage
else:
    chunk.usage = None

Member:

Can't we just have:

if request.stream_options.continuous_usage_stats:
    prompt_tokens = len(res.prompt_token_ids)
    completion_tokens = len(output.token_ids)
    usage = UsageInfo(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens +
        completion_tokens,
    )
    chunk.usage = usage
else:
    chunk.usage = None

Comment on lines 381 to 393

if (request.stream_options.continuous_usage_stats):
    prompt_tokens = len(res.prompt_token_ids)
    completion_tokens = len(output.token_ids)
    usage = UsageInfo(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
    )
if request.stream_options.continuous_usage_stats:
    chunk.usage = usage
else:
    chunk.usage = None
Member:

Same as above here.

Contributor Author:

Good point.

@tdoublep (Member) left a comment

LGTM

@tdoublep (Member)

/ready

@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 23, 2024
@tdoublep (Member)

The CI failure does not look related to these changes.

I think this PR looks good. cc @simon-mo

@simon-mo merged commit 58f5303 into vllm-project:main on Jul 23, 2024
70 of 84 checks passed
@tdoublep (Member)

Thanks for contributing this! @yecohn

@yecohn (Contributor Author) commented Jul 25, 2024

Great news 🥳

@robertgshaw2-neuralmagic (Collaborator)

This change does not conform to the OAI spec:

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024