
[Misc] Make benchmarks use EngineArgs #9529

Merged
merged 1 commit into vllm-project:main on Oct 22, 2024

Conversation

JArnoldAMD
Contributor

Update the benchmark scripts to directly use the CLI arguments provided by EngineArgs instead of duplicating a subset of these arguments in each benchmark script.

Currently the CLI arguments are duplicated, forcing changes to be made in multiple locations and resulting in some useful vLLM options not being exposed in the scripts.  For example, the --num-scheduler-steps option is currently available in benchmark_throughput.py but not benchmark_latency.py, making it difficult to understand the latency impacts of this option.  As another example, the benchmark_prioritization.py script appears to be broken currently because it was not updated to expose the --scheduling-policy option which is required for enabling priority.

These maintenance challenges are eliminated by using EngineArgs.add_cli_args to add support for all engine arguments directly, and then passing these options to the engine initialization.
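For illustration, here is a minimal sketch of that pattern (the actual benchmark scripts may wire this up slightly differently; the --num-iters flag below is just a stand-in for a script-specific option):

```python
import argparse
import dataclasses

from vllm import LLM
from vllm.engine.arg_utils import EngineArgs


def main() -> None:
    parser = argparse.ArgumentParser(description="Toy benchmark script")
    # Benchmark-specific options stay local to the script.
    parser.add_argument("--num-iters", type=int, default=3)
    # Every engine option (--model, --num-scheduler-steps, --scheduling-policy,
    # --enforce-eager, ...) is registered once by EngineArgs instead of being
    # duplicated by hand in each script.
    parser = EngineArgs.add_cli_args(parser)
    args = parser.parse_args()

    # Rebuild EngineArgs from the parsed namespace and pass every field
    # straight through to the engine.
    engine_args = EngineArgs.from_cli_args(args)
    llm = LLM(**dataclasses.asdict(engine_args))
    # ... run args.num_iters benchmark iterations with `llm` ...


if __name__ == "__main__":
    main()
```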

One minor behavioral change: when benchmark_throughput.py runs in async mode, it no longer hard-codes worker_use_ray=False (which is deprecated anyway) or disable_log_requests=True; the user can now pass --disable-log-requests on the command line instead.

Similarly, benchmark_prefix_caching.py no longer hard-codes trust_remote_code=True and enforce_eager=True; these may now be passed on the command line.
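As a rough sketch of the async case (assuming AsyncEngineArgs/AsyncLLMEngine are used as in vLLM's other async code paths; the real script may differ), the previously hard-coded options simply become ordinary flags:

```python
import argparse

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

parser = argparse.ArgumentParser(description="Toy async benchmark script")
# AsyncEngineArgs registers --disable-log-requests alongside the usual
# engine flags, so nothing needs to be hard-coded here.
parser = AsyncEngineArgs.add_cli_args(parser)
args = parser.parse_args()

# Hypothetical invocation:
#   python toy_async_bench.py --model <model> --disable-log-requests
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs.from_cli_args(args))
```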


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Collaborator

@comaniac left a comment


LGTM. It's pretty clean!

@@ -190,9 +182,7 @@ def main(args):
default='128:256',
help='Range of input lengths for sampling prompts,'
'specified as "min:max" (e.g., "128:256").')
parser.add_argument("--seed",
Collaborator


Do we have "seed" in the engine arg as well?


@comaniac added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on Oct 19, 2024
@comaniac
Collaborator

Also cc @KuntaiDu

Collaborator

@mgoin left a comment


Ahh I've meant to get to this refactor for a while, thank you!

@simon-mo merged commit cb6fdaa into vllm-project:main on Oct 22, 2024
39 of 41 checks passed
charlifu pushed a commit to charlifu/vllm that referenced this pull request Oct 23, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
MErkinSag pushed a commit to MErkinSag/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
FerdinandZhong pushed a commit to FerdinandZhong/vllm that referenced this pull request Oct 29, 2024
NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Oct 31, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024