[Misc] Make benchmarks use EngineArgs #9529
Conversation
Update the benchmark scripts to directly use the CLI arguments provided by EngineArgs instead of duplicating a subset of these arguments in each benchmark script.

Currently the CLI arguments are duplicated, forcing changes to be made in multiple locations and resulting in some useful vLLM options not being exposed in the scripts. For example, the --num-scheduler-steps option is currently available in benchmark_throughput.py but not benchmark_latency.py, making it difficult to understand the latency impact of this option. As another example, the benchmark_prioritization.py script appears to be broken currently because it was not updated to expose the --scheduling-policy option, which is required to enable priority scheduling.

These maintenance challenges are eliminated by using EngineArgs.add_cli_args to add support for all engine arguments directly, and then passing these options to the engine initialization.

One minor change in behavior is that when benchmark_throughput.py runs in async mode it no longer includes hard-coded settings for worker_use_ray=False (which is deprecated anyway) and disable_log_requests=True (the user now has the option to pass --disable-log-requests on the command line). Similarly, benchmark_prefix_caching.py no longer has hard-coded values for trust_remote_code=True and enforce_eager=True, but these may now be passed on the command line.
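For illustration, the pattern described above boils down to roughly the following minimal sketch. This is not the exact benchmark code: the script name, the --input-len/--output-len flags, and the prompt are placeholders added for the example.

```python
import dataclasses

from vllm import LLM, SamplingParams
from vllm.engine.arg_utils import EngineArgs
from vllm.utils import FlexibleArgumentParser

# Script-specific options live alongside every engine option that
# EngineArgs exposes (e.g. --num-scheduler-steps, --scheduling-policy,
# --seed); the async scripts would use AsyncEngineArgs instead.
parser = FlexibleArgumentParser(description="Minimal latency-style benchmark sketch")
parser.add_argument("--input-len", type=int, default=32)
parser.add_argument("--output-len", type=int, default=128)
parser = EngineArgs.add_cli_args(parser)
args = parser.parse_args()

# Build the engine from the full set of parsed engine arguments instead
# of re-declaring a hand-picked subset in the benchmark script.
engine_args = EngineArgs.from_cli_args(args)
llm = LLM(**dataclasses.asdict(engine_args))

# Fixed-length generation, as the benchmarks typically request.
sampling_params = SamplingParams(max_tokens=args.output_len, ignore_eos=True)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```

A hypothetical invocation would then be `python benchmark_sketch.py --model facebook/opt-125m --num-scheduler-steps 8 --output-len 256`, with no engine option needing to be re-declared inside the script.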
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
LGTM. It's pretty clean!
@@ -190,9 +182,7 @@ def main(args):
                        default='128:256',
                        help='Range of input lengths for sampling prompts,'
                        'specified as "min:max" (e.g., "128:256").')
    parser.add_argument("--seed",
Do we have "seed" in the engine arg as well?
Also cc @KuntaiDu
Ahh I've meant to get to this refactor for a while, thank you!
Signed-off-by: charlifu <[email protected]>
Signed-off-by: Alvant <[email protected]>
Signed-off-by: Erkin Sagiroglu <[email protected]>
Signed-off-by: Amit Garg <[email protected]>
Signed-off-by: qishuai <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Sumit Dubey <[email protected]>