[Misc] benchmark: Add option to set max concurrency #9390
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
In general LGTM. One question I have is how this should be used together with request rate (QPS). Are we expecting only one of them to be set at a time?
For me, maximum concurrency is mainly used to avoid request buffer overflow on the inference engine side (IIRC, if we send >1000 requests to vLLM, TGI, or other inference engines, there will be request failures due to request buffer overflow). This should always be set when benchmarking many requests under very high QPS to prevent request failures.
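As a side note on how the two knobs can be combined, here is a minimal sketch (the helper names are illustrative, not the benchmark's actual functions): --request-rate controls how quickly requests are launched, while --max-concurrency caps how many are in flight at any moment, so the two can be used together.

import asyncio
import random
import time

async def send_one_request(prompt: str) -> float:
    # Stand-in for the real HTTP call to the inference server.
    start = time.perf_counter()
    await asyncio.sleep(0.05)
    return time.perf_counter() - start

async def run_benchmark(prompts, request_rate: float, max_concurrency=None):
    # Optional semaphore caps in-flight requests; None means unbounded.
    semaphore = (asyncio.Semaphore(max_concurrency)
                 if max_concurrency is not None else None)

    async def limited_send(prompt):
        if semaphore is None:
            return await send_one_request(prompt)
        async with semaphore:
            return await send_one_request(prompt)

    tasks = []
    for prompt in prompts:
        tasks.append(asyncio.create_task(limited_send(prompt)))
        if request_rate != float("inf"):
            # Poisson arrivals: exponentially distributed gaps between launches.
            await asyncio.sleep(random.expovariate(request_rate))
    return await asyncio.gather(*tasks)

# Example: launch at ~20 QPS but allow at most 8 requests in flight.
# asyncio.run(run_benchmark(["hello"] * 100, request_rate=20, max_concurrency=8))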
benchmarks/benchmark_serving.py
Outdated
parser.add_argument("--max-concurrency",
                    type=int,
                    default=None,
                    help="Maximum number of concurrent requests.")
It would be great if you could explain more on why one would set --max-concurrency given that --request-rate is already available. The main reason for me is to avoid request failures due to sending too many requests to the inference engine, but I'm not sure.
To give some context - I think the main motivation here is that sometimes inference servers are set up with max concurrency as a metric for autoscaling.
Currently in vLLM, we don't have a way to "reject" requests depending on the server load, so very often users set up this concurrency control at a higher level. It would therefore be great if we could simulate this kind of setup in the benchmark framework as well.
I think it'd be great to add more information for this argument based on my reply to Cody's comment above if that makes sense.
I expanded the help text. Let me know what you think!
Add a new flag to `benchmark_serving.py` that allows you to specify the maximum number of concurrent requests. If not specified, it defaults to the current behavior of unbounded concurrency. Signed-off-by: Russell Bryant <[email protected]>
The message looks good to me. In addition, please also update other parts such as the logging and the dumped file name. Please search ...
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
I added some logging. I wasn't sure about the default filename, though.
Just make it simple like the following?
max_concurrency_str = f"-concurrency{args.max_concurrency}" if args.max_concurrency is not None else ""
file_name = f"{backend}-{args.request_rate}qps{max_concurrency_str}-{base_model_id}-{current_dt}.json"  # noqa
Signed-off-by: Russell Bryant <[email protected]>
Sure, that works. Done!
LGTM
CI is currently failing on the use of nullcontext as an async context manager; asyncio support was only added to nullcontext in Python 3.10: python/cpython#85715
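One way to sidestep that version requirement, sketched here with a placeholder request coroutine rather than taken from this PR's final diff, is to branch on whether a concurrency limit was requested instead of relying on nullcontext supporting async with:

import asyncio

async def do_request():
    # Placeholder for the benchmark's real request coroutine.
    await asyncio.sleep(0.01)
    return "ok"

async def fetch(semaphore=None):
    if semaphore is None:
        # Unbounded concurrency: no context manager needed at all.
        return await do_request()
    async with semaphore:
        # Bounded concurrency; works on any Python version with asyncio.
        return await do_request()

async def main():
    sem = asyncio.Semaphore(2)
    print(await asyncio.gather(*(fetch(sem) for _ in range(5))))

asyncio.run(main())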
Signed-off-by: Russell Bryant <[email protected]>
LGTM
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: charlifu <[email protected]>
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Vinay Damodaran <[email protected]>
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Alvant <[email protected]>
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Amit Garg <[email protected]>
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: qishuai <[email protected]>
Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Sumit Dubey <[email protected]>
Add a new flag to `benchmark_serving.py` that allows you to specify the maximum number of concurrent requests. If not specified, it defaults to the current behavior of unbounded concurrency.
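For illustration, a typical invocation combining the new flag with the existing rate control might look like the following (the model name is a placeholder, and the other flags may differ depending on the vLLM version):

python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model meta-llama/Llama-2-7b-hf \
    --request-rate 16 \
    --max-concurrency 64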
Closes #3127
Signed-off-by: Russell Bryant [email protected]