[REQUEST] Does DeepSpeed support dynamic batching during inference? #3455
Comments
Model size: 6B. When I use DeepSpeed for single-card inference, QPS does not exceed 2 and GPU utilization sits around 52%. When will DeepSpeed support dynamic batch sizes so that GPU utilization improves?
You can check NVIDIA Triton, which supports dynamic batching and other features to increase GPU utilization. I am not sure where or how DeepSpeed could support this without extending its scope considerably.
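For reference, Triton's dynamic batching is enabled per model in its `config.pbtxt`. A minimal sketch; the values below are illustrative, not tuned:

```
# config.pbtxt (excerpt): enable server-side dynamic batching for this model.
dynamic_batching {
  # Batch sizes the scheduler should try to form from queued requests.
  preferred_batch_size: [ 4, 8 ]
  # How long (in microseconds) a request may wait for others to join its batch.
  max_queue_delay_microseconds: 100
}
```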
You can look at AWS's DeepJavaLibrary Serving: https://github.com/deepjavalibrary/djl-serving. It uses Netty/Java to dispatch requests to the inference workers and can be configured to batch requests dynamically based on a time window. The umbrella project is https://djl.ai/. There are also some good tutorials on using DJL Serving with DeepSpeed for inference: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai/llm-workshop
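For illustration, djl-serving is typically configured per model through a `serving.properties` file. A minimal sketch assuming a DeepSpeed engine; the values are placeholders and the property names should be checked against the djl-serving docs for your version:

```
# serving.properties (excerpt)
engine=DeepSpeed
# Maximum number of requests merged into one inference batch.
batch_size=8
# Time window (ms) a request may wait for the batch to fill before dispatch.
max_batch_delay=100
```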
Thanks.
@colynhn @trianxy @stan-kirdey In case it is not too late: I am adding dynamic batch sizes (and the corresponding LR scaling) to DeepSpeed in PR 5237, as part of the data analysis module. Stay tuned.
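As background on the LR-scaling part: a common heuristic is to rescale the learning rate with the effective batch size, linearly (Goyal et al., 2017) or by the square root for adaptive optimizers. A minimal Python sketch of that idea, independent of how PR 5237 actually implements it:

```python
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int,
             rule: str = "linear") -> float:
    """Rescale a reference learning rate when the effective batch size changes."""
    ratio = batch_size / base_batch_size
    if rule == "linear":   # linear scaling rule
        return base_lr * ratio
    if rule == "sqrt":     # gentler scaling, often used with Adam-style optimizers
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule}")

# Reference LR tuned at batch size 32; the dynamic batcher just produced a batch of 128.
print(scale_lr(1e-4, 32, 128))          # 4e-04
print(scale_lr(1e-4, 32, 128, "sqrt"))  # 2e-04
```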