
[REQUEST] Does DeepSpeed support dynamic batching during inference? #3455

Closed
colynhn opened this issue May 5, 2023 · 5 comments
Labels
enhancement New feature or request

Comments


colynhn commented May 5, 2023

No description provided.

colynhn added the enhancement (New feature or request) label on May 5, 2023
colynhn changed the title to [REQUEST] Does DeepSpeed support dynamic batching during inference? on May 5, 2023

colynhn commented May 5, 2023

Model size: 6B
GPU memory in use: 24 GB
GPU type: A100 40 GB
Latency per request: 3 s

When I use DeepSpeed for single-GPU inference, QPS does not exceed 2 and GPU utilization is only about 52%. When will DeepSpeed support a dynamic batch size to improve GPU utilization?
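
For context, dynamic batching usually means queueing incoming requests and flushing them as a single batched forward pass once either the batch is full or a small time window expires. A minimal sketch of the idea in plain Python (all names here are hypothetical; `model_generate` stands in for whatever batched generate call the engine exposes):

```python
import queue
import threading
import time

MAX_BATCH = 8        # flush when this many requests are queued
MAX_DELAY_S = 0.01   # ... or when the oldest request has waited this long

request_queue: queue.Queue = queue.Queue()

def submit(prompt: str) -> str:
    """Called once per incoming request; blocks until its result is ready."""
    result_slot: queue.Queue = queue.Queue(maxsize=1)
    request_queue.put((prompt, result_slot))
    return result_slot.get()

def batching_loop(model_generate) -> None:
    """Collect requests for up to MAX_DELAY_S, then run one batched call."""
    while True:
        prompt, slot = request_queue.get()  # block until the first request
        batch, slots = [prompt], [slot]
        deadline = time.monotonic() + MAX_DELAY_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                prompt, slot = request_queue.get(timeout=remaining)
            except queue.Empty:
                break
            batch.append(prompt)
            slots.append(slot)
        for output, s in zip(model_generate(batch), slots):
            s.put(output)

# A server would run the loop in a worker thread and call submit() per request:
# threading.Thread(target=batching_loop, args=(my_generate,), daemon=True).start()
```

Serving frameworks implement the same pattern with async I/O and padding-aware batching, but this is the core of what a dynamic batch size means here.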


trianxy commented May 5, 2023

You can check NVIDIA Triton Inference Server, which supports dynamic batching and other features to increase your GPU utilisation.

I'm not sure where or how DeepSpeed could support this without extending its scope considerably.
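
For reference, Triton enables this per model in its config.pbtxt; a minimal sketch, with placeholder values:

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

The scheduler then groups individual requests into batches up to the preferred sizes, waiting at most the configured queue delay before dispatching a partial batch.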


stan-kirdey commented May 8, 2023

You can look at AWS's DeepJavaLibrary (DJL) Serving: https://github.com/deepjavalibrary/djl-serving

It uses Netty/Java to dispatch requests for inference, and can be configured to batch requests dynamically based on a time window.
DJL Serving is battle-tested with DeepSpeed/PyTorch in high-load inference environments.
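
For illustration, that time window is set in the model's serving.properties; to the best of my knowledge the relevant keys are batch_size and max_batch_delay (a sketch, with placeholder values):

```
# serving.properties (sketch; values are placeholders)
engine=DeepSpeed
# max number of requests aggregated into one batch
batch_size=8
# max time in ms to wait before dispatching a partial batch
max_batch_delay=100
```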

https://djl.ai/ - umbrella project

Here are some great tutorials on how to use DJL Serving / DeepSpeed for inference: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai/llm-workshop

colynhn closed this as completed on May 24, 2023

colynhn commented May 24, 2023

thx.


bm-synth commented Mar 11, 2024

@colynhn @trianxy @stan-kirdey if it's not too late: I am adding dynamic batch sizes (and corresponding LR scaling) to DeepSpeed in PR 5237, as part of the data analysis module. Stay tuned.
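
For anyone wondering what LR scaling means here: when the effective batch size varies from step to step, the learning rate is commonly rescaled with it. A sketch of the usual linear scaling rule (my illustration, not the PR's actual API):

```python
def scaled_lr(base_lr: float, base_batch: int, current_batch: int) -> float:
    """Linear scaling rule: LR grows proportionally with the effective batch size."""
    return base_lr * current_batch / base_batch

# Doubling the batch size doubles the learning rate:
assert scaled_lr(1e-4, base_batch=32, current_batch=64) == 2e-4
```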
