[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100 #989
Comments
llama-405b-fp8 is supported in sglang (see lines 199 to 200 in commit 228cf47).
DeepSeek-Coder-V2-Instruct-FP8 should be supported as well. Could you try it and let us know if there are any problems?
vLLM doesn't support MoE FP8 models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision GEMM. See https://huggingface.co/neuralmagic/DeepSeek-Coder-V2-Instruct-FP8/discussions/1. Running DeepSeek-Coder-V2-Lite-Instruct-FP8, there is an error.
What is your vLLM version?
0.5.4
@Xu-Chen So can we use sglang to run DeepSeek-V2 (236B)? Thanks
Yes, you can, without quantization.
All of them should be supported in v0.3.1.post3. See also the blog: https://lmsys.org/blog/2024-09-04-sglang-v0-3/
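For anyone trying this, here is a minimal launch sketch using sglang's offline Runtime API. The model path, tp_size=8 (for 8xA100), and trust_remote_code flag are assumptions drawn from this thread, not a confirmed recipe; check the docs for your installed version.

```python
import sglang as sgl

# Sketch only: model path, tp_size=8, and trust_remote_code are assumptions
# from this thread. DeepSeek-V2 ships custom modeling code, hence the flag.
runtime = sgl.Runtime(
    model_path="neuralmagic/DeepSeek-Coder-V2-Instruct-FP8",
    tp_size=8,                # tensor parallelism across 8 GPUs
    trust_remote_code=True,
)
sgl.set_default_backend(runtime)

@sgl.function
def smoke_test(s):
    s += sgl.user("Write a hello-world in Python.")
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = smoke_test.run()
print(state["answer"])
runtime.shutdown()
```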
Have you tested this on A100?
The A100 does not support FP8 natively, so I guess it is not supported.
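To make the hardware constraint concrete, a small check (a sketch assuming PyTorch and a visible CUDA device): FP8 tensor cores first appear at compute capability 8.9 (Ada) / 9.0 (Hopper), while the A100 reports sm_80 (Ampere), which is why FP8 checkpoints on A100 need a weight-only / mixed-precision kernel such as Marlin rather than native FP8 compute.

```python
import torch

# Native FP8 tensor cores require compute capability >= 8.9 (Ada/Hopper).
# The A100 is sm_80 (Ampere), so FP8 weights there must be dequantized
# through a mixed-precision path such as Marlin.
major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (8, 9):
    print(f"sm_{major}{minor}: native FP8 tensor cores available")
else:
    print(f"sm_{major}{minor}: no native FP8; an FP8 checkpoint needs "
          "a weight-only / mixed-precision kernel to run here")
```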
Motivation
vLLM has announced support for running Llama-3.1-405B-FP8 on 8xA100 (see their blog post).
Does sglang support running DeepSeek-Coder-V2-Instruct-FP8 on 8xA100?
Related resources
No response