
[Feature] DeepSeek-Coder-V2-Instruct-FP8 on 8xA100 #989

Closed
halexan opened this issue Aug 8, 2024 · 9 comments

halexan commented Aug 8, 2024

Motivation

vLLM has announced support for running Llama 3.1 405B (FP8) on 8xA100; see their blog post.

Does sglang support running DeepSeek-Coder-V2-Instruct-FP8 on 8xA100?

Related resources

No response

Ying1123 (Member) commented Aug 8, 2024

Llama 3.1 405B (FP8) is supported in sglang:

sglang/README.md (lines 199 to 200 at commit 228cf47):

## Run 405B (fp8) on a single node
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 --tp 8

DeepSeek-Coder-V2-Instruct-FP8 should be supported as well. Could you try it and let us know if there are any problems?
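For example, something along these lines should work (a sketch, assuming an FP8 checkpoint such as neuralmagic/DeepSeek-Coder-V2-Instruct-FP8; --trust-remote-code may be needed for the DeepSeek-V2 tokenizer/config):

## Run DeepSeek-Coder-V2-Instruct (fp8) on a single node (untested sketch)
python -m sglang.launch_server --model-path neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 --tp 8 --trust-remote-code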

Xu-Chen (Contributor) commented Aug 8, 2024

vLLM doesn't support MoE FP8 models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision GEMM. See https://huggingface.co/neuralmagic/DeepSeek-Coder-V2-Instruct-FP8/discussions/1

Running DeepSeek-Coder-V2-Lite-Instruct-FP8 produces this error:

  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 327, in load_model
    model.load_weights(
  File "/root/.local/lib/python3.10/site-packages/sglang/srt/models/deepseek_v2.py", line 694, in load_weights
    weight_loader(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 205, in weight_loader
    raise ValueError(
ValueError: input_scales of w1 and w3 of a layer must be equal. But got 0.06986899673938751 vs. 0.09467455744743347
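
For context: the fused MoE kernel quantizes the activation once and feeds the same FP8 tensor into both the gate projection (w1) and the up projection (w3), which is why the loader insists on a single shared input scale. A rough illustration of the constraint (not vLLM's actual code; a common workaround is to re-quantize the checkpoint with shared scales, or to merge them by taking the larger one):

# Conceptual sketch only, not vLLM code: the fused w13 GEMM consumes one
# quantized activation, so w1 and w3 must agree on input_scale.
w1_input_scale = 0.06986899673938751  # values from the error above
w3_input_scale = 0.09467455744743347

# Taking the larger scale is the conservative merge: it avoids clipping
# the activation for either projection.
shared_input_scale = max(w1_input_scale, w3_input_scale)
print(f"shared input_scale for the fused gate/up projection: {shared_input_scale}")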

halexan (Author) commented Aug 10, 2024

What is your vllm version?

Xu-Chen (Contributor) commented Aug 10, 2024

What is your vllm version?

0.5.4

KylinMountain (Contributor) commented:

@Xu-Chen So can we use sglang to run DeepSeek-Coder-V2 (236B)? Thanks

halexan (Author) commented Aug 12, 2024

@Xu-Chen So can we use sglang to run DeepSeek-Coder-V2 (236B)? Thanks

Yes, you can, without quantization.
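
For example, something like this (a sketch, assuming 8x80GB A100s and the official deepseek-ai checkpoint; the BF16 weights of the 236B model are roughly 470 GB, which just fits in 640 GB of HBM):

python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-Coder-V2-Instruct --tp 8 --trust-remote-code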

merrymercy (Contributor) commented:

All of them should be supported in v0.3.1.post3. See also the blog post: https://lmsys.org/blog/2024-09-04-sglang-v0-3/
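
For example, roughly (assuming the neuralmagic FP8 checkpoint discussed above):

pip install --upgrade "sglang[all]>=0.3.1.post3"
python -m sglang.launch_server --model-path neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 --tp 8 --trust-remote-code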

Xu-Chen (Contributor) commented Sep 22, 2024

All of them should be supported in v0.3.1.post3. See also the blog post: https://lmsys.org/blog/2024-09-04-sglang-v0-3/

Have you tested this on an A100?

merrymercy (Contributor) commented:

A100 does not support FP8 natively, so I guess it is not supported.
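
A quick way to check: native FP8 tensor cores first appear at compute capability 8.9 (Ada) and 9.0 (Hopper), while A100 is 8.0. A small sketch:

import torch

# A100 reports (8, 0); native FP8 tensor cores start at (8, 9) (Ada) / (9, 0) (Hopper).
major, minor = torch.cuda.get_device_capability()
print(f"compute capability {major}.{minor}, native FP8 tensor cores: {(major, minor) >= (8, 9)}")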
