Use Llama RMSNorm for Gemma #2974
Conversation
Nice!
Before we merge, let's make sure it doesn't change the outputs (maybe we could add a test like we have for other models, using transformers as a reference).
As a note, using the custom op introduces a slight numerical difference in handling the residual connection. While the original implementation uses the current dtype (f16 or bf16) in
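A minimal sketch of the kind of dtype effect being referenced, assuming the contrast is between keeping the running residual in the activation dtype versus accumulating it in fp32; the custom op's exact behavior is not reproduced here:

```python
import torch

torch.manual_seed(0)
hidden = torch.randn(4, 8, dtype=torch.bfloat16)
deltas = [torch.randn(4, 8, dtype=torch.bfloat16) for _ in range(32)]

def residual_in_bf16(x, deltas):
    # Running residual stays in the activation dtype; each add re-rounds.
    out = x
    for d in deltas:
        out = out + d
    return out

def residual_in_fp32(x, deltas):
    # Running residual accumulated in fp32; cast back once at the end.
    out = x.float()
    for d in deltas:
        out = out + d.float()
    return out.to(x.dtype)

low = residual_in_bf16(hidden, deltas)
high = residual_in_fp32(hidden, deltas)
# Small but non-zero difference after many residual adds.
print((low.float() - high.float()).abs().max())
```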
Gemma's RMSNorm is only slightly different from Llama's RMSNorm. Thus, we can use the existing custom op for it. This optimization leads to ~10% latency reduction.
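For context, the functional difference is that Gemma's RMSNorm scales the normalized input by `(1 + weight)` where Llama's scales by `weight`. A minimal sketch (not the actual vLLM code) of how a Llama-style RMSNorm can be reused by folding the `+1` into the weights at load time; the loader helper below is hypothetical:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Llama-style RMSNorm: y = x / rms(x) * weight."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        var = x.float().pow(2).mean(dim=-1, keepdim=True)
        x_normed = x.float() * torch.rsqrt(var + self.eps)
        return (x_normed * self.weight.float()).to(x.dtype)

# Gemma checkpoints store a weight such that the layer computes
# x / rms(x) * (1 + weight).  Adding 1 to the loaded weight once lets the
# Llama-style module (and its fused custom op) produce the same scaling.
def load_gemma_norm_weight(norm: RMSNorm, loaded_weight: torch.Tensor) -> None:
    # Hypothetical loader hook; the name and signature are illustrative.
    norm.weight.data.copy_(loaded_weight + 1.0)
```

With the offset folded into the weights at load time, the existing custom op can be shared between Llama and Gemma and should produce the same results as Gemma's reference implementation, up to the dtype effects noted above.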