Commit
[Bugfix] Fix dynamic FP8 quantization for Mixtral (vllm-project#4793)
pcmoritz authored May 13, 2024
1 parent a6de2a3 commit 3733fc7
Showing 1 changed file with 1 addition and 1 deletion.
vllm/model_executor/models/mixtral.py (1 addition, 1 deletion)

@@ -95,7 +95,7 @@ def __init__(
             params_dtype=self.params_dtype,
             quant_config=None)
 
-        if self.use_fp8:
+        if self.use_fp8 and self.quant_config.is_checkpoint_fp8_serialized:
             params_dtype = torch.float8_e4m3fn
 
         self.w13_weight = nn.Parameter(
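The guard matters because dynamic FP8 quantization starts from a bf16/fp16 checkpoint and quantizes at runtime, so the expert parameters must be created in the checkpoint's original dtype; only checkpoints that already serialize FP8 weights can be loaded directly into float8_e4m3fn parameters. The following is a minimal sketch illustrating the dtype decision this commit introduces, not the actual vLLM class: ToyMoEWeights and quantize_after_loading are hypothetical names standing in for the real module and its post-load quantization step.

import torch
import torch.nn as nn

class ToyMoEWeights(nn.Module):
    """Toy stand-in for an MoE weight container (not vLLM's MixtralMoE)."""

    def __init__(self, num_experts: int, hidden: int, intermediate: int,
                 params_dtype: torch.dtype, use_fp8: bool,
                 checkpoint_is_fp8_serialized: bool):
        super().__init__()
        # The fix from this commit: only switch the parameter dtype to FP8
        # when the checkpoint actually stores FP8 weights. For dynamic
        # quantization the bf16/fp16 weights must be loadable as-is.
        if use_fp8 and checkpoint_is_fp8_serialized:
            params_dtype = torch.float8_e4m3fn

        self.w13_weight = nn.Parameter(
            torch.empty(num_experts, 2 * intermediate, hidden,
                        dtype=params_dtype),
            requires_grad=False)

    def quantize_after_loading(self):
        # Hypothetical helper illustrating the dynamic path: after the
        # high-precision checkpoint weights are loaded, cast them to FP8.
        if self.w13_weight.dtype != torch.float8_e4m3fn:
            self.w13_weight = nn.Parameter(
                self.w13_weight.data.to(torch.float8_e4m3fn),
                requires_grad=False)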
