FIX: Qwen1.5-GPTQ-Int4 inference error #11432
Conversation
# Skip merge_qkv if quant_method is 'gptq'
should_apply_merge_qkv = (
    not hasattr(model.config, "quantization_config") or
    not hasattr(model.config.quantization_config, "quant_method") or
    model.config.quantization_config.quant_method != "gptq"
)
if should_apply_merge_qkv:
    model.apply(merge_qkv)
Is gptq loaded as q4_1? If so, why can't we use merge_qkv? @qiuxin2012
We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.
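For context, this is roughly what a qkv merge does for plain fp16/fp32 projections. A minimal sketch only, not the actual ipex-llm merge_qkv; the attribute names assume the standard transformers Qwen2 attention module:

```python
import torch
from torch import nn

def naive_merge_qkv(attn):
    """Fuse q/k/v projections into one Linear by concatenating their weights (fp16/fp32 case)."""
    q, k, v = attn.q_proj, attn.k_proj, attn.v_proj
    fused = nn.Linear(q.in_features,
                      q.out_features + k.out_features + v.out_features,
                      bias=q.bias is not None)
    # nn.Linear weights are (out_features, in_features), so stack along dim 0.
    # This access to `.weight` is exactly what breaks on GPTQ's QuantLinear.
    fused.weight.data = torch.cat([q.weight.data, k.weight.data, v.weight.data], dim=0)
    if q.bias is not None:
        fused.bias.data = torch.cat([q.bias.data, k.bias.data, v.bias.data], dim=0)
    attn.qkv_proj = fused
```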
> We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.

It's different from q4_1?
gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to dequantize from the gptq format to the normal format first, perform merge_qkv, and then quantize back into the LowBitLinear.
I will try this method.
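A rough sketch of that dequantize → merge → re-quantize flow. The `dequantize()` helper and module paths are hypothetical, not existing ipex-llm or auto-gptq APIs:

```python
import torch
from torch import nn

def merge_qkv_via_dequant(attn, dequantize):
    """Dequantize GPTQ q/k/v, concatenate, and leave re-quantization to the normal low-bit pass."""
    # `dequantize(quant_linear) -> Tensor` is a hypothetical helper that unpacks
    # qweight/scales/qzeros back into an (out_features, in_features) fp16 weight.
    weights = [dequantize(m) for m in (attn.q_proj, attn.k_proj, attn.v_proj)]
    fused = nn.Linear(weights[0].shape[1], sum(w.shape[0] for w in weights), bias=False)
    fused.weight.data = torch.cat(weights, dim=0)
    # The fused fp16 layer can then be converted to a single LowBitLinear by the
    # usual optimization pass, the same way a non-GPTQ checkpoint would be.
    attn.qkv_proj = fused
```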
> gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to dequantize from the gptq format to the normal format first, perform merge_qkv, and then quantize back into the LowBitLinear.
> I will try this method.
Instead of dequantization, I think we can just rearrange the quantized qkv tensors into a combined one? Anyway I think we may fix it later in a separate PR if needed.
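Roughly what that rearrangement could look like, assuming the common auto-gptq QuantLinear layout (output features on dim 1 of qweight/scales/qzeros, the same group_size for q/k/v, and desc_act disabled). A sketch of the idea, not an existing API:

```python
import torch

def concat_gptq_qkv(q_proj, k_proj, v_proj):
    """Stitch already-quantized GPTQ q/k/v projections together along the output dimension."""
    fused = {
        "qweight": torch.cat([q_proj.qweight, k_proj.qweight, v_proj.qweight], dim=1),
        "scales":  torch.cat([q_proj.scales,  k_proj.scales,  v_proj.scales],  dim=1),
        "qzeros":  torch.cat([q_proj.qzeros,  k_proj.qzeros,  v_proj.qzeros],  dim=1),
    }
    if getattr(q_proj, "bias", None) is not None:
        fused["bias"] = torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0)
    return fused  # raw tensors for building one fused low-bit layer
```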
I'll merge this PR first so users can start using it, and then I'll submit another PR to address further fixes.
I guess we probably cannot merge qkv if quantization_config.desc_act==True.
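Agreed; a follow-up could gate a GPTQ-aware merge on that flag, something like this sketch along the lines of the check in this PR:

```python
quant_cfg = getattr(model.config, "quantization_config", None)
is_gptq = getattr(quant_cfg, "quant_method", None) == "gptq"
# With desc_act=True each projection carries its own g_idx column reordering,
# so q/k/v could not simply be concatenated even by a GPTQ-aware merge_qkv.
can_merge_gptq_qkv = is_gptq and not getattr(quant_cfg, "desc_act", False)
```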
* merge_qkv if quant_method is 'gptq'
* fix python style checks
* refactor
* update GPU example
Description
1. Why the change?
Fix user issue: #11413
2. User API changes
No user API change.
3. Summary of the change
The weights of a GPTQ model are already quantized. During the merge_qkv step for Qwen2 we hit:
AttributeError: 'QuantLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
This PR skips merge_qkv for the Qwen2 GPTQ model as a quick fix; we will submit another PR later to resolve this issue thoroughly.
4. How to test?
Manually tested it.
5. New dependencies
No