
FIX: Qwen1.5-GPTQ-Int4 inference error #11432

Merged: 4 commits merged into intel-analytics:main from qwen2 on Jun 26, 2024

Conversation

@liu-shaojun (Contributor) commented on Jun 26, 2024

Description

1. Why the change?

Fix user issue: #11413

2. User API changes

No user API changes.

3. Summary of the change

The GPTQ weights are already quantized. During the merge_qkv step for Qwen2, we hit AttributeError: 'QuantLinear' object has no attribute 'weight'. Did you mean: 'qweight'? As a quick fix, this PR skips merge_qkv for Qwen2 GPTQ models; a follow-up PR will resolve the issue thoroughly.
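For context, merge_qkv-style fusion assumes each projection exposes a float weight tensor that can be concatenated; a minimal sketch of that assumption (not the actual ipex-llm implementation):

```python
import torch

def fuse_qkv_weights(attn):
    # Ordinary nn.Linear layers expose a float .weight that can be fused.
    # GPTQ's QuantLinear only has packed .qweight, so this access raises
    # AttributeError: 'QuantLinear' object has no attribute 'weight'.
    return torch.cat([attn.q_proj.weight,
                      attn.k_proj.weight,
                      attn.v_proj.weight], dim=0)
```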

4. How to test?

Manually tested.

5. New dependencies

None.

Comment on lines 737 to 744
# Skip merge_qkv if quant_method is 'gptq'
should_apply_merge_qkv = (
    not hasattr(model.config, "quantization_config") or
    not hasattr(model.config.quantization_config, "quant_method") or
    model.config.quantization_config.quant_method != "gptq"
)
if should_apply_merge_qkv:
    model.apply(merge_qkv)
Contributor:

Is gptq loaded as q4_1? If so, why can't we use merge_qkv? @qiuxin2012

Contributor:

We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.

@jason-dai (Contributor) commented on Jun 26, 2024:

> We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.

It's different from q4_1?

Contributor (Author):

gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to first dequantize from the gptq format to the normal format, perform merge_qkv, and then quantize back into a LowBitLinear.

I will try this method.
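A rough sketch of that dequantize-merge-requantize idea, with dequantize and quantize_low_bit as hypothetical stand-ins for the "unpack GPTQ to float" and "float to LowBitLinear" steps (neither is a real ipex-llm API here):

```python
import torch

def merge_qkv_via_dequant(attn, dequantize, quantize_low_bit):
    # Recover float weight matrices from the packed GPTQ projections.
    q = dequantize(attn.q_proj)
    k = dequantize(attn.k_proj)
    v = dequantize(attn.v_proj)
    # Fuse along the output dimension...
    fused = torch.cat([q, k, v], dim=0)
    # ...and re-quantize the fused weight into a single LowBitLinear.
    return quantize_low_bit(fused)
```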

Contributor:

> gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to first dequantize from the gptq format to the normal format, perform merge_qkv, and then quantize back into a LowBitLinear.
>
> I will try this method.

Instead of dequantization, could we just rearrange the quantized qkv tensors into a combined one? Anyway, I think we can fix it later in a separate PR if needed.
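For reference, a sketch of that rearrangement under the AutoGPTQ-style QuantLinear layout, where qweight is packed along the input dimension and qzeros/scales are per-group, so q/k/v can be concatenated along the output dimension (assumptions: all three projections share bits, group_size, and g_idx; merge_gptq_qkv is a hypothetical helper, not code from this PR):

```python
import torch

def merge_gptq_qkv(q_proj, k_proj, v_proj):
    # Packed rows are only aligned if all three projections were quantized
    # with the same input-channel ordering (g_idx); see the desc_act caveat
    # later in this thread.
    assert torch.equal(q_proj.g_idx, k_proj.g_idx)
    assert torch.equal(q_proj.g_idx, v_proj.g_idx)

    # qweight: (in_features // 32 * bits, out_features) -> cat on dim=1
    qweight = torch.cat([q_proj.qweight, k_proj.qweight, v_proj.qweight], dim=1)
    # qzeros: (n_groups, out_features // 32 * bits) -> cat on dim=1
    qzeros = torch.cat([q_proj.qzeros, k_proj.qzeros, v_proj.qzeros], dim=1)
    # scales: (n_groups, out_features) -> cat on dim=1
    scales = torch.cat([q_proj.scales, k_proj.scales, v_proj.scales], dim=1)
    bias = None
    if q_proj.bias is not None:
        bias = torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0)
    return qweight, qzeros, scales, bias
```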

Contributor (Author):

I'll merge this PR first so users can start using it, and then I'll submit another PR to address further fixes.

Contributor:

I guess we probably cannot merge qkv if quantization_config.desc_act==True.
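If a merged-qkv path is added later, the guard above could take this into account; a hedged sketch of such a check (an assumption, not code from this PR):

```python
quant_cfg = getattr(model.config, "quantization_config", None)
is_gptq = getattr(quant_cfg, "quant_method", None) == "gptq"
# With desc_act=True (activation reordering), each projection may carry its
# own g_idx permutation, so the packed q/k/v weights cannot simply be fused.
can_merge_gptq_qkv = is_gptq and not getattr(quant_cfg, "desc_act", False)
```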

liu-shaojun marked this pull request as ready for review on June 26, 2024 04:54
liu-shaojun requested a review from yangw1234 on June 26, 2024 04:54
liu-shaojun merged commit ab9f7f3 into intel-analytics:main on Jun 26, 2024
33 checks passed
liu-shaojun deleted the qwen2 branch on June 26, 2024 07:36
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024
* merge_qkv if quant_method is 'gptq'

* fix python style checks

* refactor

* update GPU example