FIX: Qwen1.5-GPTQ-Int4 inference error #11432
Conversation
# Skip merge_qkv if quant_method is 'gptq'
should_apply_merge_qkv = (
    not hasattr(model.config, "quantization_config") or
    not hasattr(model.config.quantization_config, "quant_method") or
    model.config.quantization_config.quant_method != "gptq"
)
if should_apply_merge_qkv:
    model.apply(merge_qkv)
Is gptq loaded as q4_1? If so, why can't we use merge_qkv? @qiuxin2012
We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.
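For context, this is roughly what a qkv merge does for plain fp16/fp32 projections. A minimal sketch only, not the actual ipex-llm merge_qkv; the attribute names assume the standard transformers Qwen2 attention module:

```python
import torch
from torch import nn

def naive_merge_qkv(attn):
    """Fuse q/k/v projections into one Linear by concatenating their weights (fp16/fp32 case)."""
    q, k, v = attn.q_proj, attn.k_proj, attn.v_proj
    fused = nn.Linear(q.in_features,
                      q.out_features + k.out_features + v.out_features,
                      bias=q.bias is not None)
    # nn.Linear weights are (out_features, in_features), so stack along dim 0.
    # This access to `.weight` is exactly what breaks on GPTQ's QuantLinear.
    fused.weight.data = torch.cat([q.weight.data, k.weight.data, v.weight.data], dim=0)
    if q.bias is not None:
        fused.bias.data = torch.cat([q.bias.data, k.bias.data, v.bias.data], dim=0)
    attn.qkv_proj = fused
```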
> We need a new merge_qkv for gptq here. Just convert q k v to one LowBitLinear.

It's different from q4_1?
gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to dequantize from the gptq format to the normal format first, perform merge_qkv, and then quantize back into the LowBitLinear.
I will try this method.
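A rough sketch of that dequantize → merge → re-quantize flow. The `dequantize()` helper and module paths are hypothetical, not existing ipex-llm or auto-gptq APIs:

```python
import torch
from torch import nn

def merge_qkv_via_dequant(attn, dequantize):
    """Dequantize GPTQ q/k/v, concatenate, and leave re-quantization to the normal low-bit pass."""
    # `dequantize(quant_linear) -> Tensor` is a hypothetical helper that unpacks
    # qweight/scales/qzeros back into an (out_features, in_features) fp16 weight.
    weights = [dequantize(m) for m in (attn.q_proj, attn.k_proj, attn.v_proj)]
    fused = nn.Linear(weights[0].shape[1], sum(w.shape[0] for w in weights), bias=False)
    fused.weight.data = torch.cat(weights, dim=0)
    # The fused fp16 layer can then be converted to a single LowBitLinear by the
    # usual optimization pass, the same way a non-GPTQ checkpoint would be.
    attn.qkv_proj = fused
```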
> gptq is loaded as q4_1. However, gptq's weights have already been quantized. If we want to use merge_qkv, we need to dequantize from the gptq format to the normal format first, perform merge_qkv, and then quantize back into the LowBitLinear.
> I will try this method.
Instead of dequantization, I think we can just rearrange the quantized qkv tensors into a combined one? Anyway I think we may fix it later in a separate PR if needed.
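Roughly what that rearrangement could look like, assuming the common auto-gptq QuantLinear layout (output features on dim 1 of qweight/scales/qzeros, the same group_size for q/k/v, and desc_act disabled). A sketch of the idea, not an existing API:

```python
import torch

def concat_gptq_qkv(q_proj, k_proj, v_proj):
    """Stitch already-quantized GPTQ q/k/v projections together along the output dimension."""
    fused = {
        "qweight": torch.cat([q_proj.qweight, k_proj.qweight, v_proj.qweight], dim=1),
        "scales":  torch.cat([q_proj.scales,  k_proj.scales,  v_proj.scales],  dim=1),
        "qzeros":  torch.cat([q_proj.qzeros,  k_proj.qzeros,  v_proj.qzeros],  dim=1),
    }
    if getattr(q_proj, "bias", None) is not None:
        fused["bias"] = torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0)
    return fused  # raw tensors for building one fused low-bit layer
```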
I'll merge this PR first so users can start using it, and then I'll submit another PR to address further fixes.
I guess we probably cannot merge qkv if quantization_config.desc_act==True.
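Agreed; a follow-up could gate a GPTQ-aware merge on that flag, something like this sketch along the lines of the check in this PR:

```python
quant_cfg = getattr(model.config, "quantization_config", None)
is_gptq = getattr(quant_cfg, "quant_method", None) == "gptq"
# With desc_act=True each projection carries its own g_idx column reordering,
# so q/k/v could not simply be concatenated even by a GPTQ-aware merge_qkv.
can_merge_gptq_qkv = is_gptq and not getattr(quant_cfg, "desc_act", False)
```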
* merge_qkv if quant_method is 'gptq'
* fix python style checks
* refactor
* update GPU example
Description
1. Why the change?
Fix user issue: #11413
2. User API changes
No user API change.
3. Summary of the change
The weights of a GPTQ model are already quantized. During the merge_qkv step for Qwen2 we hit:
AttributeError: 'QuantLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
This PR skips merge_qkv for the Qwen2 GPTQ model as a quick fix; we will submit another PR later to resolve this issue thoroughly.
4. How to test?
Manually tested it.
5. New dependencies
No