Support Qwen2-7b MLP in int4 and transpose_value_cache=True #11968

yangw1234 · 2024-08-29T21:33:54Z

Description

Support Qwen2-7b MLP in int4 and transpose_value_cache=True

Test: https://github.com/intel-analytics/ipex-llm-workflow/actions/runs/10623417029

Perf: https://github.com/analytics-zoo/nano/issues/1576#issuecomment-2319077118

plusbang · 2024-08-30T01:29:04Z

python/llm/src/ipex_llm/transformers/npu_models/mp_models_base.py

@@ -396,7 +396,7 @@ def set_weights_async(self, op_id, weights):
                          (f"weights size does not match graph, "
                           f"with weights size: {len(weights)} and "
                           f" graph linear size: {len(self.linear_ops)}"))
-        self.setWeights(offset, op_id, *weights)
+        self.setWeights(offset, op_id, *weights, verify_size=True)


Maybe we could remove the verify_size check? Otherwise LGTM.
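For context, the hunk's context lines show an error message raised when the number of weights differs from the number of linear ops in the graph. A minimal sketch of that kind of guard follows, assuming a hypothetical `GraphStub` class and `check_weight_count` helper. Neither is ipex-llm code; only the message text and the `linear_ops` name come from the diff.

```python
class GraphStub:
    """Hypothetical stand-in for the class touched by the diff."""

    def __init__(self, num_linear_ops):
        # One placeholder entry per linear op in the graph.
        self.linear_ops = [f"linear_{i}" for i in range(num_linear_ops)]

    def check_weight_count(self, weights):
        # Hypothetical helper: reject a weight list whose length differs
        # from the number of linear ops, reusing the message from the diff.
        if len(weights) != len(self.linear_ops):
            raise ValueError(
                f"weights size does not match graph, "
                f"with weights size: {len(weights)} and "
                f" graph linear size: {len(self.linear_ops)}"
            )
        return True
```

With a count check like this already in place before the weights are set, whether an extra size verification inside the setter is still needed is the kind of question the comment above raises.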

plusbang · 2024-09-02T06:34:06Z

Fixed the conflict; merging it first.

yangw1234 requested a review from plusbang August 29, 2024 22:16

plusbang approved these changes Aug 30, 2024

yangw1234 added 2 commits September 2, 2024 14:29

Support Qwen2-7b mlp in int4

29d1836

fix style

e2b8c1e

plusbang force-pushed the qwenint4 branch from e835ce5 to e2b8c1e on September 2, 2024 06:30

rm verify size

0851c35

plusbang merged commit c48817b into intel-analytics:main Sep 2, 2024
1 check passed

plusbang mentioned this pull request Sep 2, 2024

Fix AttributeError of qwen2-1.5B #11990

Merged

cyita pushed a commit to cyita/BigDL that referenced this pull request Sep 5, 2024

Support Qwen2-7b MLP in int4 and transpose_value_cache=True (intel-an…

9132ec1

…alytics#11968)

cranechu0131 pushed a commit to cranechu0131/ipex-llm that referenced this pull request Sep 9, 2024

Support Qwen2-7b MLP in int4 and transpose_value_cache=True (intel-an…

71770b7

…alytics#11968)
