Support vpm and resampler module of minicpm-v on NPU #12375

plusbang · 2024-11-11T03:22:35Z

Description

Update minicpm-v usage on NPU.

2. User API changes

It is no longer necessary to specify torch_dtype=torch.float32 and modules_to_not_convert=['vpm', 'resampler']
The lm_head of minicpm_v_2_6 can now run on NPU by default

3. Summary of the change

replace conv2d and layernorm with MinicpmVPatchEmbedding and MinicpmVLayerNorm
pad mlp.fc2 and replace forward function to avoid compile error
pad lm_head and replace forward function to avoid compile error
port attention / multi-head-attention / resampler forward functions
update example script

4. How to test?

Application test
https://github.com/analytics-zoo/nano/issues/1724#issuecomment-2467282958

python/llm/src/ipex_llm/transformers/npu_models/minicpmv_mp.py

rnwang04

others LGTM

jason-dai · 2024-11-12T09:04:55Z

python/llm/src/ipex_llm/transformers/npu_models/minicpmv_mp.py

+        padded_weight = F.pad(module.lm_head.weight,
+                              (0, 0, 0, 152064-151666))  # 152064 is qwen2-7b vocab_size


Will this impact accuracy for channel-wise?

The original weight shape is [151666, 3584], and it is padded to [152064, 3584]. The existing rows are unchanged, so I don't think this affects CW accuracy : )
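The reasoning in this reply can be checked on a small stand-in tensor. The sketch below is illustrative only and is not code from the PR: the shapes (6, 8, 4) are placeholders for 151666, 152064 and 3584, and `rowwise_scales` is a hypothetical helper that assumes a symmetric per-row absmax scale, which may differ from the scheme ipex-llm actually uses. It shows that `F.pad` with `(0, 0, 0, n)` appends `n` zero rows (398 in the PR, since 152064 - 151666 = 398) and leaves the original rows, and therefore their per-row scales, untouched.

```python
import torch
import torch.nn.functional as F


def rowwise_scales(weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-row (channel-wise) scale: absmax of each row / 7.

    Assumption for illustration only; not taken from ipex-llm.
    """
    return weight.abs().amax(dim=1) / 7.0


torch.manual_seed(0)

# Small stand-ins for 151666 (original vocab), 152064 (padded vocab), 3584 (hidden).
orig_rows, padded_rows, hidden = 6, 8, 4
weight = torch.randn(orig_rows, hidden)

# F.pad takes pairs starting from the last dimension:
# (left, right) for dim -1, then (top, bottom) for dim -2.
# (0, 0, 0, n) therefore appends n zero rows and leaves the columns alone.
padded_weight = F.pad(weight, (0, 0, 0, padded_rows - orig_rows))

scales_before = rowwise_scales(weight)
scales_after = rowwise_scales(padded_weight)

# The padded rows only add extra output columns, which are all zero here
# because no bias is applied in this sketch.
x = torch.randn(2, hidden)
logits = F.linear(x, weight)
padded_logits = F.linear(x, padded_weight)

print(padded_weight.shape)
print(torch.equal(scales_after[:orig_rows], scales_before))
```

Because each scale depends only on its own row, appending rows cannot change the scales of the existing rows. The extra output columns still have to be handled by whatever consumes the logits; how the PR's replaced forward function does that is not shown in this thread.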

plusbang added 3 commits November 11, 2024 10:20

clean

2a0a5f3

Add

bc6af09

fix code style

2e0b83e

plusbang requested review from rnwang04 and sgwhat November 11, 2024 05:41

plusbang added 2 commits November 11, 2024 14:44

move lm_head

67c881e

fix code style

c2001ff

rnwang04 reviewed Nov 11, 2024

rnwang04 reviewed Nov 11, 2024

rnwang04 approved these changes Nov 11, 2024

plusbang added 5 commits November 11, 2024 16:45

add comment

9b7302e

divide lm head of MiniCPM-V 2.6

dc8ea9e

fix code style

4671d95

update script

ef20bb3

fix

e1bed68

plusbang merged commit 7a97fbb into intel-analytics:main Nov 12, 2024
1 check passed

jason-dai reviewed Nov 12, 2024