
Support qwen2-1.5b with fused decoderlayer optimization on NPU #11888

Merged: 6 commits merged into intel-analytics:main on Aug 22, 2024

Conversation

@plusbang (Contributor) commented on Aug 21, 2024

Description

Add Qwen2-1.5B NPU support with a fused decoder layer and multi-process optimization.

Summary of the change

  • common changes: add a bias parameter to LLMBaseNNFactory.attention and fix a small bug in repeat_kv (see the sketch after this list)
  • qwen2-specific: add the related convert logic; add decoderrunner / prefillrunner / fuseddecoder, etc.; provide an example script
  • llama2-specific: remove an unused import
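
For context on the repeat_kv fix: the PR does not show the patch itself, so below is only a minimal sketch of what a correct repeat_kv helper typically looks like in grouped-query attention (following the well-known HuggingFace transformers implementation), not the actual change made here. It expands the key/value heads so that each group of query heads attends to its own copy.

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat key/value heads for grouped-query attention.

    (batch, num_kv_heads, seq_len, head_dim)
        -> (batch, num_kv_heads * n_rep, seq_len, head_dim)

    Sketch of the standard behavior; the exact fix in this PR may differ.
    """
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # Insert a repeat axis, broadcast along it, then fold it into the head axis.
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)
```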

How to test?

@plusbang marked this pull request as ready for review on Aug 21, 2024, 10:10
@plusbang requested review from yangw1234 and sgwhat on Aug 21, 2024, 10:51
@plusbang merged commit 72a7bf6 into intel-analytics:main on Aug 22, 2024
1 check passed