
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU #12339

Merged (10 commits) on Nov 6, 2024

Conversation

@sgwhat (Contributor) commented Nov 5, 2024

Description

1. Why the change?

Add support for the Llama3.2-1B/3B-Instruct models on Intel NPU.

2. User API changes

  • Upgrade the required transformers version to 4.45.0; everything else remains consistent with Llama3.

3. Summary of the change

  • Support running the Llama model on NPU based on transformers 4.45.0.

4. How to test?

  • LNL test llama-3.2-1b/3b with transformers==4.45.0
  • LNL test llama-2-7b with transformers==4.40.0

5. To Do

  • Merge this PR with llama_mp.py
  • Add documentation and an example

@sgwhat sgwhat changed the title [WIP] Add Optimized Support for Llama3.2-1B/3B on NPU [NPU] Add Optimized Support for Llama3.2-1B/3B on NPU Nov 6, 2024
@sgwhat sgwhat requested review from jason-dai and plusbang November 6, 2024 08:05
@sgwhat (Contributor, Author) commented Nov 6, 2024

Shall we also upgrade the Recommended NPU Driver Version in our documentation later? @jason-dai

@plusbang (Contributor) left a comment

Please also confirm this works for the llama2-7b L0 pipeline example; otherwise LGTM.

@@ -7,6 +7,8 @@ In this directory, you will find examples on how to directly run HuggingFace `tr
|------------|----------------------------------------------------------------|
| Llama2 | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
| Llama3 | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
| Llama3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) |
| Llama3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |

Maybe we could merge these into one line?

)
if model.config.num_hidden_layers == 28 or model.config.num_hidden_layers == 16:
    # llama-3.2-3B & llama-3.2-1B
    llama_model_forward = gen_llama_32_fused_model_forward(

I feel we could also add a transformers version check here.
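A minimal sketch of the suggested guard, assuming dispatch to the fused Llama-3.2 forward should require both a matching layer count (16 for Llama-3.2-1B, 28 for Llama-3.2-3B; Llama-2-7B has 32) and transformers >= 4.45.0, the version this PR targets. The function name and version parsing below are illustrative, not the PR's actual code.

```python
def use_llama_32_fused_forward(num_hidden_layers: int, transformers_version: str) -> bool:
    """Guard for the fused Llama-3.2 forward path.

    The layer count must match Llama-3.2-1B (16) or Llama-3.2-3B (28),
    and the installed transformers must be at least 4.45.0.
    """
    # Parse "major.minor.patch" into a comparable tuple of ints.
    parts = tuple(int(p) for p in transformers_version.split(".")[:3])
    return num_hidden_layers in (16, 28) and parts >= (4, 45, 0)

print(use_llama_32_fused_forward(28, "4.45.0"))  # True
print(use_llama_32_fused_forward(28, "4.40.0"))  # False: transformers too old
print(use_llama_32_fused_forward(32, "4.45.0"))  # False: llama-2-7b layer count
```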

@sgwhat sgwhat merged commit a7b6668 into intel-analytics:main Nov 6, 2024
1 check passed