
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU #12339

Merged (10 commits) on Nov 6, 2024

Conversation

@sgwhat (Contributor) commented Nov 5, 2024

Description

1. Why the change?

Add support for the Llama3.2-1B/3B-Instruct models on Intel NPU.

2. User API changes

  • Upgrade the required transformers version to 4.45.0; everything else remains consistent with Llama3.

3. Summary of the change

  • Support running the Llama model on NPU based on transformers 4.45.0.

4. How to test?

  • LNL test llama-3.2-1b/3b with transformers==4.45.0
  • LNL test llama-2-7b with transformers==4.40.0

5. To Do

  • Merge this PR with llama_mp.py
  • Add documentation and an example

@sgwhat sgwhat changed the title [WIP] Add Optimized Support for Llama3.2-1B/3B on NPU [NPU] Add Optimized Support for Llama3.2-1B/3B on NPU Nov 6, 2024
@sgwhat sgwhat requested review from jason-dai and plusbang November 6, 2024 08:05
@sgwhat (Contributor, Author) commented Nov 6, 2024

Shall we also upgrade the Recommended NPU Driver Version in our documentation later? @jason-dai

@plusbang (Contributor) left a comment

Please also confirm this works for the llama2-7b L0 pipeline example; otherwise LGTM.

@@ -7,6 +7,8 @@ In this directory, you will find examples on how to directly run HuggingFace `tr
|------------|----------------------------------------------------------------|
| Llama2 | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
| Llama3 | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
| Llama3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) |
| Llama3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |

Maybe we could merge these into one line?

)
if model.config.num_hidden_layers == 28 or model.config.num_hidden_layers == 16:
    # llama-3.2-3B & llama-3.2-1B
    llama_model_forward = gen_llama_32_fused_model_forward(

I feel we could also add a transformers version check here.
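A minimal sketch of the suggested guard, assuming dispatch to the fused Llama-3.2 forward should require both a matching layer count (16 for Llama-3.2-1B, 28 for Llama-3.2-3B; Llama-2-7B has 32) and transformers >= 4.45.0, the version this PR targets. The function name and version parsing below are illustrative, not the PR's actual code.

```python
def use_llama_32_fused_forward(num_hidden_layers: int, transformers_version: str) -> bool:
    """Guard for the fused Llama-3.2 forward path.

    The layer count must match Llama-3.2-1B (16) or Llama-3.2-3B (28),
    and the installed transformers must be at least 4.45.0.
    """
    # Parse "major.minor.patch" into a comparable tuple of ints.
    parts = tuple(int(p) for p in transformers_version.split(".")[:3])
    return num_hidden_layers in (16, 28) and parts >= (4, 45, 0)

print(use_llama_32_fused_forward(28, "4.45.0"))  # True
print(use_llama_32_fused_forward(28, "4.40.0"))  # False: transformers too old
print(use_llama_32_fused_forward(32, "4.45.0"))  # False: llama-2-7b layer count
```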

@sgwhat sgwhat merged commit a7b6668 into intel-analytics:main Nov 6, 2024
1 check passed