[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU #12339
Conversation
Shall we also upgrade
Please also confirm this works for the llama2-7b L0 pipeline example; otherwise LGTM.
@@ -7,6 +7,8 @@ In this directory, you will find examples on how to directly run HuggingFace `tr
|------------|----------------------------------------------------------------|
| Llama2 | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
| Llama3 | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
| Llama3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) |
| Llama3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
Maybe we could merge these into one line?
)
if model.config.num_hidden_layers == 28 or model.config.num_hidden_layers == 16:
    # llama-3.2-3B & llama-3.2-1B
    llama_model_forward = gen_llama_32_fused_model_forward(
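For reference, the two layer-count comparisons could also be collapsed into a single membership test. A minimal self-contained sketch of that dispatch (`gen_llama_32_fused_model_forward` is stubbed here, since the real generator is defined elsewhere in the PR):

```python
from types import SimpleNamespace

def gen_llama_32_fused_model_forward():
    # Stub standing in for the PR's real generator, which builds the
    # fused decoder forward used on the NPU.
    return "llama_32_fused_forward"

def select_model_forward(config):
    # llama-3.2-1B has 16 hidden layers and llama-3.2-3B has 28,
    # so a membership test covers both branches of the condition above.
    if config.num_hidden_layers in (16, 28):
        return gen_llama_32_fused_model_forward()
    return "default_forward"
```

Whether the membership test is clearer than two explicit comparisons is a style call; the behavior is identical.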
I feel we could also add a transformers version check here.
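Such a check might look like the sketch below. The helper names are hypothetical, and 4.45.0 is the version the PR description says llama-3.2 was tested with; the real check would compare against `transformers.__version__` at import time.

```python
import warnings

# Version the llama-3.2 fused path was validated on (per the PR description)
REQUIRED_VERSION = "4.45.0"

def _parse(v: str):
    # Minimal version parser: "4.45.0" -> (4, 45, 0); ignores pre-release tags.
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def check_transformers_version(installed: str, required: str = REQUIRED_VERSION) -> bool:
    # Return True if the installed version matches the validated one;
    # warn (rather than fail) otherwise, so users can still experiment.
    if _parse(installed) != _parse(required):
        warnings.warn(
            f"llama-3.2 NPU optimizations were validated on "
            f"transformers=={required}, found {installed}"
        )
        return False
    return True
```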
Description
1. Why the change?
Support the llama-3.2-1b/3b-instruct models on Intel NPU.
2. User API changes
3. Summary of the change
4. How to test?
llama-3.2-1b/3b with transformers==4.45.0
llama-2-7b with transformers==4.40.0
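The version pins above can be set up with pip before running each example (a sketch of the environment setup only; the exact run commands are in the example READMEs):

```shell
# Pin transformers per model family before running the NPU examples
pip install transformers==4.45.0    # for llama-3.2-1b / llama-3.2-3b
# pip install transformers==4.40.0  # for llama-2-7b (use a separate env per pin)
```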
5. To Do
llama_mp.py