[NPU] Support Baichuan groupwise & gw code refactor #12337
Conversation
Please carefully verify the models; otherwise LGTM.
```diff
  mlp_module_names = ["down_proj", "up_proj", "gate_proj"]
  if (
      isinstance(module, (Qwen2Attention, LlamaAttention))
-     or module.__class__.__name__ in ['MiniCPMAttention']
+     or module.__class__.__name__ in ['MiniCPMAttention', 'Attention']
```
Shall we also update the following check of `module.__class__.__name__ in ['MiniCPMMLP', 'MLP']`?
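
For illustration, a minimal sketch of the dispatch this comment points at, with the MLP branch extended the same way as the attention branch. The attention class names come from the diff above; the `Qwen2MLP`/`LlamaMLP` names and the projection-name lists are assumptions, and class-name matching stands in for the `isinstance` checks so the snippet runs without importing transformers:

```python
ATTN_PROJ_NAMES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # assumed list
MLP_PROJ_NAMES = ["down_proj", "up_proj", "gate_proj"]      # from the diff

def proj_names_for(module):
    """Return the linear-layer names to process for a given module."""
    name = module.__class__.__name__
    # Attention branch from the diff: Baichuan's attention class is
    # presumably plain 'Attention', hence the extra name next to
    # 'MiniCPMAttention'.
    if name in ("Qwen2Attention", "LlamaAttention",
                "MiniCPMAttention", "Attention"):
        return ATTN_PROJ_NAMES
    # The suggestion above: extend the MLP check the same way, so that a
    # plain 'MLP' class is covered next to 'MiniCPMMLP'.
    if name in ("Qwen2MLP", "LlamaMLP", "MiniCPMMLP", "MLP"):
        return MLP_PROJ_NAMES
    return []
```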
```diff
@@ -115,8 +124,10 @@ def __init__(
     attention_mask = self.create_input_op((self.batch_size, 1, 1, self.max_seq_len + 1),
                                           dtype=np.int64)
 else:
     attention_mask = self.create_input_op((self.batch_size, 1, self.seq_len, self.seq_len),
                                           dtype=np.int64)
+    # attention_mask = self.create_input_op((self.batch_size, 1, self.seq_len,
```
Please remove this commented-out line directly.
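
For context, a minimal numpy sketch of the two mask shapes this hunk distinguishes: decode keeps a single query row over the whole KV window, while prefill uses a full causal mask over the prompt. The additive-mask convention and the `MIN_VALUE` fill are assumptions here; the real graph declares these tensors as inputs via `create_input_op` rather than building them eagerly:

```python
import numpy as np

MIN_VALUE = np.iinfo(np.int64).min  # assumed masked-fill value

def make_attention_mask(batch_size, seq_len, max_seq_len, decoding):
    if decoding:
        # Decode: one query token attends to the whole KV cache plus itself,
        # matching the (B, 1, 1, max_seq_len + 1) shape above.
        return np.zeros((batch_size, 1, 1, max_seq_len + 1), dtype=np.int64)
    # Prefill: standard causal mask over the prompt, matching the
    # (B, 1, seq_len, seq_len) shape above.
    mask = np.full((batch_size, 1, seq_len, seq_len), MIN_VALUE,
                   dtype=np.int64)
    return np.triu(mask, k=1)  # zeros on/below the diagonal, masked above

# Shapes match the create_input_op calls in the hunk.
assert make_attention_mask(1, 8, 1024, decoding=True).shape == (1, 1, 1, 1025)
assert make_attention_mask(1, 8, 1024, decoding=False).shape == (1, 1, 8, 8)
```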
Description
TODOs:
1. Why the change?
2. User API changes
3. Summary of the change
4. How to test?
   - Unit test: please manually trigger the PR validation by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
5. New dependencies
   - New Python dependencies:
     - Dependency1
     - Dependency2
     - ...
   - New dependencies and their licenses:
     - Dependency1 and license1
     - Dependency2 and license2
     - ...
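
Since the description above is still the unfilled template, a brief note on the "groupwise" (gw) in the title: groupwise quantization splits each weight row into fixed-size groups along the input dimension and gives every group its own scale, which bounds quantization error per group. Below is a hedged numpy sketch of symmetric int4 groupwise quantization; the group size of 128 and the max-abs scaling are illustrative assumptions, not details taken from this PR:

```python
import numpy as np

def quantize_groupwise_int4(weight: np.ndarray, group_size: int = 128):
    """Symmetric per-group int4 quantization of a (out, in) weight matrix."""
    out_f, in_f = weight.shape
    assert in_f % group_size == 0
    w = weight.reshape(out_f, in_f // group_size, group_size)
    # One scale per (output row, group): max-abs / 7 keeps values in [-7, 7].
    scales = np.clip(np.abs(w).max(axis=-1, keepdims=True) / 7.0, 1e-8, None)
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q.reshape(out_f, in_f), scales.squeeze(-1)

def dequantize(q, scales, group_size: int = 128):
    out_f, in_f = q.shape
    w = q.reshape(out_f, in_f // group_size, group_size).astype(np.float32)
    return (w * scales[..., None]).reshape(out_f, in_f)

# Round-trip example: per-element error is at most half a quantization step.
w = np.random.randn(16, 256).astype(np.float32)
q, s = quantize_groupwise_int4(w)
assert np.abs(dequantize(q, s) - w).max() <= s.max() / 2 + 1e-6
```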