Add FuseMLP to NPU Llama #11565
Conversation
@@ -0,0 +1,163 @@
from intel_npu_acceleration_library.backend.factory import NNFactory
change to lowbitmlp.py
sure.
import uuid


class QuantizedMLP(NNFactory):
change to LowBitMLP
# TODO: separate it into a single file
@torch.no_grad()
def run_factory(
change to run_model
    backend_cls: Any,
    op_id: Optional[str] = None,
) -> torch.Tensor:
    """Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.
Run an NPU model
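The docstring above describes a dtype-based dispatch: the same entry point runs either a float or a quantized (low-bit) path depending on the weight tensor's dtype. A minimal sketch of that idea, using NumPy as a stand-in for torch tensors (the function name, the per-tensor `scale`, and the int8 convention are illustrative assumptions, not the PR's actual implementation):

```python
import numpy as np

def run_model(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Dispatch on the weight dtype: int8 weights take the
    'quantized' path and are dequantized before the matmul;
    float weights run the matmul directly."""
    if weights.dtype == np.int8:
        scale = 0.1  # assumed per-tensor dequantization scale, for illustration
        return x @ (weights.astype(np.float32) * scale).T
    return x @ weights.T
```

The key point the reviewer's suggested docstring ("Run an NPU model") captures is that the caller does not choose the path; the weight dtype does.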
    input_shapes = [elem.shape for elem in x_np]
    if models is None:
        _model_cache[key] = deque([backend_cls(*input_shapes) for i in range(4)])
make 4 configurable
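One way to address "make 4 configurable" is to lift the hard-coded pool size into a parameter with a default of 4, so callers or a config layer can tune how many compiled backend instances are cached per key. A sketch (the helper name `get_model_pool` is hypothetical, not from the PR):

```python
from collections import deque

_model_cache = {}

def get_model_pool(key, backend_cls, input_shapes, pool_size=4):
    """Lazily create and cache a pool of backend instances for `key`.

    `pool_size` replaces the hard-coded 4, keeping the old behavior
    as the default while letting callers tune the pool depth.
    """
    if key not in _model_cache:
        _model_cache[key] = deque(
            backend_cls(*input_shapes) for _ in range(pool_size)
        )
    return _model_cache[key]
```

Keeping the default at 4 preserves the current behavior for existing callers while making the knob available where it matters.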
Currently bias is not considered.