Add FuseMLP to NPU Llama #11565

hkvision · 2024-07-11T09:55:00Z

No description provided.

jason-dai · 2024-07-11T10:03:20Z

python/llm/src/ipex_llm/transformers/npu_models/fusedmlp.py

@@ -0,0 +1,163 @@
+from intel_npu_acceleration_library.backend.factory import NNFactory


change to lowbitmlp.py

jason-dai · 2024-07-11T10:03:32Z

python/llm/src/ipex_llm/transformers/npu_models/fusedmlp.py

+import uuid
+
+
+class QuantizedMLP(NNFactory):


change to LowBitMLP

jason-dai · 2024-07-11T10:04:10Z

python/llm/src/ipex_llm/transformers/npu_models/fusedmlp.py

+
+# TODO: separate it into a single file
+@torch.no_grad()
+def run_factory(


change to run_model

jason-dai · 2024-07-11T10:04:31Z

python/llm/src/ipex_llm/transformers/npu_models/fusedmlp.py

+    backend_cls: Any,
+    op_id: Optional[str] = None,
+) -> torch.Tensor:
+    """Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.


Run an NPU model
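The original docstring says the runner picks a float or a quantized path depending on the datatype of the weights. A minimal sketch of that kind of dispatch follows. It is a toy, not the NPU implementation: `run_model` here is a hypothetical stand-in that computes a single dot product, Python `array` typecodes stand in for tensor dtypes, and the per-tensor `scale` is an assumed simplification of whatever dequantization the real low-bit path uses.

```python
from array import array
from typing import Optional, Sequence


def run_model(x: Sequence[float], weights: array, scale: Optional[float] = None) -> float:
    """Hypothetical toy 'model' (one dot product) that dispatches on weight dtype."""
    if weights.typecode == "b":
        # Signed int8 weights: low-bit path, dequantize with an assumed per-tensor scale.
        if scale is None:
            raise ValueError("int8 weights require a scale")
        w = [q * scale for q in weights]
    elif weights.typecode in ("f", "d"):
        # Float weights: use them directly.
        w = list(weights)
    else:
        raise TypeError(f"unsupported weight typecode {weights.typecode!r}")
    return sum(wi * xi for wi, xi in zip(w, x))


float_out = run_model([1.0, 2.0], array("f", [0.5, 0.25]))
int8_out = run_model([1.0, 2.0], array("b", [2, -4]), scale=0.125)
```

Keeping the dtype check inside the runner means callers don't need to know which path they hit, which matches the docstring's description.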

jason-dai · 2024-07-11T10:07:35Z

python/llm/src/ipex_llm/transformers/npu_models/fusedmlp.py

+
+    input_shapes = [elem.shape for elem in x_np]
+    if models is None:
+        _model_cache[key] = deque([backend_cls(*input_shapes) for i in range(4)])


make the hard-coded 4 configurable
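The snippet above builds exactly four backend instances per cache key with `range(4)`. One way to make that count configurable is to lift it into a constructor argument. The sketch below assumes this approach; `ModelCache`, `pool_size` and `FakeBackend` are hypothetical names, and rotating the deque on reuse is an assumption about how the pooled instances are consumed (the excerpt only shows the creation branch).

```python
from collections import deque
from typing import Any, Deque, Dict, Sequence, Tuple

ShapeKey = Tuple[str, Tuple[Tuple[int, ...], ...]]


class ModelCache:
    """Hypothetical per-shape cache holding `pool_size` backend instances per key."""

    def __init__(self, pool_size: int = 4) -> None:
        if pool_size < 1:
            raise ValueError("pool_size must be at least 1")
        self.pool_size = pool_size
        self._cache: Dict[ShapeKey, Deque[Any]] = {}

    def get(self, backend_cls: type, input_shapes: Sequence[Tuple[int, ...]]) -> Any:
        key = (backend_cls.__name__, tuple(tuple(s) for s in input_shapes))
        models = self._cache.get(key)
        if models is None:
            # First call for these shapes: build the whole pool up front.
            models = deque(backend_cls(*input_shapes) for _ in range(self.pool_size))
            self._cache[key] = models
        else:
            # Later calls: rotate so consecutive calls get different instances.
            models.rotate(1)
        return models[0]


# Usage with a stand-in backend that just counts how often it is constructed.
class FakeBackend:
    created = 0

    def __init__(self, *input_shapes: Tuple[int, ...]) -> None:
        FakeBackend.created += 1
        self.input_shapes = input_shapes


cache = ModelCache(pool_size=3)
picked = [cache.get(FakeBackend, [(1, 4096)]) for _ in range(4)]
```

With `pool_size=3`, only three backends are constructed, and four calls cycle through all three before returning to the first. The default of 4 keeps the original behavior.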

hkvision · 2024-07-12T11:10:30Z

Currently, bias is not handled.
In the sample implementation, bias is not part of the op; it is added directly to the output after the op finishes. For the fused MLP, we may need to add the bias to the overall graph as well.
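A small pure-Python sketch of why post-op bias addition works for a single linear layer but not for every bias in a fused MLP. It assumes the Llama-style MLP structure `down(silu(gate(x)) * up(x))` and a hypothetical variant whose projections carry biases (Llama's MLP normally has none). The weights and biases are made-up toy values. The down-projection bias sits after the last matmul, so adding it afterwards is exact. The gate bias passes through SiLU, so its effect on the output depends on the input, and no constant added after the fused op can reproduce it.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output feature


def linear(x: Vector, w: Matrix, b: Optional[Vector] = None) -> Vector:
    """y = W x (+ b)."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    if b is not None:
        y = [yi + bi for yi, bi in zip(y, b)]
    return y


def silu(v: Vector) -> Vector:
    return [t / (1.0 + math.exp(-t)) for t in v]


def llama_style_mlp(x, w_gate, w_up, w_down, b_gate=None, b_up=None, b_down=None):
    """down(silu(gate(x)) * up(x)), with every bias inside the graph."""
    gate = silu(linear(x, w_gate, b_gate))
    up = linear(x, w_up, b_up)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(hidden, w_down, b_down)


# Toy weights and biases (made up for illustration).
W_GATE = [[0.5, 0.1], [-0.3, 0.2]]
W_UP = [[1.0, 0.0], [0.0, 1.0]]
W_DOWN = [[1.0, 1.0]]
B_GATE = [0.2, -0.1]
B_DOWN = [0.5]

x = [1.0, -2.0]
reference = llama_style_mlp(x, W_GATE, W_UP, W_DOWN, B_GATE, None, B_DOWN)

# Down-projection bias: adding it after the fused op gives the same result.
no_down_bias = llama_style_mlp(x, W_GATE, W_UP, W_DOWN, B_GATE)
down_bias_after = [y + b for y, b in zip(no_down_bias, B_DOWN)]


# Gate bias: measure how much it shifts the output for a given input.
def gate_bias_effect(inp: Vector) -> float:
    with_b = llama_style_mlp(inp, W_GATE, W_UP, W_DOWN, B_GATE)[0]
    without_b = llama_style_mlp(inp, W_GATE, W_UP, W_DOWN)[0]
    return with_b - without_b


effect_a = gate_bias_effect([1.0, -2.0])
effect_b = gate_bias_effect([0.0, 0.0])
```

Because `effect_a` and `effect_b` differ, the gate (and likewise up) biases have to live inside the fused graph, while the down bias could still be added to the output as in the sample implementation.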

hkvision added 2 commits July 11, 2024 17:59

add

88e1016

initial

722df73

jason-dai reviewed Jul 11, 2024


hkvision added 2 commits July 12, 2024 17:52

refactor and meet review

51f7326

fix style

8152e0f

hkvision marked this pull request as ready for review July 12, 2024 11:06

hkvision closed this Nov 4, 2024
