LLM: split chatglm3's mlp and use mlp fusion #10542

rnwang04 · 2024-03-26T04:24:37Z

Description

This PR splits chatglm3's MLP and uses MLP fusion, which gives a ~1 ms speedup on MTL.
However, combining quantized KV cache with MLP fusion changes the output on Arc and MTL (this appears to be a known issue).
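The split described above can be sketched as follows. This is a minimal illustration, not the code in this PR. It assumes ChatGLM3's MLP uses a single `dense_h_to_4h` projection that outputs `2 * ffn_hidden_size` features, applies SwiGLU to the two halves, and then projects back with `dense_4h_to_h`, all without biases. The helper names (`split_weights`, `split_mlp`, etc.) are hypothetical. Splitting the fused weight row-wise into a gate half and an up half produces the gate/up/down form that an MLP fusion kernel can consume, and the numerical result is unchanged.

```python
import numpy as np


def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def fused_mlp(x, w_h_to_4h, w_4h_to_h):
    # Assumed original layout: one projection to 2 * ffn_hidden, then SwiGLU.
    # Weights use the (out_features, in_features) convention, so y = x @ W.T.
    h = x @ w_h_to_4h.T
    gate, up = np.split(h, 2, axis=-1)
    return (silu(gate) * up) @ w_4h_to_h.T


def split_weights(w_h_to_4h):
    # Rows [0, ffn) produce the gate half and rows [ffn, 2 * ffn) the up half.
    gate_w, up_w = np.split(w_h_to_4h, 2, axis=0)
    return gate_w, up_w


def split_mlp(x, gate_w, up_w, down_w):
    # Separate gate/up/down projections, the shape a fused MLP kernel expects.
    return (silu(x @ gate_w.T) * (x @ up_w.T)) @ down_w.T


# Usage: check that the split form matches the fused form on random data.
rng = np.random.default_rng(0)
hidden, ffn = 8, 16
x = rng.standard_normal((2, hidden))
w_h_to_4h = rng.standard_normal((2 * ffn, hidden))
w_4h_to_h = rng.standard_normal((hidden, ffn))
gate_w, up_w = split_weights(w_h_to_4h)
print(np.allclose(fused_mlp(x, w_h_to_4h, w_4h_to_h),
                  split_mlp(x, gate_w, up_w, w_4h_to_h)))
```

Because the split is only a reshuffling of the same weights, any output drift seen on Arc or MTL would come from the fused kernel's numerics (for example, when it is combined with a quantized KV cache), not from the split itself.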

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

Unit test

rnwang04 added 2 commits March 25, 2024 22:04
- temp save (b3278b6)
- fix (d5c27e8)

rnwang04 marked this pull request as draft March 26, 2024 04:24
