
Fix (llm): correct handling of attention mask shape #652

Merged: 1 commit merged from attetion_mask_llm into Xilinx:llm on Jul 6, 2023

Conversation

Giuseppe5 (Collaborator)

No description provided.
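The PR title points at how the attention mask shape is handled in the LLM example flow. For context only, here is a minimal sketch of the kind of shape handling this usually involves, assuming a HuggingFace-style decoder where a 2D padding mask of shape (batch, src_len) has to be expanded into the 4D additive mask (batch, 1, tgt_len, src_len) consumed by the attention layers. The helper name and signature are illustrative, not the actual patch:

```python
# Hypothetical sketch, not the code changed in this PR: expand a 2D padding
# mask into the 4D additive mask shape expected by decoder attention layers.
import torch

def expand_attention_mask(mask: torch.Tensor, tgt_len: int, dtype: torch.dtype) -> torch.Tensor:
    # mask: (batch, src_len), with 1 for tokens to attend to and 0 for padding
    batch, src_len = mask.shape
    # (batch, src_len) -> (batch, 1, tgt_len, src_len)
    expanded = mask[:, None, None, :].expand(batch, 1, tgt_len, src_len).to(dtype)
    # Convert to an additive mask: 0 where attended, large negative where masked
    return (1.0 - expanded) * torch.finfo(dtype).min
```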

@volcacius volcacius merged commit 53ed201 into Xilinx:llm Jul 6, 2023
volcacius added a commit that referenced this pull request on Jul 17, 2023
* Examples: WIP LLM block quantization

* Add support for block zero-point

* Add torch-mlir custom op support

* Add test linear

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Update to custom matmul export

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix errors

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix output shape of custom op

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Add lowering to torch_mlir for single layer

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Some cleanups

* WIP llm flow

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix (examples/llm): typo in custom quant matmul op (#607)

* Test act equalization support

* Initial end to end flow

* Initial support for QuantMHA on OPT

* Fix act equalization

* Typos in prints

* Reorganize validate

* Add initial per row quantizers

* Add per row input quantization support

* Support group quant slicing

* Adopt SliceTensor for block weight partial quant

* Add float16 support

* Fix scale type name

* Add support for LN affine merging

* WIP currently broken

* Clean up weight eq support

* Set weight narrow range always to False

* Add fx act equalization, fixes for float16 support

* Fix validate

* Fix backport imports

* Fix example export

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix value_trace call in ln affine merging

* Add per tensor/row/group dynamic scale support, some dtype improvements

* Fix (llm): correct handling of attention mask shape (#652)

* Always export in fp32 base dtype on CPU

* Export improvements

* Fix errors after latest PR

---------

Signed-off-by: Alessandro Pappalardo <[email protected]>
Co-authored-by: jinchen62 <[email protected]>
Co-authored-by: Giuseppe Franco <[email protected]>
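The commit list above tracks the block (group-wise) quantization work in the LLM example (block zero-point, group quant slicing, per-row/per-group scales). As a rough illustration of what per-group weight scaling involves, here is a self-contained sketch under assumed shapes; it is not the Brevitas API, and the function name and defaults are hypothetical:

```python
# Illustrative sketch only (assumed shapes, not the Brevitas API): compute one
# scale per (output channel, group) and fake-quantize a weight matrix group-wise.
import torch

def quantize_per_group(weight: torch.Tensor, group_size: int, n_bits: int = 4):
    out_ch, in_ch = weight.shape
    assert in_ch % group_size == 0, "input channels must divide evenly into groups"
    w = weight.reshape(out_ch, in_ch // group_size, group_size)
    # Symmetric scale per group; non-narrow range, i.e. [-2**(n-1), 2**(n-1) - 1]
    q_max = 2 ** (n_bits - 1) - 1
    max_abs = w.abs().amax(dim=-1, keepdim=True)
    scale = max_abs.clamp(min=1e-8) / q_max
    q = torch.clamp(torch.round(w / scale), -q_max - 1, q_max)
    # Return the dequantized weight in its original layout plus the group scales
    return (q * scale).reshape(out_ch, in_ch), scale
```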
@Giuseppe5 deleted the attetion_mask_llm branch on July 21, 2023.