
Fix (llm): correct handling of attention mask shape #652

Merged: 1 commit merged from attetion_mask_llm into Xilinx:llm on Jul 6, 2023

Conversation

Giuseppe5 (Collaborator)

No description provided.
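The PR title points at how the attention mask shape is handled in the LLM example flow. For context only, here is a minimal sketch of the kind of shape handling this usually involves, assuming a HuggingFace-style decoder where a 2D padding mask of shape (batch, src_len) has to be expanded into the 4D additive mask (batch, 1, tgt_len, src_len) consumed by the attention layers. The helper name and signature are illustrative, not the actual patch:

```python
# Hypothetical sketch, not the code changed in this PR: expand a 2D padding
# mask into the 4D additive mask shape expected by decoder attention layers.
import torch

def expand_attention_mask(mask: torch.Tensor, tgt_len: int, dtype: torch.dtype) -> torch.Tensor:
    # mask: (batch, src_len), with 1 for tokens to attend to and 0 for padding
    batch, src_len = mask.shape
    # (batch, src_len) -> (batch, 1, tgt_len, src_len)
    expanded = mask[:, None, None, :].expand(batch, 1, tgt_len, src_len).to(dtype)
    # Convert to an additive mask: 0 where attended, large negative where masked
    return (1.0 - expanded) * torch.finfo(dtype).min
```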

@volcacius volcacius merged commit 53ed201 into Xilinx:llm Jul 6, 2023
volcacius added a commit that referenced this pull request on Jul 17, 2023
* Examples: WIP LLM block quantization

* Add support for block zero-point

* Add torch-mlir custom op support

* Add test linear

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Update to custom matmul export

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix errors

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix output shape of custom op

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Add lowering to torch_mlir for single layer

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Some cleanups

* WIP llm flow

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix (examples/llm): typo in custom quant matmul op (#607)

* Test act equalization support

* Initial end to end flow

* Initial support for QuantMHA on OPT

* Fix act equalization

* Typos in prints

* Reorganize validate

* Add initial per row quantizers

* Add per row input quantization support

* Support group quant slicing

* Adopt SliceTensor for block weight partial quant

* Add float16 support

* Fix scale type name

* Add support for LN affine merging

* WIP currently broken

* Clean up weight eq support

* Set weight narrow range always to False

* Add fx act equalization, fixes for float16 support

* Fix validate

* Fix backport imports

* Fix example export

Signed-off-by: Alessandro Pappalardo <[email protected]>

* Fix value_trace call in ln affine merging

* Add per tensor/row/group dynamic scale support, some dtype improvements

* Fix (llm): correct handling of attention mask shape (#652)

* Always export in fp32 base dtype on CPU

* Export improvements

* Fix errors after latest PR

---------

Signed-off-by: Alessandro Pappalardo <[email protected]>
Co-authored-by: jinchen62 <[email protected]>
Co-authored-by: Giuseppe Franco <[email protected]>
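The commit list above tracks the block (group-wise) quantization work in the LLM example (block zero-point, group quant slicing, per-row/per-group scales). As a rough illustration of what per-group weight scaling involves, here is a self-contained sketch under assumed shapes; it is not the Brevitas API, and the function name and defaults are hypothetical:

```python
# Illustrative sketch only (assumed shapes, not the Brevitas API): compute one
# scale per (output channel, group) and fake-quantize a weight matrix group-wise.
import torch

def quantize_per_group(weight: torch.Tensor, group_size: int, n_bits: int = 4):
    out_ch, in_ch = weight.shape
    assert in_ch % group_size == 0, "input channels must divide evenly into groups"
    w = weight.reshape(out_ch, in_ch // group_size, group_size)
    # Symmetric scale per group; non-narrow range, i.e. [-2**(n-1), 2**(n-1) - 1]
    q_max = 2 ** (n_bits - 1) - 1
    max_abs = w.abs().amax(dim=-1, keepdim=True)
    scale = max_abs.clamp(min=1e-8) / q_max
    q = torch.clamp(torch.round(w / scale), -q_max - 1, q_max)
    # Return the dequantized weight in its original layout plus the group scales
    return (q * scale).reshape(out_ch, in_ch), scale
```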
@Giuseppe5 deleted the attetion_mask_llm branch on July 21, 2023.