
Support loading checkpoints quantized using AutoFP8 #286

Merged: 28 commits into habana_main on Sep 25, 2024

Conversation

@Yantom1 commented Sep 16, 2024

Support loading FP8 checkpoints from https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127

- Skip CUDA-only checks
- Use scaled_fp8_quant instead of _scaled_mm
- Rescale weights and weight_scale for the Gaudi2 float8_e4m3fn range (see the sketch below)
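
The last bullet is the Gaudi2-specific part: AutoFP8 checkpoints are quantized against the standard float8_e4m3fn maximum of 448.0, while Gaudi2's FP8 E4M3 hardware range is narrower (assumed here to be 240.0), so the weights need to be shrunk and the scale grown to compensate. A minimal standalone PyTorch sketch of that idea; the function name and the 240.0 constant are illustrative assumptions, not the PR's actual code:

```python
import torch

# Representable maxima: standard float8_e4m3fn tops out at 448.0, while
# Gaudi2's FP8 E4M3 hardware range is assumed here to be +/-240.0.
FP8_E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max   # 448.0
GAUDI2_FP8_E4M3_MAX = 240.0

def rescale_for_gaudi2(weight_fp8: torch.Tensor, weight_scale: torch.Tensor):
    """Shrink AutoFP8 weights into the Gaudi2-representable range and fold
    the correction into weight_scale so dequantization is unchanged:
    (w * f) * (s / f) == w * s."""
    factor = GAUDI2_FP8_E4M3_MAX / FP8_E4M3_MAX        # ~0.5357
    shrunk = (weight_fp8.to(torch.float32) * factor).to(torch.float8_e4m3fn)
    return shrunk, weight_scale / factor

# Example: a tensor quantized against the standard 448.0 range.
w = torch.randn(16, 16).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
s = torch.tensor(1.0)
w_hpu, s_hpu = rescale_for_gaudi2(w, s)
assert w_hpu.to(torch.float32).abs().max() <= GAUDI2_FP8_E4M3_MAX
```

Folding the correction factor into weight_scale keeps the dequantized values numerically identical, so the checkpoint's calibrated scales do not need to be re-tuned.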

@michalkuligowski

@Yantom1 How does this relate to #225? Some of the code is duplicated; should #225 be closed?

@Yantom1 (Author) commented Sep 17, 2024

> @Yantom1 How does this relate to #225? Some of the code is duplicated; should #225 be closed?

Yes, I closed it.

@Yantom1 requested a review from MrGeva on September 19, 2024 12:20
@michalkuligowski

@Yantom1 I see you merged it into the extensions repo, so this PR can be closed, yes?

@Yantom1 (Author) commented Sep 22, 2024

> @Yantom1 I see you merged it into the extensions repo, so this PR can be closed, yes?

No, it is part of this change. In the extensions repo I only changed the ops.py file.

@Yantom1 closed this Sep 22, 2024
@Yantom1 reopened this Sep 22, 2024
@michalkuligowski merged commit 29fb5ed into habana_main Sep 25, 2024
19 checks passed
kzawora-intel added a commit that referenced this pull request Oct 4, 2024