Add Ascend NPU support for nf4 quant #1422

Open
wants to merge 2 commits into
base: multi-backend-refactor


@statelesshz statelesshz commented Nov 21, 2024

What does this PR do?

This PR adds Ascend NPU support for nf4 quant/dequant and allows QLoRA fine-tuning for LLMs using transformers, peft, and trl.

You may notice that nf4 quantization is currently implemented in PyTorch. This is an interim measure, because the high-performance version written in AscendC is still in progress 😞. Meanwhile, many people in the Ascend NPU community have told us they want to use QLoRA to fine-tune LLMs as soon as possible, hence this PR.
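
For readers unfamiliar with what nf4 quant/dequant involves, here is a minimal, framework-free sketch of blockwise NF4 in plain Python. It is not the code from this PR, which uses PyTorch tensor ops on the NPU. The function names `quantize_nf4` and `dequantize_nf4`, the block size of 64, and the high-nibble-first packing order are assumptions made for illustration. The 16 codebook values are the NF4 levels published with QLoRA.

```python
# Illustrative sketch only: plain-Python blockwise NF4, not the PR's implementation.

# The 16 NF4 levels (normalized to [-1, 1]) from the QLoRA paper.
NF4_CODE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_nf4(values, blocksize=64):
    """Quantize a flat list of floats; return (packed bytes, per-block absmax)."""
    codes, absmax = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Scale each block by its largest magnitude (fall back to 1.0 for all-zero blocks).
        scale = max(abs(v) for v in block) or 1.0
        absmax.append(scale)
        for v in block:
            x = v / scale
            # Pick the index of the nearest NF4 level.
            codes.append(min(range(16), key=lambda i: abs(NF4_CODE[i] - x)))
    if len(codes) % 2:
        codes.append(0)  # pad so two 4-bit codes fit in every byte
    # Assumed layout: first code in the high nibble, second in the low nibble.
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, absmax


def dequantize_nf4(packed, absmax, n, blocksize=64):
    """Recover n approximate floats from packed codes and per-block absmax."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return [NF4_CODE[codes[i]] * absmax[i // blocksize] for i in range(n)]


if __name__ == "__main__":
    weights = [0.0, 1.0, -1.0, 0.5]
    packed, absmax = quantize_nf4(weights, blocksize=4)
    print(dequantize_nf4(packed, absmax, len(weights), blocksize=4))
```

Each weight is stored as a 4-bit index plus one shared scale per block, so dequantization is a table lookup followed by a multiply. These are elementwise operations that map naturally onto tensor ops, which is presumably why a PyTorch-only version is workable as a stopgap.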

Related PR: huggingface/transformers#31512

Collaborators

@SlightwindSec @Ginray @MatrixPlayer

cc @Titus-von-Koeller @matthewdouglas

Co-authored-by: Slightwind <[email protected]>
Co-authored-by: Ginray <[email protected]>
@statelesshz
Author

statelesshz commented Nov 21, 2024

asciicast

Following this blog, I ran an end-to-end QLoRA fine-tuning test on llama2-7b-hf in my environment with an NPU device, and it works 🤗.

Here is the script I used.

@baymax591

Thanks a lot for sharing this PR and the video demo! Thanks to the demo, I was able to successfully run NF4 quant/dequant on the NPU with ease. The detailed explanation in the video really helped me understand the process and key steps. Looking forward to more updates in the future—great work!

@baymax591

I hope this PR can be merged soon, as it provides valuable improvements. Looking forward to seeing it merged!
cc @Titus-von-Koeller

@SunMarc
Contributor

SunMarc commented Nov 27, 2024

Nice work, and thanks for the demo! Can you have a look, @matthewdouglas?


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@matthewdouglas
Member

I will be able to look in more detail next week, but at first glance it looks nice. Thanks @statelesshz !

4 participants