FEAT: Add EETQ support in PEFT #1675
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks a lot for adding EETQ support. This implementation looks really smooth, nice!
Could you please add an entry to the quantization docs of PEFT? Maybe mention some pros/cons of EETQ compared to the other quantization methods, or add a reference to where this is better explained (their README wasn't very helpful in this regard).
Did you see the CI error:
Error: The version '3.9' with architecture 'arm64' was not found for macOS 14.4.1.
Any idea what that is about?
```python
if is_eetq_available():
    from eetq import EetqLinear
```
Should we have a lazy import as for bnb, or is it not necessary for EETQ?
Makes sense!
What I mean is: should we indent all the code below to be inside of `if is_eetq_available():`? Or is it not necessary because, unlike bnb, EETQ does not initialize CUDA?
I think it does, so let's indent it to be on the safe side.
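For reference, a minimal sketch of the guarded-import pattern being discussed, assuming `is_eetq_available` lives in `peft.import_utils` and the LoRA base class is `peft.tuners.lora.layer.LoraLayer`; the class body is just a stub, the real layer in the PR is more involved:

```python
import torch

from peft.import_utils import is_eetq_available
from peft.tuners.lora.layer import LoraLayer

if is_eetq_available():
    # Everything that touches eetq is defined inside this guard, so importing
    # this module neither fails nor initializes CUDA when eetq is absent.
    from eetq import EetqLinear  # quantized base layer the LoRA wrapper builds on

    class EetqLoraLinear(torch.nn.Module, LoraLayer):
        # Stub: adapter setup and the quantized forward pass are omitted here.
        pass
```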
src/peft/tuners/lora/eetq.py (Outdated)
```python
self._active_adapter = adapter_name
self.update_layer(adapter_name, r, lora_alpha, lora_dropout, init_lora_weights, use_rslora)
```
Would merging currently work with EETQ? I would assume not. Maybe we can raise an error when users try it?
src/peft/tuners/lora/eetq.py (Outdated)
```python
return result

def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
    raise ValueError("Merging LoRA layers is not supported for Eetq layers.")
```
Let's also add `unmerge` for completeness. I also wonder if `ValueError` is best or if it should be `TypeError` (maybe `AttributeError`?).
Makes sense!
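A sketch of what the resolved `merge`/`unmerge` stubs could look like; `AttributeError` is just one of the options floated above, and the simplified class here stands in for the actual EETQ LoRA layer:

```python
from typing import Optional

import torch


class EetqLoraLinear(torch.nn.Module):  # simplified stand-in for the real EETQ LoRA layer
    def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
        # EETQ keeps the base weights in a packed int8 format, so the LoRA delta
        # cannot be folded back into them; raise instead of silently doing nothing.
        raise AttributeError("Merging LoRA layers is not supported for Eetq layers.")

    def unmerge(self) -> None:
        raise AttributeError("Unmerging LoRA layers is not supported for Eetq layers.")
```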
Thanks so much for adding EETQ, LGTM. I have 2 nits regarding the documentation, up to you if you want to fix them.
```python
import torch
from transformers import EetqConfig

config = EetqConfig("int8")
```
This probably requires the latest transformers, right? Maybe worth adding the min version?
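One way to make the requirement explicit in the docs would be a version check like the sketch below; the exact minimum release that ships `EetqConfig` is an assumption here and should be verified against the transformers changelog:

```python
import importlib.metadata

from packaging import version

# Assumed minimum transformers version providing EetqConfig; verify against the
# transformers release notes before relying on this exact number.
MIN_TRANSFORMERS_VERSION = "4.40.0"

installed = version.parse(importlib.metadata.version("transformers"))
if installed < version.parse(MIN_TRANSFORMERS_VERSION):
    raise ImportError(
        f"EETQ quantization via EetqConfig requires transformers>={MIN_TRANSFORMERS_VERSION}, "
        f"but {installed} is installed."
    )
```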
Co-authored-by: Benjamin Bossan <[email protected]>
What does this PR do?
This PR adds support for EETQ quantized linear layers in PEFT.
EETQ was recently added to transformers and offers fast 8-bit quantized inference: huggingface/transformers#30262
Fixes: #1643
Learn more about EETQ here: https://github.com/NetEase-FuXi/EETQ
TODO
cc @SunMarc @BenjaminBossan @pacman100
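For context, a minimal end-to-end sketch of what this support enables: quantize a base model with EETQ through transformers, then attach a LoRA adapter with PEFT. The model id and LoRA hyperparameters below are placeholders, and the snippet assumes eetq, a recent transformers, and this PR's version of peft are installed:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, EetqConfig

# Quantize the base model to int8 with EETQ at load time.
quant_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model id
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach a LoRA adapter on top of the EETQ-quantized linear layers.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```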