FourierFT Support #1838
Conversation
Thanks a lot for adding this new method to PEFT. The method looks quite interesting. Based on your name, I assume that you're one of the paper authors. I haven't done a full review yet. Could you add an example that we can run to see FourierFT in action? Maybe something based on LoRA, so that we can get a comparison of the effectiveness. For a full PR, we also need to add documentation and tests. Would that be something you could work on? If you need help with those, always feel free to ask.
Thank you for the valuable comments!
Thanks a lot. Let's start with the example first and then we can take a look at docs and tests. Regarding the tests, you could for instance check past PRs that add new methods. E.g. in the VeRA PR, you can see how the tests have been extended. To get a quick start to testing, I would, however, recommend starting by adding FourierFT to this test matrix: peft/tests/test_custom_models.py Line 59 in 683db0f
This should already cover about 90% of the relevant tests. Then run the tests locally.
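For illustration, an entry added to that test matrix would look roughly like the existing ones, i.e. (test name, model id, config class, config kwargs); the variable name and the kwargs below are placeholders rather than values from this PR:
from peft import FourierFTConfig

TEST_CASES = [
    # ... existing entries ...
    ("Vanilla MLP FourierFT", "MLP", FourierFTConfig, {"n_frequency": 10, "target_modules": ["lin0"]}),
]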
Got it. Working on it :D
Hi @BenjaminBossan, we have finished the examples, tests and docs. Feel free to reach us if anything is needed!
Could you please run the code style checks?
Thanks for the updates. Code style checks are still failing; possibly you have a different version of ruff on your system (CI uses v0.2.2). Anyway, it should be easy to fix by either installing the same version of ruff and re-running, or by doing the following:
We're still seeing the failure mentioned in this comment:
You can run it locally to check. Also, I updated PEFT to use the latest ruff version 0.4.8. So if you want to use that one instead, feel free to merge the latest main branch and upgrade your local ruff. But that's optional, you can also just keep the PR as is.
Co-authored-by: Benjamin Bossan <[email protected]>
Thanks! I just did not find the doc-builder... Anyway, it should be OK now? Thanks a lot for the effort!
I see that you already added a check for this. By the way, you should be able to run the tests locally.
Thanks! The tests are passing now. Feel free to reach us if anything is needed!
Thanks for the fixes. There are still some issues with running ruff on the CI. Could you please run:
ruff check --fix src tests examples docs
ruff format src tests examples docs
doc-builder style src/peft tests docs/source --max_len 119
The ruff version is v0.4.9.
Hi, thanks for the suggestions; we have fixed the remaining style errors with ruff version 0.4.9.
Did you also run the doc-builder style command?
Hi, the newly pushed code has passed the doc-builder style check.
Thanks a lot for working on the remaining points. The PR is very advanced now and there are only some smaller issues left. Please check my comments.
On top, could you please also enable the adapter deletion tests for FourierFT by adding it to the two supported_peft_types lists (Line 1043 and Line 1089 in d37dde6):
supported_peft_types = [PeftType.LORA, PeftType.LOHA, PeftType.LOKR, PeftType.IA3, PeftType.OFT, PeftType.BOFT]
I think with the fix to random_loc_seed that I describe in one of my comments, these tests should pass.
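A minimal sketch of the requested change, with the list contents taken from the lines quoted above (the exact line numbers may have shifted since d37dde6):
from peft import PeftType

supported_peft_types = [
    PeftType.LORA, PeftType.LOHA, PeftType.LOKR, PeftType.IA3,
    PeftType.OFT, PeftType.BOFT, PeftType.FOURIERFT,  # added to enable the adapter deletion tests
]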
I did not have the opportunity to review the notebook yet; will do so next week.
src/peft/tuners/tuners_utils.py (Outdated)
print(self.other_param_names)
for attr in self.adapter_layer_names + self.other_param_names:
    print(attr)
Delete please
src/peft/tuners/fourierft/layer.py (Outdated)
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.fourierft_random_loc_seed = kwargs.pop("random_loc_seed")
Suggested change:
- self.fourierft_random_loc_seed = kwargs.pop("random_loc_seed")
+ self.fourierft_random_loc_seed = {}
The logic needs to be changed a bit here: as is, there would be only one value for this, even if there are multiple adapters. Instead, this should be a dict with one value per adapter and the adapter_name as key. So basically, treat it the same as e.g. fourierft_scaling. The update_layer method also needs to be updated to set the correct value for random_loc_seed.
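Roughly like this (a sketch only, not the exact PEFT source; the update_layer signature here is assumed from this discussion):
class FourierFTLayerSketch:
    def __init__(self):
        # per-adapter state, keyed by adapter_name, mirroring fourierft_scaling
        self.fourierft_scaling = {}
        self.fourierft_random_loc_seed = {}

    def update_layer(self, adapter_name, n_frequency, scaling, random_loc_seed, init_weights):
        # each adapter stores its own values, so a second adapter does not overwrite the first
        self.fourierft_scaling[adapter_name] = scaling
        self.fourierft_random_loc_seed[adapter_name] = random_loc_seed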
tests/test_custom_models.py (Outdated)
@@ -865,7 +884,8 @@ def test_disable_adapters(self, test_name, model_id, config_cls, config_kwargs):
)
model = get_peft_model(model, config)
model.eval()
outputs_before = model(**X)
with model.disable_adapter():
This change still needs to be reverted.
src/peft/tuners/fourierft/layer.py (Outdated)
return delta_weight

# ------------------------------------------------------------------------------------------
This comment can be removed.
src/peft/tuners/fourierft/config.py (Outdated)
"The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
"Set this to True if the spectrum are initialized to zeros."
Suggested change:
- "The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
- "Set this to True if the spectrum are initialized to zeros."
+ "The initialization of the Fourier weights. Set this to False if the spectrum should be initialized to a standard normal distribution."
+ "Set this to True if the spectrum should be initialized to zeros."
Also, out of curiosity: Did you run some tests with weights initialized as zeros? If yes, did you find a noticeable performance drop?
Hi, thanks very much for reviewing. In our latest PR version, we added the recommended tests and corrected the errors. Regarding your question: we found that neither initialization scheme has an absolute advantage. But overall, the default Gaussian initialization scheme from our paper (init_weights=False in the code) performs better in more cases and converges faster.
Best.
Thanks for checking this.
Could you please still update the text to use "should be", which I think is clearer. Please update the docstring above accordingly and ensure that the character limit is respected.
src/peft/tuners/fourierft/layer.py (Outdated)
n_frequency: int = 1000,
scaling: float = 150.0,
fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
init_weights: Union[bool, str] = True,
Should the default be False, like in the config?
Yes, these two tests pass successfully after we completed the fix to random_loc_seed.
Thanks so much for the recent changes. We're 99% there, I only found a few tiny issues left, which I think should be quickly fixed.
src/peft/tuners/fourierft/config.py (Outdated)
The following examples of settings regarding 'n_frequency' can be used as reference for users. For NLU
tasks with RoBERTa-base and RoBERTa-large models, adopting 'n_frequency': 100 can almost achieve similar
results as 'r': 8 in LoRA. For image classification tasks with ViT-base and Vit-large models, adopting
'n_frequency': 3000 can almost achieve similar results as 'r': 16 in LoRA.
Thanks a lot for providing this detailed description, which should really help users pick a good value. I would add a little bit extra: as a naive user, when I read that n_frequency=3000 is roughly the same as r=16 in LoRA, I would think that FourierFT is much less parameter efficient, because 3000 >> 16. But of course this is an incorrect impression. So I would add a sentence to clarify the amount of trainable parameters.
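For intuition, a back-of-the-envelope comparison (my own illustration with an assumed 768x768 linear layer; these numbers are not from the PR):
# LoRA trains r * (d_in + d_out) parameters per adapted linear layer,
# while FourierFT trains only n_frequency spectral coefficients per adapted layer.
d_in, d_out = 768, 768
lora_r16 = 16 * (d_in + d_out)    # 24,576 trainable parameters
fourierft_n3000 = 3000            # 3,000 trainable parameters
print(lora_r16, fourierft_n3000)  # 24576 3000 -> FourierFT uses roughly 8x fewer parameters here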
src/peft/tuners/fourierft/layer.py (Outdated)
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.fourierft_random_loc_seed = {}
Let's move this a few lines up to be next to the other fourier-specific params.
src/peft/tuners/fourierft/layer.py (Outdated)
class FourierFTLinear(nn.Module, FourierFTLayer):
    # Lora implemented in a dense layer
Suggested change:
- # Lora implemented in a dense layer
+ # FourierFT implemented in a dense layer
src/peft/tuners/fourierft/model.py (Outdated)
"random_loc_seed": fourierft_config.random_loc_seed,
}
kwargs["bias"] = bias
if isinstance(target, FourierFTLinear):
How about:
- if isinstance(target, FourierFTLinear):
+ if isinstance(target, FourierFTLayer):
in case new layer types are added in the future.
"model_name_or_path = \"roberta-base\"\n",
"task = \"mrpc\"\n",
"peft_type = PeftType.FOURIERFT\n",
"device = \"cuda:1\"\n",
Let's not hard-code this. How about: device = "cuda" if torch.cuda.is_available() else "cpu"?
"peft_type = PeftType.FOURIERFT\n",
"device = \"cuda:1\"\n",
"num_epochs = 5 # for better results, increase this number\n",
"n_frequency = 1000 # for better results, increase this number\n",
Let's also put scaling here, so that all hyper-params are in one place.
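Something along these lines, for example (a sketch assembled from the notebook hunks quoted above; the device line follows the earlier suggestion):
import torch
from peft import PeftType

model_name_or_path = "roberta-base"
task = "mrpc"
peft_type = PeftType.FOURIERFT
device = "cuda" if torch.cuda.is_available() else "cpu"
num_epochs = 5       # for better results, increase this number
n_frequency = 1000   # for better results, increase this number
scaling = 150.0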
" target_modules=[\"query\", \"value\"],\n", | ||
" scaling = 150.0,\n", | ||
")\n", | ||
"head_lr = 6e-3\n", |
Could you please add a comment about these two lr values?
@@ -0,0 +1,554 @@
{
Let's mention that this notebook requires evaluate and scikit-learn to run, as those are not PEFT dependencies.
Thanks for the recent updates, we're almost finished now. Just some small issues left, please check.
src/peft/tuners/fourierft/config.py (Outdated)
"The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
"Set this to True if the spectrum are initialized to zeros."
Thanks for checking this.
Could you please still update the text to use "should be", which I think is clearer. Please update the docstring above accordingly and ensure that the character limit is respected.
}
],
"source": [
"# To run this notebook, you can use `pip install scikit-learn evaluate` to install additional dependencies out of the PEFT.\n",
sklearn is no longer needed, is it?
Suggested change:
- "# To run this notebook, you can use `pip install scikit-learn evaluate` to install additional dependencies out of the PEFT.\n",
+ "# To run this notebook, please run `pip install evaluate` to install additional dependencies not covered by PEFT.\n",
Hi @BenjaminBossan, you can review the latest version of my PR. Best.
src/peft/tuners/fourierft/model.py (Outdated)
@@ -205,6 +205,8 @@ def __getattr__(self, name: str):
try:
    return super().__getattr__(name)  # defer to nn.Module's logic
except AttributeError:
    if name == "base_model":
This should be if name == "model": instead. The idea is to prevent an infinite recursion in the line below in case self.model is not yet set.
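For illustration, a minimal sketch of this guard (my own sketch, not the exact PEFT source):
import torch.nn as nn

class TunerSketch(nn.Module):
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def __getattr__(self, name: str):
        try:
            return super().__getattr__(name)  # defer to nn.Module's logic
        except AttributeError:
            # __getattr__ only runs when normal lookup fails; if self.model is not set yet,
            # falling through to getattr(self.model, name) would recurse forever.
            if name == "model":
                raise
            return getattr(self.model, name)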
Thanks very much for the instruction; we have fixed the recursion issue.
Thanks for fixing the recursion issue. We just merged another method, which results in a merge conflict. Could you please take care of it? The changes should be super easy.
Yeah, we have resolved the (six) conflicts; please review the current version. Best.
Thanks for these updates, this PR is in an excellent state. Just a small thing I found is that we can extend the test coverage a bit further by adding multi-adapter tests. That's quite easy, just add the following lines:
MULTIPLE_ACTIVE_ADAPTERS_TEST_CASES = [
    ...
    (
        "FourierFT Same",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin0"]},
    ),
    (
        "FourierFT Different",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin1"]},
    ),
]
to the list defined here. I already checked this locally and the new tests pass.
These tests have been included for our method, and the PR is ready for review. The new tests also passed on my local machine. Best.
Thanks a lot for this great PR, FourierFT looks very promising. All looks good to me, thanks for patiently incorporating my feedback.
Before merging, could you let me know whom to put on the co-author list? I see:
Co-authored-by: Chaos96 <[email protected]>
Co-authored-by: zqgao22 <[email protected]>
Co-authored-by: qichaoswang <[email protected]>
Co-authored-by: zgaoat <[email protected]>
Hi, thank you very much for your recent guidance, and we are glad to have made this PR. As for the co-author list, could it include the following four:
Best.
Paper Link: https://arxiv.org/abs/2405.03003
Thanks for reviewing!