
FourierFT Support #1838

Merged
merged 26 commits into huggingface:main on Jul 9, 2024

Conversation

Phoveran
Contributor

@Phoveran Phoveran commented Jun 9, 2024

Paper Link: https://arxiv.org/abs/2405.03003

Thanks for reviewing!

@BenjaminBossan
Member

Thanks a lot for adding this new method to PEFT. The method looks quite interesting. Based on your name, I assume that you're one of the paper authors.

I haven't done a full review yet. Could you add an example that we can run to see FourierFT in action? Maybe something based on LoRA, so that we can get a comparison of the effectiveness.

For a full PR, we also need to add documentation and tests. Would that be something you could work on? If you need help with those, always feel free to ask.
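
For reference, a minimal usage sketch of what such an example might boil down to, assuming the post-merge PEFT API; the model name and hyperparameter values here are illustrative, taken loosely from the discussion below, and not the PR's actual example code:

# Hypothetical minimal usage sketch, not the PR's actual example script.
from transformers import AutoModelForSequenceClassification
from peft import FourierFTConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
config = FourierFTConfig(
    n_frequency=1000,                  # number of trainable spectral coefficients per layer
    scaling=150.0,
    target_modules=["query", "value"],
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()

An analogous script using LoraConfig would then give the LoRA baseline for comparison.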

@Phoveran
Contributor Author

Thank you for the valuable comments!
I will be working on examples and documentation. For tests, could you please give me some instructions on that?

@BenjaminBossan
Member

Thanks a lot. Let's start with the example first and then we can take a look at docs and tests.

Regarding the tests, you could for instance check past PRs that add new methods. E.g. in the VeRA PR, you can see how the tests have been extended.

To get a quick start to testing, I would, however, recommend starting by adding FourierFT to this test matrix:

TEST_CASES = [

This should already cover roughly 90% of the relevant tests. Then run the tests locally with pytest tests/test_custom_models.py -k "ftfourier" and see if the tests pass.
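
For illustration, such an entry could look roughly like the following; the tuple layout mirrors how other methods appear in that list, and the exact kwargs are an assumption rather than the PR's final values:

from peft import FourierFTConfig  # available once the new method is registered

TEST_CASES = [
    # ... existing entries for LoRA, VeRA, etc. ...
    ("Vanilla MLP 1 FourierFT", "MLP", FourierFTConfig, {"target_modules": "lin0"}),
    ("Vanilla MLP 2 FourierFT", "MLP", FourierFTConfig, {"target_modules": ["lin0", "lin1"]}),
]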

@Phoveran
Contributor Author

Got it. Working on it :D


@Phoveran
Contributor Author

Hi @BenjaminBossan, we have finished the examples, tests, and docs. Feel free to reach out if anything else is needed!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan
Member

Could you please run make style? As is, the CI is failing because of ruff.

@BenjaminBossan
Member

Thanks for the updates. Code style checks are still failing; possibly you have a different version of ruff on your system (CI uses v0.2.2).

Anyway, it should be easy to fix by either installing the same version of ruff and re-running, or by doing the following:

  • Reset the changes in lycoris_utils.py
  • Reset the changes in router.py (in general, you shouldn't have to update files that your PR didn't touch)
  • This line is too long, put the last word "model" on the next line.

@BenjaminBossan
Member

We're still seeing a failure mentioned in this comment:

  • This line is too long, put the last word "model" on the next line.

You can run doc-builder style src/peft tests docs/source --max_len 119 locally to check it.

Also, I updated PEFT to use the latest ruff version 0.4.8. So if you want to use that one instead, feel free to merge the latest main branch and upgrade your local ruff. But that's optional, you can also just keep the PR as is.

@Phoveran
Contributor Author

Thanks! I just did not find the doc-builder... Anyway, it should be OK now? Thank you a lot for the effort!

@BenjaminBossan
Member

I see that you already added a check for the n_frequency argument, nice. However, the tests will still fail because the custom model used for the test is very small. Could you please update the n_frequency argument here to be < 200? Just add an entry to the dict with the arguments.

By the way, you should be able to run the tests locally with pytest tests/ -k fourier. That way, you don't have to wait for the CI to run in order to see if the tests pass.
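
Concretely, the added case could look something like this sketch; the tuple mirrors the other entries in the test matrix, and any n_frequency well below 200 should do for the tiny test model:

# Hypothetical entry; n_frequency kept small so it fits the small custom test model.
("Vanilla MLP 1 FourierFT", "MLP", FourierFTConfig, {"n_frequency": 10, "target_modules": ["lin0"]}),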

@Phoveran
Contributor Author

Thanks! We have passed the tests. Feel free to reach us if anything is needed!

@BenjaminBossan
Member

Thanks for the fixes. There are still some issues with running ruff on the CI, could you please run make style, or, alternatively:

ruff check --fix src tests examples docs
ruff format src tests examples docs
doc-builder style src/peft tests docs/source --max_len 119

The ruff version is v0.4.9.

@zqgao22
Contributor

zqgao22 commented Jun 20, 2024

Thanks for the fixes. There are still some issues with running ruff on the CI, could you please run make style, or, alternatively:

ruff check --fix src tests examples docs
ruff format src tests examples docs
doc-builder style src/peft tests docs/source --max_len 119

The ruff version is v0.4.9.

Hi, thanks for the suggestions; we have further fixed the style errors based on ruff version 0.4.9.

@BenjaminBossan
Member

Did you also run doc-builder style src/peft tests docs/source --max_len 119? It seems like there are still issues.

@zqgao22
Contributor

zqgao22 commented Jun 20, 2024

Did you also run doc-builder style src/peft tests docs/source --max_len 119? It seems like there are still issues.

Hi, the newly pushed code has been run through the doc-builder style check with hf-doc-builder=0.6.0.dev0.
Thanks for your further review.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks a lot for working on the remaining points. The PR is very advanced now and there are only some smaller issues left. Please check my comments.

On top, could you please also enable the adapter deletion tests for FourierFT:

I think with the fix to random_loc_seed that I describe in one of my comments, these tests should pass.

I did not have the opportunity to review the notebook yet; I will do so next week.

Comment on lines 654 to 656
print(self.other_param_names)
for attr in self.adapter_layer_names + self.other_param_names:
    print(attr)
Member

Delete please

src/peft/tuners/fourierft/config.py (resolved)
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.fourierft_random_loc_seed = kwargs.pop("random_loc_seed")
Member

Suggested change
- self.fourierft_random_loc_seed = kwargs.pop("random_loc_seed")
+ self.fourierft_random_loc_seed = {}

The logic needs to be changed a bit here: As is, there would be only one value for this, even if there are multiple adapters. Instead, this should be a dict with one value per adapter and the adapter_name as key. So basically, treat it the same as e.g. fourierft_scaling. The update_layer method also needs to be updated to set the correct value for random_loc_seed.
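
A minimal sketch of that per-adapter bookkeeping, with names assumed to mirror fourierft_scaling (this is not the exact code from the PR):

# Hypothetical sketch: store one random_loc_seed per adapter, like fourierft_scaling.
class FourierFTLayerSketch:
    def __init__(self):
        self.fourierft_scaling = {}
        self.fourierft_random_loc_seed = {}  # keyed by adapter_name

    def update_layer(self, adapter_name, n_frequency, scaling, random_loc_seed):
        self.fourierft_scaling[adapter_name] = scaling
        self.fourierft_random_loc_seed[adapter_name] = random_loc_seed
        # ... creation of the spectrum parameters for this adapter would follow here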

src/peft/utils/constants.py (resolved)
@@ -865,7 +884,8 @@ def test_disable_adapters(self, test_name, model_id, config_cls, config_kwargs):
)
model = get_peft_model(model, config)
model.eval()
outputs_before = model(**X)
with model.disable_adapter():
Member

This change still needs to be reverted.

return delta_weight


# ------------------------------------------------------------------------------------------
Member

This comment can be removed.

Comment on lines 167 to 168
"The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
"Set this to True if the spectrum are initialized to zeros."
Member

Suggested change
- "The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
- "Set this to True if the spectrum are initialized to zeros."
+ "The initialization of the Fourier weights. Set this to False if the spectrum should be initialized to a standard normal distribution."
+ "Set this to True if the spectrum should be initialized to zeros."

Also, out of curiosity: Did you run some tests with weights initialized as zeros? If yes, did you find a noticeable performance drop?

Contributor

Hi, thanks very much for reviewing. In our latest PR version, we added the recommended tests and corrected the errors. Regarding your question: we found that neither of the two initialization schemes has an absolute advantage. Overall, though, the default Gaussian initialization scheme from our paper (init_weights=False in the code) leads in more cases and converges faster.

Best.

Member

Thanks for checking this.

Could you please still update the text to use "should be", which I think is clearer. Please update the docstring above accordingly and ensure that the character limit is respected.

n_frequency: int = 1000,
scaling: float = 150.0,
fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
init_weights: Union[bool, str] = True,
Member

Should the default be False, like in the config?

@zqgao22
Contributor

zqgao22 commented Jul 1, 2024

Thanks a lot for working on the remaining points. The PR is very advanced now and there are only some smaller issues left. Please check my comments.

On top, could you please also enable the adapter deletion tests for FourierFT:

I think with the fix to random_loc_seed that I describe in one of my comments, these tests should pass.

I did not have the opportunity to review the notebook yet; I will do so next week.

Yes, these two tests pass successfully after we completed the fix to random_loc_seed.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks so much for the recent changes. We're 99% there, I only found a few tiny issues left, which I think should be quickly fixed.

The following examples of settings regarding 'n_frequency' can be used as reference for users. For NLU
tasks with RoBERTa-base and RoBERTa-large models, adopting 'n_frequency': 100 can almost achieve similar
results as 'r': 8 in LoRA. For image classification tasks with ViT-base and ViT-large models, adopting
'n_frequency': 3000 can almost achieve similar results as 'r': 16 in LoRA.
Member

Thanks a lot for providing this detailed description, which should really help users pick a good value. I would add a little bit extra: As a naive user, when I read that n_frequency=3000 is roughly the same as r=16 in LoRA, I would think that FourierFT is much less parameter efficient, because 3000 >> 16. But of course this is an incorrect impression. So I would add a sentence to clarify the amount of trainable parameters.
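
As a rough back-of-the-envelope illustration of that point (the layer shape is an assumption for illustration, not taken from the PR):

# Parameter count for one 768x768 projection, shapes assumed for illustration.
d_in, d_out = 768, 768
lora_r, n_frequency = 16, 3000
lora_params = lora_r * (d_in + d_out)  # LoRA A and B matrices: 16 * (768 + 768) = 24,576
fourierft_params = n_frequency         # one trainable coefficient per selected frequency
print(lora_params, fourierft_params)   # 24576 vs 3000, i.e. FourierFT trains roughly 8x fewer here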

src/peft/tuners/fourierft/config.py (resolved)
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.fourierft_random_loc_seed = {}
Member

Let's move this a few lines up to be next to the other fourier-specific params.



class FourierFTLinear(nn.Module, FourierFTLayer):
# Lora implemented in a dense layer
Member

Suggested change
- # Lora implemented in a dense layer
+ # FourierFT implemented in a dense layer

"random_loc_seed": fourierft_config.random_loc_seed,
}
kwargs["bias"] = bias
if isinstance(target, FourierFTLinear):
Member

How about:

Suggested change
- if isinstance(target, FourierFTLinear):
+ if isinstance(target, FourierFTLayer):

in case new layer types are added in the future.

src/peft/utils/constants.py (resolved)
"model_name_or_path = \"roberta-base\"\n",
"task = \"mrpc\"\n",
"peft_type = PeftType.FOURIERFT\n",
"device = \"cuda:1\"\n",
Member

Let's not hard-code this. How about: device = "cuda" if torch.cuda.is_available() else "cpu"?

"peft_type = PeftType.FOURIERFT\n",
"device = \"cuda:1\"\n",
"num_epochs = 5 # for better results, increase this number\n",
"n_frequency = 1000 # for better results, increase this number\n",
Member

Let's also put scaling here, so that all hyper-params are in one place.

" target_modules=[\"query\", \"value\"],\n",
" scaling = 150.0,\n",
")\n",
"head_lr = 6e-3\n",
Member

Could you please add a comment about these two lr values?
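
For context, one way such a comment could explain the split, sketched with assumed parameter-group names and an assumed second learning rate (this is not the notebook's exact code):

# Hypothetical sketch of two parameter groups with separate learning rates.
import torch

def build_optimizer(model, head_lr=6e-3, fourierft_lr=6e-2):
    # The randomly initialized classification head and the FourierFT spectral
    # coefficients get separate learning rates (the values here are assumptions).
    head_params = [p for n, p in model.named_parameters() if "classifier" in n]
    fourierft_params = [p for n, p in model.named_parameters() if "fourierft" in n]
    return torch.optim.AdamW(
        [
            {"params": head_params, "lr": head_lr},
            {"params": fourierft_params, "lr": fourierft_lr},
        ]
    )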

@@ -0,0 +1,554 @@
{
Member

Let's mention that this notebook requires evaluate and scikit-learn to run, as those are not PEFT dependencies.

@BenjaminBossan
Member

@zqgao22 @Phoveran Let me know if this is ready for review.

@zqgao22
Contributor

zqgao22 commented Jul 2, 2024

@zqgao22 @Phoveran Let me know if this is ready for review.

Yes, this current PR version is ready for review.

Best

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the recent updates, we're almost finished now. Just some small issues left, please check.

Comment on lines 167 to 168
"The initialization of the Fourier weights. Set this to False if the spectrum are initialized to a standard normal distribution."
"Set this to True if the spectrum are initialized to zeros."
Member

Thanks for checking this.

Could you please still update the text to use "should be", which I think is clearer. Please update the docstring above accordingly and ensure that the character limit is respected.

}
],
"source": [
"# To run this notebook, you can use `pip install scikit-learn evaluate` to install additional dependencies out of the PEFT.\n",
Member

sklearn is no longer needed, is it?

Suggested change
- "# To run this notebook, you can use `pip install scikit-learn evaluate` to install additional dependencies out of the PEFT.\n",
+ "# To run this notebook, please run `pip install evaluate` to install additional dependencies not covered by PEFT.\n",

@zqgao22
Contributor

zqgao22 commented Jul 3, 2024

Hi @BenjaminBossan, you can review the latest version of my PR, titled "small changes"; it is ready for review. Thanks a lot.

Best.

@@ -205,6 +205,8 @@ def __getattr__(self, name: str):
try:
    return super().__getattr__(name) # defer to nn.Module's logic
except AttributeError:
    if name == "base_model":
Member

This should be if name == "model":. The idea is to prevent an infinite recursion in the line below in case self.model is not yet set.
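
For context, a minimal sketch of the guard being discussed; the surrounding class is a hypothetical stand-in for the PEFT tuner model class, and only the name == "model" check is the point:

import torch.nn as nn

class TunerModelSketch(nn.Module):  # hypothetical stand-in, not the actual PEFT class
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def __getattr__(self, name: str):
        """Forward missing attributes to the wrapped module."""
        try:
            return super().__getattr__(name)  # defer to nn.Module's logic
        except AttributeError:
            if name == "model":
                # self.model is not set yet (e.g. during __init__): re-raise instead of
                # recursing into getattr(self.model, name) below
                raise
            return getattr(self.model, name)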

Contributor

Thanks very much for your instruction; we have fixed the recursion.

@BenjaminBossan
Member

Thanks for fixing the recursion issue. We just merged another method, which results in a merge conflict. Could you please take care of it? The changes should be super easy.

@zqgao22
Contributor

zqgao22 commented Jul 5, 2024

Thanks for fixing the recursion issue. We just merged another method, which results in a merge conflict. Could you please take care of it? The changes should be super easy.

Yeah, we have resolved the (six) conflicts; you can now review the current version.

Best.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for these updates, this PR is in an excellent state. Just a small thing I found is that we can extend the test coverage a bit further by adding multi-adapter tests. That's quite easy, just add the following lines:

MULTIPLE_ACTIVE_ADAPTERS_TEST_CASES = [
    ...
    (
        "FourierFT Same",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin0"]},
    ),
    (
        "FourierFT Different",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin1"]},
    ),
]

to the list defined here. I already checked this locally and the new tests pass.

@zqgao22
Contributor

zqgao22 commented Jul 9, 2024

Thanks for these updates, this PR is in an excellent state. Just a small thing I found is that we can extend the test coverage a bit further by adding multi-adapter tests. That's quite easy, just add the following lines:

MULTIPLE_ACTIVE_ADAPTERS_TEST_CASES = [
    ...
    (
        "FourierFT Same",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin0"]},
    ),
    (
        "FourierFT Different",
        "fourierft",
        FourierFTConfig,
        {"n_frequency": 10, "target_modules": ["lin0"]},
        {"n_frequency": 10, "target_modules": ["lin1"]},
    ),
]

to the list defined here. I already checked this locally and the new tests pass.

These tests have been included for our method, and the PR is ready for review. The new tests also passed on my local machine.

Best.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks a lot for this great PR, FourierFT looks very promising. All looks good to me, thanks for patiently incorporating my feedback.

Before merging, could you let me know whom to put on the co-author list? I see:

Co-authored-by: Chaos96 <[email protected]>
Co-authored-by: zqgao22 <[email protected]>
Co-authored-by: qichaoswang <[email protected]>
Co-authored-by: zgaoat <[email protected]>

@zqgao22
Contributor

zqgao22 commented Jul 9, 2024

Before merging, could you let me know whom to put on the co-author list? I see:

Hi, thank you very much for your recent guidance; we are glad to have made this PR.

As for the co-author list, could it include the following four:

  1. zqgao22 [email protected]
  2. Chaos96 [email protected]
  3. Phoveran [email protected]
  4. DSAILatHKUST [email protected]

Best.

@BenjaminBossan BenjaminBossan merged commit e72a96f into huggingface:main Jul 9, 2024
14 checks passed