
Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization #1326

Merged: 94 commits into huggingface:main on Apr 12, 2024

Conversation

@yfeng95 (Contributor) commented Jan 7, 2024

Adding a new fine-tuning method:
BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Orthogonal Butterfly (BOFT) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the Orthogonal Finetuning (OFT) paradigm by taking inspiration from the Cooley-Tukey fast Fourier transform, and shows favorable results when finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.
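In rough terms (a hedged summary of the paper's idea, in our own notation): OFT finetunes a pretrained weight by multiplying it with a learned orthogonal matrix, and BOFT factorizes that orthogonal matrix into a short product of sparse butterfly factors, which is where the parameter savings come from:

```latex
% Hedged summary; the notation here is ours, not necessarily the paper's.
% OFT:  W' = R W_0, with R orthogonal (block-diagonal in practice).
% BOFT: R is a product of m sparse orthogonal butterfly factors B_i, so R
%       can be dense overall while storing far fewer parameters.
W' = \Big( \prod_{i=1}^{m} B_i \Big) W_0, \qquad B_i^\top B_i = I
```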

We've added and tested:

  • boft implementation in peft src
  • test modules
  • examples for Text-to-Image Generation
  • docs for boft

Will add more examples in the near future :)
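In the meantime, a minimal usage sketch for readers of this thread (the base model, `target_modules`, and all values below are illustrative assumptions, not taken from the PR's examples):

```python
# Hedged sketch of applying BOFT via PEFT; the model choice and target
# modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = BOFTConfig(
    boft_block_size=4,          # must divide each target layer's in_features
    boft_n_butterfly_factor=2,  # number of butterfly factors in the product
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    boft_dropout=0.1,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the BOFT parameters are trainable
```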


@BenjaminBossan (Member):

Thanks a lot for this very extensive PR. I haven't had a closer look yet, but I wanted to set the stage by asking some clarifying questions:

  1. You probably considered this, but just in case: Would it be possible to add this as an extension to the existing OFT adapter in PEFT instead of a completely separate method?
  2. You added extensive examples, but AFAICT no further docs or explanations. Could you give a quick overview of those different examples so that we know what we're looking at? Is this all new code, or is it adapted from somewhere?
  3. It looks like you added custom CUDA kernels. Would those not require extra steps to ensure that they're distributed with the package when users install it?

@yfeng95 (Contributor, Author) commented Jan 8, 2024

Hi @BenjaminBossan,
Many thanks for checking this! For your questions:

  1. Indeed, we previously considered this, but later realized that the implementations of BOFT and OFT are very different, so we decided to open a PR for a completely separate method.
  2. We have docs here: 5566e48. The code is adapted from other examples. Please let us know if anything else is missing :)
  3. Yes, it does NOT require extra steps for installation. The CUDA kernels are compiled automatically, and only when someone actually runs BOFT.
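For readers wondering what that looks like in practice: PyTorch can JIT-compile a CUDA extension the first time it is needed, roughly as in the sketch below (the file names and paths are assumptions for illustration, not necessarily the exact loader used in this PR):

```python
# Hedged sketch of runtime CUDA kernel compilation with PyTorch's JIT
# extension loader; the source file names are illustrative assumptions.
import os
from torch.utils.cpp_extension import load

curr_dir = os.path.dirname(__file__)
fbd_cuda = load(
    name="fbd_cuda",
    sources=[
        os.path.join(curr_dir, "fbd/fbd_cuda.cpp"),
        os.path.join(curr_dir, "fbd/fbd_cuda_kernel.cu"),
    ],
    verbose=True,  # nvcc compiles on first use; the result is cached
)
```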

@BenjaminBossan (Member):

Thanks for answering:

  1. I thought so, thanks for confirming.
  2. I missed those, thanks. Regarding the examples being adapted, if not already done, could you please add references to where that code is adapted from?
  3. I see. Are you sure that when a package is built (`python setup.py sdist`), it will automatically include the non-Python files? I thought this was not the case.

I'll take a closer look at the PR in the upcoming days. It could take a bit longer since it's so big :)
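On point 3, for reference: by default setuptools only packages Python files, so non-Python sources usually need an explicit rule. A hedged sketch of one common approach (the paths below are assumptions for illustration, not the actual PEFT setup):

```python
# Hedged sketch: an explicit package_data entry (or a MANIFEST.in rule) is
# the usual way to ship non-Python files such as CUDA sources; the peft
# paths below are illustrative assumptions.
from setuptools import find_packages, setup

setup(
    name="peft",
    packages=find_packages("src"),
    package_dir={"": "src"},
    package_data={"peft": ["tuners/boft/fbd/*.cpp", "tuners/boft/fbd/*.cu"]},
)
```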

@BenjaminBossan (Member) left a comment:

Super good, thanks a lot for the constant work you have put into BOFT. I only have a few very minor comments and nits left, please check. After this, I'll wait for a second review by my colleagues when they're back in office, and then this should be ready to be merged.

## Set up your environment
Start by cloning the PEFT repository:

```python
Member:

Suggested change
```python
```bash

Contributor:

Fixed.


Set up your environment: install PEFT and all the required libraries. At the time of writing this guide, we recommend installing PEFT from source.

```python
Member:

Suggested change
```python
```bash

Contributor:

Fixed.


Run evaluation on the sampled images to compute the landmark reprojection error:

```python
Member:

Suggested change
```python
```bash

Contributor:

Fixed.


## Finetune Dreambooth with BOFT

```python
Member:

Suggested change
```python
```bash

Contributor:

Fixed.

@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Member:

Can we get the empty line back? :)

requirements.txt Outdated
setuptools
Ninja
@BenjaminBossan (Member) commented Mar 25, 2024:

Do we need this? I think it should be removed. If this is required for tests to pass, you can add it to `extras["test"]` in `setup.py`.

Contributor:

Deleted.


tests/test_custom_models.py (thread resolved)
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
Member:

This line is still not covered. When we look at the line coverage, we get:

src/peft/tuners/boft/layer.py 465 76 84% 45, 57, 94-96, 100-102, 132, 196, 202-206, 209-216, 219-223, 234, 247, 253, 257, 265, 271, 275, 282, 290, 293, 319, 342, 377, 450, 489, 511-512, 541, 552, 571, 582, 593, 613-614, 644, 659, 665, 676, 682, 687-692, 700, 705-710, 717, 721-725, 751, 817-818, 857, 868, 889, 900, 911, 942-943

As you can see, line 911 is not covered. I also checked locally with the debugger that this line is never run. Could you please check again that the tests you added really cover this line?
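For context on why this line may be missed: the loop only runs when the batch of butterfly factors has more than one entry, i.e. when `boft_n_butterfly_factor > 1`. A hedged sketch of a test that should reach it (the model and sizes are illustrative assumptions):

```python
# Hedged sketch: with boft_n_butterfly_factor=2 there is more than one
# butterfly factor to compose, so the `for i in range(1, ...)` loop runs.
import torch
from torch import nn
from peft import BOFTConfig, get_peft_model

base = nn.Sequential(nn.Linear(16, 16))
config = BOFTConfig(target_modules=["0"], boft_block_size=2, boft_n_butterfly_factor=2)
model = get_peft_model(base, config)
model(torch.randn(2, 16))  # forward pass through the BOFT layer
model.merge_adapter()      # merging composes all butterfly factors
```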

@Zeju1997 (Contributor):

@BenjaminBossan Can you check again? Thx.

@BenjaminBossan (Member) left a comment:

Thanks a lot for your continued effort on BOFT. This is a great PR with a lot of additions, excellent examples, and extensive tests. Thanks also for your patience with my reviews.

The PR is now good to be merged from my point of view. As mentioned, for these big PRs, we should have a second review. Let's add one from @pacman100 or @younesbelkada once they're back in office.

@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Member:

Well, it looks like there is a diff here, but if it's there despite the copy, I guess we can just ignore it.

butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
Member:

Hmm, the latest test coverage result still shows this line as uncovered (e.g. here). But I'm fine with the testing overall, up to you if you want to check this further or not.

@Zeju1997 (Contributor):

@BenjaminBossan Thank you so much also for your support along the way. This is actually the first time we've contributed to a public repo from Hugging Face, and we lacked experience with how to properly test and check the code formatting. We also want to sincerely thank you for your patience with us. :D

@Zeju1997 (Contributor) commented Apr 8, 2024

@BenjaminBossan Hi, just wondering when we can merge BOFT into PEFT?

@BenjaminBossan (Member):

Sorry for the delay, @pacman100 plans to review this PR very soon.

@pacman100 (Contributor) left a comment:

Thank you @yfeng95 for the commendable job on adding BOFT, with detailed docs, thorough examples and use cases, a clear implementation with custom CUDA kernels, and thorough tests 🔥🚀✨! Left minor nits and a comment related to a test.

"task_type": "CAUSAL_LM",
},
filter_params_func=skip_adalora_and_gpt2,
filter_params_func=skip_boft_and_gpt2,
Contributor:

This should also be skipped for AdaLoRA as before, right?

Contributor:

Yes, you are right. Added another function to skip for both configs.


`BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:

- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Contributor:

Suggested change
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block sizes result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_size` such that it divides most layers' input dimension (`in_features`), e.g., 4, 8, 16. Also, please only

Contributor:

Fixed.


- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, and do not leave both at 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Contributor:

Suggested change
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_num` such that it divides most layers' input dimension (`in_features`), e.g., 4, 8, 16. Also, please only

Contributor:

Fixed.
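To make the shared constraint concrete, a hedged sketch (the sizes and target module names are illustrative; assume a layer with `in_features=768`):

```python
# Hedged sketch of the boft_block_size / boft_block_num relationship for a
# layer with in_features = 768; target module names are illustrative.
from peft import BOFTConfig

# Give exactly one of the two options; the other is derived per layer:
by_size = BOFTConfig(boft_block_size=4, target_modules=["q_proj"])   # 768 / 4   = 192 blocks
by_num = BOFTConfig(boft_block_num=192, target_modules=["q_proj"])   # 768 / 192 = block size 4
# Setting both, or leaving both at 0, is invalid, since
# boft_block_size * boft_block_num must equal the layer's in_features.
```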

@Zeju1997 (Contributor):

@pacman100 Hi, thanks for the review. I fixed the errors. Best.

@BenjaminBossan (Member):

@Zeju1997 could you please run `make style`?

@Zeju1997 (Contributor):

> @Zeju1997 could you please run `make style`?

Yes, pushed again.

@BenjaminBossan (Member) commented Apr 12, 2024

@yfeng95 @YuliangXiu Thanks a lot for your hard work on BOFT and this very comprehensive PR (and your patience!). We're ready to merge this now. While merging, I saw that the automatically created co-author list was a bit messy; do you have a preference for how to adjust the following?

Co-authored-by: Zeju1997 <[email protected]>
Co-authored-by: Zeju1997 <[email protected]>
Co-authored-by: Yuliang Xiu <[email protected]>
Co-authored-by: Zeju Qiu <[email protected]>
Co-authored-by: Zeju1997 <[email protected]>

@YuliangXiu (Contributor) commented Apr 12, 2024

Co-authored-by: Zeju Qiu <[email protected]>
Co-authored-by: Yuliang Xiu <[email protected]>
Co-authored-by: Yao Feng <[email protected]>

@BenjaminBossan BenjaminBossan merged commit 8111699 into huggingface:main Apr 12, 2024
14 checks passed