Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization #1326
Conversation
Thanks a lot for this very extensive PR. I didn't have a closer look yet, but wanted to set the stage by asking some clarifying questions:
Hi @BenjaminBossan,
Thanks for answering:
I'll take a closer look at the PR in the upcoming days. It could take a bit longer since it's so big :)
Super good, thanks a lot for your constant work put into BOFT. I only have a few very minor comments and nits left, please check. After this, I'll wait for a second review by my colleagues when they're back in office and then this should be ready to be merged.
## Set up your environment
Start by cloning the PEFT repository:

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide we recommend installing PEFT from source.

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
Run evaluation on the sampled images to compute the landmark reprojection error:

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
## Finetune Dreambooth with BOFT

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Can we get the empty line back? :)
requirements.txt (Outdated)
setuptools
Ninja |
Do we need this? I think it should be removed. If this is required for tests to pass, you can add it to extras["test"] in setup.py.
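For illustration, a rough sketch of what moving it into the test extras could look like (the surrounding setup.py layout shown here is an assumption, not the repository's actual file):

```python
# setup.py (sketch): test-only build dependencies live in extras["test"],
# so a plain `pip install peft` does not pull in tools like ninja.
from setuptools import find_packages, setup

extras = {}
extras["test"] = ["pytest", "pytest-cov", "ninja"]  # hypothetical test extras

setup(
    name="peft",
    packages=find_packages("src"),
    package_dir={"": "src"},
    extras_require=extras,
)
```

Installing with `pip install -e .[test]` would then bring in the test-only tools.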
Deleted.
Deleted.
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
This line is still not covered. When we look at the line coverage, we get:
src/peft/tuners/boft/layer.py 465 76 84% 45, 57, 94-96, 100-102, 132, 196, 202-206, 209-216, 219-223, 234, 247, 253, 257, 265, 271, 275, 282, 290, 293, 319, 342, 377, 450, 489, 511-512, 541, 552, 571, 582, 593, 613-614, 644, 659, 665, 676, 682, 687-692, 700, 705-710, 717, 721-725, 751, 817-818, 857, 868, 889, 900, 911, 942-943
As you can see, line 911 is not covered. I also checked locally with the debugger that this line is never run. Could you please check again that the tests you added really cover this line?
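For what it's worth, that loop only runs when the stacked dimension of `butterfly_oft_mat_batch` is greater than one, so a configuration with more than one butterfly factor should be needed to hit it. A rough sketch of such a test, where the base model, target modules, and config field names are assumptions for illustration rather than the actual test code:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

# Assumption: boft_n_butterfly_factor > 1 stacks several butterfly factors,
# which is what should drive the chained torch.bmm loop discussed above.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
config = BOFTConfig(
    boft_block_size=4,
    boft_n_butterfly_factor=2,
    target_modules=["c_attn"],
)
model = get_peft_model(base_model, config)

# A single forward pass through the adapted layers should execute the loop.
model(input_ids=torch.tensor([[1, 2, 3]]))
```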
@BenjaminBossan Can you check again? Thx.
Thanks a lot for your continued effort put into BOFT. This is a great PR with a lot of additions, excellent examples and extensive tests. Thanks also for your patience with my reviews.
The PR is now good to be merged from my point of view. As mentioned, for these big PRs, we should have a second review. Let's add one from @pacman100 or @younesbelkada once they're back in office.
@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Well, it looks like there is a diff here, but if it's there despite the copy, I guess we can just ignore it.
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
Hmm, the latest test coverage result still shows this line as uncovered (e.g. here). But I'm fine with the testing overall, up to you if you want to check this further or not.
@BenjaminBossan Thank you so much also for your support along the way. This is actually the first time we have made a contribution to a public repo from Hugging Face, and we also lacked experience with how to properly test and check the code formatting. We also want to sincerely thank you for your patience with us. :D
@BenjaminBossan Hi, just wondering when we can merge BOFT into PEFT?
Sorry for the delay, @pacman100 plans to review this PR very soon.
Thank you @yfeng95 for the commendable job on adding BOFT with detailed docs, thorough examples and use cases, a clear implementation with custom CUDA kernels, and thorough tests 🔥🚀✨! Left minor nits and a comment related to a test.
tests/test_decoder_models.py (Outdated)
"task_type": "CAUSAL_LM", | ||
}, | ||
filter_params_func=skip_adalora_and_gpt2, | ||
filter_params_func=skip_boft_and_gpt2, |
this should also be skipped for AdaLoRA as before, right?
Yes, you are right. Added another function to skip for both configs.
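For readers of the thread, a rough sketch of what such a combined filter could look like (the helper name and the shape of the test-case tuples are assumptions about the test suite, not the actual code):

```python
def skip_adalora_or_boft_and_gpt2(test_list):
    # Hypothetical helper: drop GPT-2 cases for both AdaLoRA and BOFT configs,
    # mirroring the existing skip_adalora_and_gpt2 filter.
    return [
        (test_name, model_id, config_cls, config_kwargs)
        for test_name, model_id, config_cls, config_kwargs in test_list
        if not (
            "gpt2" in model_id.lower()
            and config_cls.__name__ in ("AdaLoraConfig", "BOFTConfig")
        )
    ]
```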
docs/source/conceptual_guides/oft.md (Outdated)
`BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Suggested change: replace "dividable to" with "divisible by".
Fixed.
docs/source/conceptual_guides/oft.md (Outdated)
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Suggested change: replace "dividable to" with "divisible by".
Fixed.
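To make the constraint between the two parameters concrete, here is a short usage sketch (the target module names are placeholders and the exact argument defaults are assumptions based on this PR):

```python
from peft import BOFTConfig

# Fix the block size; the number of blocks then follows from each layer's in_features.
config_by_size = BOFTConfig(boft_block_size=4, boft_block_num=0, target_modules=["query", "value"])

# Or fix the number of blocks; the block size is then derived per layer.
config_by_num = BOFTConfig(boft_block_size=0, boft_block_num=8, target_modules=["query", "value"])

# Setting both to non-zero values (or leaving both at 0) is invalid, because
# boft_block_size * boft_block_num must equal the layer's input dimension.
```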
@pacman100 Hi, thanks for the review. I updated the errors. Best.
@Zeju1997 could you please run |
Yes, pushed again.
@yfeng95 @YuliangXiu Thanks a lot for your hard work on BOFT and this very comprehensive PR (and your patience!). We're ready to merge this now. While merging, I saw that the automatically created co-author list was a bit messy, do you have a preference for how to adjust that:
…1326) Implements https://hf.co/papers/2311.06243. --------- Co-authored-by: Zeju Qiu <[email protected]> Co-authored-by: Yuliang Xiu <[email protected]> Co-authored-by: Yao Feng <[email protected]>
Adding a new fine-tuning method:
BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Orthogonal Butterfly (BOFT) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the Orthogonal Finetuning (OFT) paradigm by taking inspiration from the Cooley-Tukey fast Fourier transform, and shows favorable results across finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.
We've added and tested:
Will add more examples in the near future :)
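For anyone finding this PR later, a minimal end-to-end sketch of applying BOFT through PEFT's usual API (the base model and target modules below are illustrative choices, not taken from this PR):

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = BOFTConfig(
    boft_block_size=4,                    # size of each orthogonal block in the update matrix
    target_modules=["q_proj", "v_proj"],  # attention projections in this example model
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the BOFT parameters are trainable
```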