Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization #1326
Conversation
Thanks a lot for this very extensive PR. I didn't have a closer look yet, but wanted to set the stage by asking some clarifying questions:
Hi @BenjaminBossan,
Thanks for answering:
I'll take a closer look at the PR in the upcoming days. It could take a bit longer since it's so big :)
Super good, thanks a lot for your constant work put into BOFT. I only have a few very minor comments and nits left, please check. After this, I'll wait for a second review by my colleagues when they're back in office and then this should be ready to be merged.
## Set up your environment
Start by cloning the PEFT repository:

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide we recommend installing PEFT from source.

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
Run evaluation on the sampled images to compute the landmark reprojection error:

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
## Finetune Dreambooth with BOFT

```python
Suggested change: replace the ```python fence with ```bash.
Fixed.
@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Can we get the empty line back? :)
requirements.txt (Outdated)
setuptools
Ninja |
Do we need this? I think it should be removed. If this is required for tests to pass, you can add it to extras["test"] in setup.py.
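For illustration, a rough sketch of what moving it into the test extras could look like (the surrounding setup.py layout shown here is an assumption, not the repository's actual file):

```python
# setup.py (sketch): test-only build dependencies live in extras["test"],
# so a plain `pip install peft` does not pull in tools like ninja.
from setuptools import find_packages, setup

extras = {}
extras["test"] = ["pytest", "pytest-cov", "ninja"]  # hypothetical test extras

setup(
    name="peft",
    packages=find_packages("src"),
    package_dir={"": "src"},
    extras_require=extras,
)
```

Installing with `pip install -e .[test]` would then bring in the test-only tools.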
Deleted.
Deleted.
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
This line is still not covered. When we look at the line coverage, we get:
src/peft/tuners/boft/layer.py 465 76 84% 45, 57, 94-96, 100-102, 132, 196, 202-206, 209-216, 219-223, 234, 247, 253, 257, 265, 271, 275, 282, 290, 293, 319, 342, 377, 450, 489, 511-512, 541, 552, 571, 582, 593, 613-614, 644, 659, 665, 676, 682, 687-692, 700, 705-710, 717, 721-725, 751, 817-818, 857, 868, 889, 900, 911, 942-943
As you can see, line 911 is not covered. I also checked locally with the debugger that this line is never run. Could you please check again that the tests you added really cover this line?
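For what it's worth, that loop only runs when the stacked dimension of `butterfly_oft_mat_batch` is greater than one, so a configuration with more than one butterfly factor should be needed to hit it. A rough sketch of such a test, where the base model, target modules, and config field names are assumptions for illustration rather than the actual test code:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

# Assumption: boft_n_butterfly_factor > 1 stacks several butterfly factors,
# which is what should drive the chained torch.bmm loop discussed above.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
config = BOFTConfig(
    boft_block_size=4,
    boft_n_butterfly_factor=2,
    target_modules=["c_attn"],
)
model = get_peft_model(base_model, config)

# A single forward pass through the adapted layers should execute the loop.
model(input_ids=torch.tensor([[1, 2, 3]]))
```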
@BenjaminBossan Can you check again? Thx.
Thanks a lot for your continued effort put into BOFT. This is a great PR with a lot of additions, excellent examples and extensive tests. Thanks also for your patience with my reviews.
The PR is now good to be merged from my point of view. As mentioned, for these big PRs, we should have a second review. Let's add one from @pacman100 or @younesbelkada once they're back in office.
@@ -41,4 +41,4 @@ markers = [
"multi_gpu_tests: tests that run on multiple GPUs",
"regression: whether to run regression suite test",
"bitsandbytes: select bitsandbytes integration tests"
]
]
Well, it looks like there is a diff here, but if it's there despite the copy, I guess we can just ignore it.
butterfly_oft_mat_batch = torch.bmm(self.boft_P, butterfly_oft_mat_batch)
butterfly_oft_mat = butterfly_oft_mat_batch[0]

for i in range(1, butterfly_oft_mat_batch.shape[0]):
Hmm, the latest test coverage result still shows this line as uncovered (e.g. here). But I'm fine with the testing overall, up to you if you want to check this further or not.
@BenjaminBossan Thank you so much also for your support along the way. This is actually the first time we have made a contribution to a public repo from Hugging Face, and we also lacked experience with how to properly test and check the code formatting. We also want to sincerely thank you for your patience with us. :D
@BenjaminBossan Hi, just wondering when we can merge BOFT into PEFT?
Sorry for the delay, @pacman100 plans to review this PR very soon.
Thank you @yfeng95 for the commendable job on adding BOFT with detailed docs, thorough examples and use cases, a clear implementation with custom CUDA kernels, and thorough tests 🔥🚀✨! Left minor nits and a comment related to a test.
tests/test_decoder_models.py (Outdated)
"task_type": "CAUSAL_LM", | ||
}, | ||
filter_params_func=skip_adalora_and_gpt2, | ||
filter_params_func=skip_boft_and_gpt2, |
this should also be skipped for AdaLoRA as before, right?
Yes, you are right. Added another function to skip for both configs.
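For readers of the thread, a rough sketch of what such a combined filter could look like (the helper name and the shape of the test-case tuples are assumptions about the test suite, not the actual code):

```python
def skip_adalora_or_boft_and_gpt2(test_list):
    # Hypothetical helper: drop GPT-2 cases for both AdaLoRA and BOFT configs,
    # mirroring the existing skip_adalora_and_gpt2 filter.
    return [
        (test_name, model_id, config_cls, config_kwargs)
        for test_name, model_id, config_cls, config_kwargs in test_list
        if not (
            "gpt2" in model_id.lower()
            and config_cls.__name__ in ("AdaLoraConfig", "BOFTConfig")
        )
    ]
```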
docs/source/conceptual_guides/oft.md (Outdated)
`BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Suggested change: replace "dividable to" with "divisible by".
Fixed.
docs/source/conceptual_guides/oft.md (Outdated)
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be dividable to most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
Suggested change: replace "dividable to" with "divisible by".
Fixed.
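To make the constraint between the two parameters concrete, here is a short usage sketch (the target module names are placeholders and the exact argument defaults are assumptions based on this PR):

```python
from peft import BOFTConfig

# Fix the block size; the number of blocks then follows from each layer's in_features.
config_by_size = BOFTConfig(boft_block_size=4, boft_block_num=0, target_modules=["query", "value"])

# Or fix the number of blocks; the block size is then derived per layer.
config_by_num = BOFTConfig(boft_block_size=0, boft_block_num=8, target_modules=["query", "value"])

# Setting both to non-zero values (or leaving both at 0) is invalid, because
# boft_block_size * boft_block_num must equal the layer's input dimension.
```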
@pacman100 Hi, thanks for the review. I updated the errors. Best.
@Zeju1997 could you please run |
Yes, pushed again.
@yfeng95 @YuliangXiu Thanks a lot for your hard work on BOFT and this very comprehensive PR (and your patience!). We're ready to merge this now. While merging, I saw that the automatically created co-author list was a bit messy, do you have a preference for how to adjust that:
…1326) Implements https://hf.co/papers/2311.06243. --------- Co-authored-by: Zeju Qiu <[email protected]> Co-authored-by: Yuliang Xiu <[email protected]> Co-authored-by: Yao Feng <[email protected]>
Adding a new fine-tuning method:
BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Orthogonal Butterfly (BOFT) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the Orthogonal Finetuning (OFT) paradigm by taking inspiration from the Cooley-Tukey fast Fourier transform, and shows favorable results across finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.
We've added and tested:
Will add more examples in the near future :)
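For anyone finding this PR later, a minimal end-to-end sketch of applying BOFT through PEFT's usual API (the base model and target modules below are illustrative choices, not taken from this PR):

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = BOFTConfig(
    boft_block_size=4,                    # size of each orthogonal block in the update matrix
    target_modules=["q_proj", "v_proj"],  # attention projections in this example model
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the BOFT parameters are trainable
```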