Add support for layer replication in LoRA #1368
Conversation
Thanks @siddartha-RE for this feature! Can you elaborate more on its usage by providing some possible snippets of the API? Would you be happy to extend the current documentation and test suite to add that feature as well?
Hi Younes,

Definitely, sorry, I should have marked it as a draft. I am in the process of training and uploading a model to HF to demonstrate its use. I can use the same config to build out an example of usage and figure out testing. I will get it done in a day or two. I will also elaborate on the PR description.

In case it is not clear, the basic idea is to be able to fine-tune models with replicated layers (as has been done with SOLAR / Goliath / ...), but with separate LoRA adjustments for the layers that have been duplicated. This allows training very large models (for example 120B) with memory usage close to that of the base 70B model.

- Siddartha
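A minimal sketch of what this ends up looking like with the layer_replication option that this PR adds to LoraConfig; the model name and the replication ranges below are purely illustrative, not the actual 120B recipe:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Expand an 80-layer 70B-class model to ~140 layers by replicating overlapping
# ranges of decoder layers; replicated layers share the base weights, but each
# copy gets its own LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")
config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    layer_replication=[[0, 20], [10, 30], [20, 40], [30, 50], [40, 60], [50, 70], [60, 80]],
)
model = get_peft_model(base, config)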
I have created an example model using this PR. I have also added a test to verify it and added documentation in the code. I am not quite sure where to add additional docs: should I modify one of the existing task docs that discuss LoRA (I did not see an obvious candidate), or should I check in a new example under …?
Thank you @siddartha-RE, I like the idea, the implementation is clear and I went over the POC experiment you have shared. I think this is a good feature to have. I would like to know the thoughts of @BenjaminBossan and @younesbelkada on this.
Following up here with an ablation study:
The 10B model was built by layer-stacking the 7B model with the following config. I think this is a pretty compelling approach for training large models without incurring the full memory cost.
Thanks for providing this interesting new technique.
From my reading of the code, this effectively modifies the base model by copying a couple of existing layers (with shared weights). Then, by applying different LoRA weights on these copies, it is ensured that the copied layers can learn different things.
From my understanding, the copy functionality is not tied to LoRA at all. The same principle could be applied to IA³, LoHa, etc. Therefore, I think it would make sense to remove this functionality from LoraModel and make it a generic utility function. In practice, the usage would look something like this:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(...)
layer_replication_map = [(0, 16), (8, 24), (16, 32)]
extended_model = replicate_layers(base_model, layer_replication_map)  # proposed new utility
config = LoraConfig(...)  # or IA3Config or ...
peft_model = get_peft_model(extended_model, config)
The advantage of this approach would be to make it generic and not overburden the existing LoRA implementation with something that is not fundamentally related to LoRA.
The disadvantage would be:
- as a user, I have to run the same extension code again when loading the trained adapter
- as it's not part of the LoraConfig, we cannot easily check if merging is allowed
IMO, this is still a better approach, but please tell me what you think about my proposal.
I did consider how to keep it independent of LoRA. The thought I had was to make it just a separate adapter that can then be used with the multi-adapter support that has recently landed, to allow mixing with other adapters. The reasons I did not do it: first, multi-adapter support still seems to be evolving, and I didn't know how many adapters this would realistically work with; second, I thought it was better to see if this method was generally interesting to the community before trying a more complex implementation. One definite goal is to allow this to be loaded without any custom code, so I would like the solution to work with any code that currently just works with PeftConfig.from_pretrained(...).
I wonder if we even need a full new adapter class for this feature. What if we just make it a function that can be called on the base model before applying LoRA or another method?
But say I want to submit a model to HF. How would I push a config that would allow people to load the model without custom code to invoke the hook?
I definitely see your point of wanting to avoid requiring custom code. Loading from HF Hub would still work for the full model state dict, right? But of course, that would be a huge waste of space, so it'd be nice to have another solution. Still, I don't think the solution is to bundle this with LoRA, as the two are independent.

I could see the argument for having a separate adapter type with its own config, which could be combined with LoRA in a separate step, but I'm not sure if that's really better than having custom code, which comes down to a single function call 🤔

At the end of the day, if the other maintainers would rather have this bundled with LoRA, I can certainly be convinced :) My personal preference is still a separate function call though.
I can certainly factor this into a separately usable function, but perhaps also leave it as a functionality in LoRA, which I see as one of the primary ways to leverage this efficiently. Keep in mind it's not just the issue of downloading the weights. This approach even keeps GPU memory usage almost the same as the base model. Effectively we have trained 120B models using the memory of a 70B model. Because the LoRA adapters are distinct for all the layers in the effectively 120B model, it is really like having double the number of independent layers.
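A back-of-the-envelope check of that memory claim, with purely illustrative numbers (square 8192x8192 target projections assumed, GQA ignored; none of these figures come from the actual experiments):

hidden = 8192          # hidden size of a 70B-class model
r = 16                 # LoRA rank
targets_per_layer = 2  # e.g. q_proj and v_proj
extra_layers = 60      # layers added by replication (80 -> ~140)
# Each LoRA pair adds roughly r * (d_in + d_out) = 2 * r * hidden parameters.
extra_params = extra_layers * targets_per_layer * 2 * r * hidden
print(f"~{extra_params / 1e6:.0f}M extra trainable params; the replicated base weights add none")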
This may be a good compromise. As a separate function, it should also be easier to add to the other methods. I'm a little wary that we're overburdening the LoRA options, of which there are already very many, with features that aren't very modular or easy to compose, but that's an issue for another day.
Yes, this is great, but it's not directly related to integrating this into LoraModel, right?
OK, I will follow up and factor this into a separate function. Yes, the second part was more about the limitations of exporting as a merged base model.
@siddartha-RE Let us know once this is ready for review.
I think this is ready. I refactored the function so that it lives in utils; LoRA just hooks into it.
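A minimal sketch (not the PR's actual code) of what such a weight-sharing replication utility can look like in plain PyTorch: each clone is a new module object, but its parameters are re-tied to the original tensors, so the base weights consume no extra memory.

import copy
import torch.nn as nn

def clone_module_shared(module: nn.Module) -> nn.Module:
    """Deep-copy a module, then re-tie every parameter to the original tensor."""
    clone = copy.deepcopy(module)
    for name, src_param in module.named_parameters():
        *path, leaf = name.split(".")
        owner = clone
        for part in path:
            owner = getattr(owner, part)
        setattr(owner, leaf, src_param)  # rebind the clone's parameter to the original
    return clone

def replicate_layers_shared(layers: nn.ModuleList, layer_map) -> nn.ModuleList:
    """layer_map is a list of [start, end) ranges over the original layer list."""
    return nn.ModuleList(
        clone_module_shared(layers[i]) for start, end in layer_map for i in range(start, end)
    )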
Thanks for the update. I have some comments, please take a look.
I think we're getting close to finishing this PR. I have a few comments left, please check them out.
Also, please always run make style once you're finished. We now also have a pre-commit config in case you want to add a pre-commit hook.
Sorry about not running the code style checks. I ran the tests but forgot to rerun make quality/style. I ran it as part of the latest commit.
Thanks for the updates, I only have a few smaller comments left.
Also, I think it would be nice to add a section to the docs, perhaps here. This way, you'll have a better chance that users will discover this feature. Ideally, a nice example could be documented there, or a link to a working example added.
Thanks a lot for the recent changes. The documentation is much clearer now and users should be able to understand what this does and find further information.
I found a couple of typos etc., otherwise this LGTM.
One question: Would we be able to check in the unit test that the weights are shared instead of creating a new copy?
@BenjaminBossan I addressed the comments and updated the unit test to check that some weights are shared and others are distinct.
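One simple way to assert this in a test, as a sketch only (the module path assumes a Llama-style model wrapped by PEFT and is not necessarily what the PR's test does):

# Replicated layers should share base weight storage but have distinct LoRA weights.
def check_sharing(peft_model, i, j):
    layers = peft_model.base_model.model.model.layers  # assumed path for a Llama-style model
    q_i = layers[i].self_attn.q_proj
    q_j = layers[j].self_attn.q_proj
    assert q_i.base_layer.weight.data_ptr() == q_j.base_layer.weight.data_ptr()  # shared base weights
    assert q_i.lora_A["default"].weight.data_ptr() != q_j.lora_A["default"].weight.data_ptr()  # distinct LoRA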
Thanks a lot, this LGTM.
I found a minor error in a docstring, no big deal.
Let's have a final review by @younesbelkada or @pacman100 before merging.
@BenjaminBossan do you have a sense for when this PR is likely to be merged?
Wow! Thank you @siddartha-RE for all the work on adding support for a memory-efficient way of replicating layers and inserting LoRAs on them for efficient fine-tuning! 🔥🚀✨
* Add support for layer replication in LoRA
* Add test and update docs
* Address review comments
* Code cleanup and additional model support
* Add docs, address comments
* Add link to example model
* Improve test and fix typos
* Update src/peft/tuners/tuners_utils.py: Fix typo in doc string.

Co-authored-by: Benjamin Bossan <[email protected]>
Is it possible to support Flan-T5? T5 still has a huge commercial base and is widely used. @siddartha-RE @BenjaminBossan It might look like this in the replicate_layers() function (plus other changes, maybe):
I think there is no technical reason why this shouldn't work, but with the given way of specifying the layers, I'm not sure how to indicate whether layers in the encoder or the decoder part should be replicated. Maybe Siddartha has an idea.
I think the easiest option here may be to promote the replication config to …
Sounds good, are you interested in adding this? I checked mergekit and AFAICT there is no specification for how to define this for encoder-decoder models. It would be unfortunate if they come up with a different standard, but I guess we could add that later.
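A purely hypothetical illustration of the open question above (nothing below is an implemented PEFT option): an encoder-decoder model has two separate layer stacks, so a replication spec would likely need to address each stack independently, for example via per-stack maps applied to T5's encoder.block and decoder.block module lists. Note that in T5 only the first block holds the relative attention bias, which a replication utility would also have to handle.

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
print(len(model.encoder.block), len(model.decoder.block))  # 12 12: two independent stacks
# Hypothetical per-stack spec, illustrative ranges only:
layer_replication = {"encoder": [[0, 12], [6, 12]], "decoder": [[0, 12]]}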
Another quick thing I tried on Flan-T5, which is probably also a quick bug: the total number of duplicated blocks cannot exceed the original total number of blocks. For Flan-T5 with 24 original blocks, layer_replication=[[0, 24], [22, 24]] leads to an 'out of index' error, since the new total number of blocks is 26, more than 24. However, [[0, 7], [9, 24], [22, 24]] does not hit the error, since that comes to exactly 24 blocks. Just reporting this for T5. Thanks!
Could this possibly be updated to optionally set the replicated layer's …? I think this idea initially comes from LLaMA-Pro, and was later tested on chargoddard's … Quoting Charles here:
Also described in this answer.ai blogpost.
@siddartha-RE Hello. First of all, thanks for your brilliant idea. I want to dive deep into the mechanism. For that, could you give me the specific configurations of the experiments for the models you trained, e.g. https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B? It would be really helpful for my work. Thanks!
This PR adds the ability to duplicate layers in a model according to a layer map and then fine-tune separate LoRA adapters for the layers post-duplication. This allows expanding a model into a larger model and then fine-tuning it with very little extra memory compared to the original smaller model.
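A tiny sketch of the layer-map semantics described above, assuming half-open [start, end) ranges as in the examples earlier in the thread:

def expand_layer_map(layer_map):
    """Return the sequence of original layer indices the expanded model runs through."""
    return [i for start, end in layer_map for i in range(start, end)]

print(expand_layer_map([[0, 4], [2, 6]]))  # [0, 1, 2, 3, 2, 3, 4, 5] -> 6 layers become 8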