Add support for layer replication in LoRA #1368
Conversation
Thanks @siddartha-RE for this feature! Can you elaborate more on its usage by providing some possible snippets of the API? Would you be happy to extend the current documentation and test suite to add that feature as well?
Hi Younes,

Definitely, sorry, I should have marked it as a draft. I am in the process of training and uploading a model to HF to demonstrate its use. I can use the same config to build out an example of usage and figure out testing. I will get it done in a day or two. I will also elaborate on the PR description.

In case it is not clear, the basic idea is to be able to fine-tune models with replicated layers (as has been done with SOLAR / Goliath / ...), but with separate LoRA adjustments for the layers that have been duplicated. This allows training very large models (for example 120B) with memory usage close to that of the base 70B model.

- Siddartha
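A minimal sketch of what this ends up looking like with the layer_replication option that this PR adds to LoraConfig; the model name and the replication ranges below are purely illustrative, not the actual 120B recipe:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Expand an 80-layer 70B-class model to ~140 layers by replicating overlapping
# ranges of decoder layers; replicated layers share the base weights, but each
# copy gets its own LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")
config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    layer_replication=[[0, 20], [10, 30], [20, 40], [30, 50], [40, 60], [50, 70], [60, 80]],
)
model = get_peft_model(base, config)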
I have created an example model using this PR. I have also added a test to verify it and added documentation in the code. I am not quite sure where to add additional docs: should I modify one of the existing task docs that discuss LoRA (I did not see an obvious candidate), or should I check in a new example under …?
Thank you @siddartha-RE, I like the idea, the implementation is clear and I went over the POC experiment you have shared. I think this is a good feature to have. I would like to know the thoughts of @BenjaminBossan and @younesbelkada on this.
Following up here with an ablation study:
The 10B model was built by layer-stacking the 7B model with the following config. I think this is a pretty compelling approach for training large models without incurring the full memory cost.
Thanks for providing this interesting new technique.
From my reading of the code, this effectively modifies the base model by copying a couple of existing layers (with shared weights). Then, by applying different LoRA weights on these copies, it is ensured that the copied layers can learn different things.
From my understanding, the copy functionality is not tied to LoRA at all. The same principle could be applied to IA³, LoHa, etc. Therefore, I think it would make sense to remove this functionality from LoraModel and make it a generic utility function. In practice, the usage would look something like this:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(...)
layer_replication_map = [(0, 16), (8, 24), (16, 32)]
extended_model = replicate_layers(base_model, layer_replication_map)  # proposed new utility
config = LoraConfig(...)  # or IA3Config or ...
peft_model = get_peft_model(extended_model, config)
The advantage of this approach would be to make it generic and not overburden the existing LoRA implementation with something that is not fundamentally related to LoRA.
The disadvantage would be:
- as a user, I have to run the same extension code again when loading the trained adapter
- as it's not part of the LoraConfig, we cannot easily check if merging is allowed
IMO, this is still a better approach, but please tell me what you think about my proposal.
I did consider how to keep it independent of LoRA. The thought I had was to make it just a separate adapter that can then be used with the multi-adapter support that has recently landed, to allow mixing with other adapters. The reasons I did not do it: first, multi-adapter support still seems to be evolving, and I didn't know how many adapters this would realistically work with; second, I thought it was better to see if this method was generally interesting to the community before trying a more complex implementation. One definite goal is to allow this to be loaded without any custom code, so I would like the solution to work with any code that currently just works with PeftConfig.from_pretrained(...).
I wonder if we even need a full new adapter class for this feature. What if we just make it a function that can be called on the base model before applying LoRA or another method?
But say I want to submit a model to HF. How would I push a config that would allow people to load the model without custom code to invoke the hook?
I definitely see your point of wanting to avoid requiring custom code. Loading from HF Hub would still work for the full model state dict, right? But of course, that would be a huge waste of space, so it'd be nice to have another solution. Still, I don't think the solution is to bundle this with LoRA, as the two are independent.

I could see the argument for having a separate adapter type with its own config, which could be combined with LoRA in a separate step, but I'm not sure if that's really better than having custom code, which comes down to a single function call 🤔

At the end of the day, if the other maintainers would rather have this bundled with LoRA, I can certainly be convinced :) My personal preference is still a separate function call though.
I can certainly factor this into a separately usable function, but perhaps also leave it as a functionality in LoRA, which I see as one of the primary ways to leverage this efficiently. Keep in mind it's not just the issue of downloading the weights. This approach even keeps GPU memory usage almost the same as the base model. Effectively we have trained 120B models using the memory of a 70B model. Because the LoRA adapters are distinct for all the layers in the effectively 120B model, it is really like having double the number of independent layers.
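A back-of-the-envelope check of that memory claim, with purely illustrative numbers (square 8192x8192 target projections assumed, GQA ignored; none of these figures come from the actual experiments):

hidden = 8192          # hidden size of a 70B-class model
r = 16                 # LoRA rank
targets_per_layer = 2  # e.g. q_proj and v_proj
extra_layers = 60      # layers added by replication (80 -> ~140)
# Each LoRA pair adds roughly r * (d_in + d_out) = 2 * r * hidden parameters.
extra_params = extra_layers * targets_per_layer * 2 * r * hidden
print(f"~{extra_params / 1e6:.0f}M extra trainable params; the replicated base weights add none")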
This may be a good compromise. As a separate function, it should also be easier to add to the other methods. I'm a little wary that we're overburdening the LoRA options, of which there are already very many, with features that aren't very modular or easy to compose, but that's an issue for another day.
Yes, this is great, but it's not directly related to integrating this into LoraModel, right?
OK, I will follow up and factor this into a separate function. Yes, the second part was more about the limitations of exporting as a merged base model.
@siddartha-RE Let us know once this is ready for review.
I think this is ready. I refactored the function so that it lives in utils; LoRA just hooks into it.
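A minimal sketch (not the PR's actual code) of what such a weight-sharing replication utility can look like in plain PyTorch: each clone is a new module object, but its parameters are re-tied to the original tensors, so the base weights consume no extra memory.

import copy
import torch.nn as nn

def clone_module_shared(module: nn.Module) -> nn.Module:
    """Deep-copy a module, then re-tie every parameter to the original tensor."""
    clone = copy.deepcopy(module)
    for name, src_param in module.named_parameters():
        *path, leaf = name.split(".")
        owner = clone
        for part in path:
            owner = getattr(owner, part)
        setattr(owner, leaf, src_param)  # rebind the clone's parameter to the original
    return clone

def replicate_layers_shared(layers: nn.ModuleList, layer_map) -> nn.ModuleList:
    """layer_map is a list of [start, end) ranges over the original layer list."""
    return nn.ModuleList(
        clone_module_shared(layers[i]) for start, end in layer_map for i in range(start, end)
    )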
Thanks for the update. I have some comments, please take a look.
I think we're getting close to finishing this PR. I have a few comments left, please check them out.
Also, please always run make style once you're finished. We now also have a pre-commit config in case you want to add a pre-commit hook.
Sorry about not running the code style checks. I ran the tests but forgot to rerun make quality/style. I ran it as part of the latest commit.
Thanks for the updates, I only have a few smaller comments left.
Also, I think it would be nice to add a section to the docs, perhaps here. This way, you'll have a better chance that users will discover this feature. Ideally, a nice example could be documented there, or a link to a working example added.
Thanks a lot for the recent changes. The documentation is much clearer now and users should be able to understand what this does and find further information.
I found a couple of typos etc., otherwise this LGTM.
One question: Would we be able to check in the unit test that the weights are shared instead of creating a new copy?
@BenjaminBossan I addressed the comments and updated the unit test to check that some weights are shared and others are distinct.
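One simple way to assert this in a test, as a sketch only (the module path assumes a Llama-style model wrapped by PEFT and is not necessarily what the PR's test does):

# Replicated layers should share base weight storage but have distinct LoRA weights.
def check_sharing(peft_model, i, j):
    layers = peft_model.base_model.model.model.layers  # assumed path for a Llama-style model
    q_i = layers[i].self_attn.q_proj
    q_j = layers[j].self_attn.q_proj
    assert q_i.base_layer.weight.data_ptr() == q_j.base_layer.weight.data_ptr()  # shared base weights
    assert q_i.lora_A["default"].weight.data_ptr() != q_j.lora_A["default"].weight.data_ptr()  # distinct LoRA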
Thanks a lot, this LGTM.
I found a minor error in a docstring, no big deal.
Let's have a final review by @younesbelkada or @pacman100 before merging.
@BenjaminBossan do you have a sense for when this PR is likely to be merged?
Wow! Thank you @siddartha-RE for all the work on adding support for a memory-efficient way of replicating layers and inserting LoRAs on them for efficient fine-tuning! 🔥🚀✨
* Add support for layer replication in LoRA
* Add test and update docs
* Address review comments
* Code cleanup and additional model support
* Add docs, address comments
* Add link to example model
* Improve test and fix typos
* Update src/peft/tuners/tuners_utils.py: Fix typo in doc string.

Co-authored-by: Benjamin Bossan <[email protected]>
Is it possible to support Flan-T5? T5 still has a huge commercial base and is widely used. @siddartha-RE @BenjaminBossan It might look like this in the replicate_layers() function (plus other changes, maybe):
I think there is no technical reason why this shouldn't work, but with the given way of specifying the layers, I'm not sure how to indicate whether layers in the encoder or the decoder part should be replicated. Maybe Siddartha has an idea.
I think the easiest option here may be to promote the replication config to …
Sounds good, are you interested in adding this? I checked mergekit and AFAICT there is no specification for how to define this for encoder-decoder models. It would be unfortunate if they come up with a different standard, but I guess we could add that later.
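A purely hypothetical illustration of the open question above (nothing below is an implemented PEFT option): an encoder-decoder model has two separate layer stacks, so a replication spec would likely need to address each stack independently, for example via per-stack maps applied to T5's encoder.block and decoder.block module lists. Note that in T5 only the first block holds the relative attention bias, which a replication utility would also have to handle.

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
print(len(model.encoder.block), len(model.decoder.block))  # 12 12: two independent stacks
# Hypothetical per-stack spec, illustrative ranges only:
layer_replication = {"encoder": [[0, 12], [6, 12]], "decoder": [[0, 12]]}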
Another quick thing I tried on Flan-T5, which is probably also a quick bug: the total number of duplicated blocks cannot exceed the original total number of blocks. For Flan-T5 with 24 original blocks, layer_replication=[[0, 24], [22, 24]] leads to an 'out of index' error, since the new total number of blocks is 26, more than 24. However, [[0, 7], [9, 24], [22, 24]] does not hit the error, since that comes to exactly 24 blocks. Just reporting this for T5. Thanks!
Could this possibly be updated to optionally set the replicated layer's …? I think this idea initially comes from LLaMA-Pro, and was later tested on chargoddard's … Quoting Charles here:
Also described in this answer.ai blogpost.
@siddartha-RE Hello. First of all, thanks for your brilliant idea. I want to dive deep into the mechanism. For that, could you give me the specific configurations of the experiments for the models you trained, e.g. https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B? It would be really helpful for my work. Thanks!
This PR adds the ability to duplicate layers in a model according to a layer map and then fine-tune separate LoRA adapters for the layers post-duplication. This allows expanding a model into a larger model and then fine-tuning it with very little extra memory compared to the original smaller model.
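A tiny sketch of the layer-map semantics described above, assuming half-open [start, end) ranges as in the examples earlier in the thread:

def expand_layer_map(layer_map):
    """Return the sequence of original layer indices the expanded model runs through."""
    return [i for start, end in layer_map for i in range(start, end)]

print(expand_layer_map([[0, 4], [2, 6]]))  # [0, 1, 2, 3, 2, 3, 4, 5] -> 6 layers become 8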