Training PEFT models with new tokens being added to the embedding layers and tokenizer #1147
Conversation
Makes sense, thanks!
Thanks for adding this feature, it should help many users with this type of issue.

I have a couple of comments, please take a look. Also, some more general concerns:

- `base_model_layers_to_save` only makes sense when the corresponding layer is a `target_module`, right? IIUC, when a user adds a module to `base_model_layers_to_save` that is not a `target_module`, right now this would just be silently ignored. I wonder if we shouldn't perform a check to ensure that each entry in `base_model_layers_to_save` has found a match and raise an error otherwise (see the sketch after this list). I would imagine that it can be very frustrating for users when they save the model after a long time of training and only later find out that one part was silently ignored. This requirement should also be documented.
- This feature doesn't work for prompt learning because we rely on `base_layer`, right? I think we can make it work for prompt learning, though, by just checking the name without `base_layer`. WDYT?
- Let's add a unit test for this feature.
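For illustration, a minimal sketch of the kind of check described in the first point, assuming `base_model_layers_to_save` is a list of module-name suffixes (the helper name is hypothetical, and the parameter itself was reworked later in this PR):

```python
# Hypothetical helper, not part of the PEFT API: raise instead of silently
# ignoring entries of `base_model_layers_to_save` that match no module.
def validate_layers_to_save(model, base_model_layers_to_save):
    unmatched = set(base_model_layers_to_save)
    for module_name, _ in model.named_modules():
        for entry in list(unmatched):
            if module_name == entry or module_name.endswith(f".{entry}"):
                unmatched.discard(entry)
    if unmatched:
        raise ValueError(
            f"These entries of `base_model_layers_to_save` did not match any module "
            f"and would be silently ignored: {sorted(unmatched)}"
        )
```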
Hello @BenjaminBossan, I have changed the code quite a bit based on the feedback [here](https://huggingface.slack.com/archives/C04L3MWLE6B/p1700238526932559?thread_ts=1699626507.649479&cid=C04L3MWLE6B).

As there is only a need to save the embedding layers when they are resized, I have changed the code to achieve that. I also fixed the issues below:

1. Add `is_embedding_layer_resized` parameter to `save_pretrained`.
2. Fix the deduplication in README when adding PEFT details.
3. `save_pretrained` should only save the model when `is_main_process=True`, which is one of the parameters of `save_pretrained` (see the sketch after this list).
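For context on point 3, a minimal sketch of how a training script can gate the save on the main process (the base model, output path, and training loop are placeholders, not taken from the PR):

```python
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
peft_model = get_peft_model(model, LoraConfig(target_modules=["c_attn"]))

# ... training loop elided ...

# Every rank can call save_pretrained, but only the main process writes the
# files, so concurrent processes no longer corrupt the checkpoint.
peft_model.save_pretrained(
    "outputs/adapter",
    is_main_process=accelerator.is_main_process,
)
```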
Thanks a lot for reworking the PR to make it more user-friendly, as well as for adding the `is_main_process` argument and for fixing the model card. Also, well done adding a notebook to illustrate the new feature.
The logic for determining the embedding layer, even if only a few lines of code, was actually more intricate than I thought and took me some time to understand, but it looks correct to me.
I left a couple of comments. I'd say none of them are hard blockers, but please do take a look. I think my main concern would be the comment about when to automatically toggle the saving of embedding layers.
Co-Authored-By: Benjamin Bossan <[email protected]>
Hello @BenjaminBossan, I've addressed all the comments. Thank you!
Looks fantastic, great implementation, examples, docs, and tests.
I found a typo in the test name; apart from that, this can be merged.
"2. Finetuning on a specific language wherein language spoecific tokens are added, e.g., korean tokens being added to vocabulary for finetuning LLM on Korean datasets.\n", | ||
"3. Instruction finetuning to return outputs in certain format to enable agent behaviour new tokens such as `<|FUNCTIONS|>`, `<|BROWSE|>`, `<|TEXT2IMAGE|>`, `<|ASR|>`, `<|TTS|>`, `<|GENERATECODE|>`, `<|RAG|>`.\n", | ||
"\n", | ||
"In such cases, you add the Embedding modules to the LORA `target_modules`. PEFT will take care of saving the embedding layers with the new added tokens along with the adapter weights that were trained on the specific initialization of the embeddings weights of the added tokens." |
Great explanation.
Co-authored-by: Benjamin Bossan <[email protected]>
Training PEFT models with new tokens being added to the embedding layers and tokenizer (huggingface#1147)

* add support for saving base layers weights along with adapter weights
* Update save_and_load.py
* Add an example showing the usage of the added feature
* refactor the functionality
* fix
* refactoring code
  1. Add `is_embedding_layer_resized` parameter to `save_pretrained`
  2. Fix the deduplication in README when adding PEFT details.
  3. `save_pretrained` should only save the model when `is_main_process=True` which is one of the parameters of `save_pretrained`.
* update example
* fix the model card
* fix model card
* 😅
* fix model card
* automate setting `is_embedding_layer_resized`
* nits
* Update peft_lora_clm_with_additional_tokens.ipynb
* add test
* fix tests
* maybe fixes the issue?
* address comments

Co-Authored-By: Benjamin Bossan <[email protected]>

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <[email protected]>

---------

Co-authored-by: Benjamin Bossan <[email protected]>
Resolves #1300 Sourab added the feature to store the embedding layers alongside the adapter in #1147. This PR adds an entry to the documentation to explain the new feature. --------- Co-authored-by: Steven Liu <[email protected]>
I have a question about this PR. Is it that before this PR was merged it wasn't possible to both do LoRA fine-tuning and save the LoRA adapters? The reason I ask is that when I run the snippet included in this comment here, I see a massive difference in the size of the saved adapter files between 0.6.2 and 0.7.0.
I still see this size increase with peft 0.10.0. Is the size increase due to an issue, or is it fine? Is it going to be fixed, or will the behavior remain with the increase in size?
Please check out Sourab's comment, does it make clear why you see the size increase and what you can do about it? If not, could you please clarify what's still unclear to you?
Thanks Benjamin. I understand that the embedding layers (embed_tokens & lm_head) will be saved by default, but does that mean the size inflation is expected? Earlier, these modules could also be saved explicitly, but the resulting files were smaller. For cases where the token vocab is not changing, will they too now have to live with the bigger size? Or is it recommended not to save the embedding layers if there is no change in the token vocab?
No, the embedding is not saved by default. The default setting is to guess whether the embedding needs to be saved, based on whether common embedding module names appear in `target_modules` (see the PR description). Note that you can always pass `save_embedding_layers` explicitly to control this.
Indeed, when the vocab is not changed (no fine-tuning, no additional tokens being added), it does not need to be included in the saved file.
Sorry, I don't understand this question, could you please clarify?
I am doing continued pretraining with LoRA on a pretrained LLM for a non-English language, without extending the tokenizer as it already has the needed tokens. For a non-English language it is recommended to train the embedding layers. So in such a case, since I need the embedding layers, which PEFT will save because the save_emb_layers flag is true, will that lead to a bigger adapter size?
You have to be more precise here: Are you fully fine-tuning the embedding layer -- in that case, it has to be saved, leading to a much larger file size. Or are you training a LoRA weight on top of the embedding layer -- in that case, only those LoRA weights must be saved, not the whole embedding layer, leading to only a slight increase of the file size (compared to not training the embedding at all).
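Roughly, the two setups contrasted above correspond to configs like these (module names assume a LLaMA-style model and are placeholders):

```python
from peft import LoraConfig

# Option A: fully fine-tune the embedding layers via modules_to_save. The whole
# (large) embedding matrices become trainable and are stored in the checkpoint.
full_embedding_config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)

# Option B: train LoRA weights on top of the embedding layer. Only the small
# low-rank matrices are saved, so the checkpoint grows just slightly.
lora_on_embedding_config = LoraConfig(
    target_modules=["q_proj", "v_proj", "embed_tokens"],
)
```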
Thanks Benjamin. I am doing pretraining with LoRA on an existing pretrained model, Gemma-7B, to augment it with a big non-English corpus. I know full FT would have been good, but the cost would have been higher.
@amitagh Since this discussion is only tangentially related to the PR, could you please open a thread in the PEFT discussions and describe your problem there? Please include as many details as you can share.
What does this PR do?

1. Adds support for saving the embedding layers along with the adapter weights by passing `save_embedding_layers=True` to the `save_pretrained` method. If not explicitly passed, we try our best to guess if the common embedding module names are present in `target_modules` of the config.
2. `save_pretrained` doesn't save only on the main process, and as such, when working in a multi-GPU environment, it leads to broken checkpoints as many processes are simultaneously writing to it. This is fixed now with the `is_main_process` argument, which is in line with that of transformers.
3. The PEFT details were being duplicated in the README when saving every `n` steps/epochs. Fixed this so that the created README is clean.

Todo:
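As a rough end-to-end illustration of point 1 above (the model id, token, and paths are placeholders, not taken from the PR):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.add_special_tokens({"additional_special_tokens": ["<|RAG|>"]})
model.resize_token_embeddings(len(tokenizer))       # grow the embeddings for the new token

# "wte" is GPT-2's input embedding module, so LoRA is also applied to the embedding.
peft_model = get_peft_model(model, LoraConfig(target_modules=["c_attn", "wte"]))

# ... training elided ...

peft_model.save_pretrained(
    "outputs/adapter",
    save_embedding_layers=True,  # or omit it and let PEFT guess from target_modules
    # In multi-GPU runs, is_main_process can additionally be passed here (point 2).
)
tokenizer.save_pretrained("outputs/adapter")  # keep the extended tokenizer with the adapter
```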