deepspeed-chat: train v_head when only optimizing lora
When using only_optimize_lora, we still need to train the v_head parameter.

Change-Id: I252c3ee69819997bf336482c6779b070f2e76df8
Signed-off-by: Moshe Island <[email protected]>
mosheisland committed Oct 5, 2023
1 parent bfad08f commit 738aa32
Showing 2 changed files with 4 additions and 3 deletions.
@@ -252,7 +252,8 @@ def main():
                                                 args.lora_module_name,
                                                 args.lora_dim)
         if args.only_optimize_lora:
-            rm_model = only_optimize_lora_parameters(rm_model)
+            rm_model = only_optimize_lora_parameters(
+                rm_model, force_optimize_params=['v_head.weight'])
             rm_model = make_model_gradient_checkpointing_compatible(rm_model)

     train_phase = 2
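For context: in the reward model, the value head is a separate linear layer added on top of the LoRA-injected base model, so it carries no lora_right_weight / lora_left_weight parameters and would otherwise be frozen by only_optimize_lora_parameters. A minimal sketch of that structure (class and attribute names are illustrative, not the exact DeepSpeed-Chat classes):

import torch.nn as nn

class TinyRewardModel(nn.Module):
    # Toy stand-in for a reward model: a LoRA-injected transformer body
    # plus a freshly initialized scalar value head.
    def __init__(self, base_model, hidden_size):
        super().__init__()
        self.rwtransformer = base_model                       # holds the lora_* parameters
        self.v_head = nn.Linear(hidden_size, 1, bias=False)   # has no lora_* parameters

With this layout, named_parameters() yields the head's weight as 'v_head.weight', which is the name passed via force_optimize_params above.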
applications/DeepSpeed-Chat/training/utils/module/lora.py (4 changes: 2 additions & 2 deletions)
@@ -131,10 +131,10 @@ def convert_lora_to_linear_layer(model):
     return model


-def only_optimize_lora_parameters(model):
+def only_optimize_lora_parameters(model, force_optimize_params=[]):
     # turn off the gradient of all the parameters except the LoRA parameters
     for name, param in model.named_parameters():
-        if "lora_right_weight" in name or "lora_left_weight" in name:
+        if "lora_right_weight" in name or "lora_left_weight" in name or name in force_optimize_params:
             param.requires_grad = True
         else:
             param.requires_grad = False
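A hedged usage sketch of the updated helper, assuming the toy TinyRewardModel above and an already LoRA-converted base model (lora_base_model and hidden_size are placeholders for illustration):

model = TinyRewardModel(lora_base_model, hidden_size=768)
model = only_optimize_lora_parameters(
    model, force_optimize_params=['v_head.weight'])

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
# Expected: every lora_right_weight / lora_left_weight tensor plus 'v_head.weight';
# all remaining parameters end up with requires_grad == False.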
