
deepspeed-chat: fix weight decay configuration #755

Merged
3 commits merged into microsoft:master from 6_fix_wd_config on Oct 16, 2023

Conversation

mosheisland
Contributor

The current default name used to detect LayerNorm (LN) layers is "LayerNorm.weight". This does not work for the following models:

  • opt: uses "layer_norm"
  • llama: uses "norm" and "layernorm"
  • bloom: uses "layernorm" and "ln_f"

Therefore, modify the default names to accommodate the above. Also, compare names in lowercase to capture models with different capitalization (a sketch of the resulting grouping is shown below).

Change-Id: I5b805df2663c62daf3d9c8a31a973742e344e76b
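
For illustration, the parameter grouping touched by this PR works roughly like the sketch below. This is an approximation rather than the exact DeepSpeed-Chat code: the function name `get_grouped_parameters` and the inclusion of "bias" in the default no-decay list are assumptions made for this example; the LN substrings mirror the names listed above.

```python
import torch.nn as nn

# Illustrative sketch (not the exact DeepSpeed-Chat implementation) of
# splitting parameters into weight-decay and no-weight-decay groups by
# matching lowercased parameter names against a list of substrings.
# NOTE: the function name and the "bias" entry are assumptions.
def get_grouped_parameters(
    model: nn.Module,
    weight_decay: float,
    no_decay_name_list=("bias", "layer_norm", "layernorm", "norm", "ln_f"),
):
    decay_params, no_decay_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Compare in lowercase so e.g. "LayerNorm.weight", "final_layer_norm",
        # "input_layernorm" and "ln_f" are all caught regardless of caps.
        if any(nd in name.lower() for nd in no_decay_name_list):
            no_decay_params.append(param)
        else:
            decay_params.append(param)
    return [
        {"params": decay_params, "weight_decay": weight_decay},
        {"params": no_decay_params, "weight_decay": 0.0},
    ]
```

The resulting groups can be passed directly to an optimizer, e.g. `torch.optim.AdamW(get_grouped_parameters(model, 0.1))`, so that LN weights and biases are excluded from weight decay.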

@mosheisland
Contributor Author

@lekurile, are you ok with this PR?

@tjruwase tjruwase merged commit 8d850ba into microsoft:master Oct 16, 2023
1 check passed
@mosheisland mosheisland deleted the 6_fix_wd_config branch October 17, 2023 06:40