
deepspeed-chat: fix weight decay configuration #755

Merged
3 commits merged into microsoft:master from 6_fix_wd_config on Oct 16, 2023

Conversation

mosheisland
Contributor

The current default name used to detect LayerNorm (LN) layers is "LayerNorm.weight". This does not work for the following models:

  • opt: uses "layer_norm"
  • llama: uses "norm" and "layernorm"
  • bloom: uses "layernorm" and "ln_f"

Therefore, modify the default names to accommodate the above. Also, compare names in lowercase to capture models with different capitalization (a sketch of the resulting grouping is shown below).

Change-Id: I5b805df2663c62daf3d9c8a31a973742e344e76b
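
For illustration, the parameter grouping touched by this PR works roughly like the sketch below. This is an approximation rather than the exact DeepSpeed-Chat code: the function name `get_grouped_parameters` and the inclusion of "bias" in the default no-decay list are assumptions made for this example; the LN substrings mirror the names listed above.

```python
import torch.nn as nn

# Illustrative sketch (not the exact DeepSpeed-Chat implementation) of
# splitting parameters into weight-decay and no-weight-decay groups by
# matching lowercased parameter names against a list of substrings.
# NOTE: the function name and the "bias" entry are assumptions.
def get_grouped_parameters(
    model: nn.Module,
    weight_decay: float,
    no_decay_name_list=("bias", "layer_norm", "layernorm", "norm", "ln_f"),
):
    decay_params, no_decay_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Compare in lowercase so e.g. "LayerNorm.weight", "final_layer_norm",
        # "input_layernorm" and "ln_f" are all caught regardless of caps.
        if any(nd in name.lower() for nd in no_decay_name_list):
            no_decay_params.append(param)
        else:
            decay_params.append(param)
    return [
        {"params": decay_params, "weight_decay": weight_decay},
        {"params": no_decay_params, "weight_decay": 0.0},
    ]
```

The resulting groups can be passed directly to an optimizer, e.g. `torch.optim.AdamW(get_grouped_parameters(model, 0.1))`, so that LN weights and biases are excluded from weight decay.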

@mosheisland
Contributor Author

@lekurile, are you ok with this PR?

@tjruwase tjruwase merged commit 8d850ba into microsoft:master Oct 16, 2023
1 check passed
@mosheisland mosheisland deleted the 6_fix_wd_config branch October 17, 2023 06:40