deepspeed-chat: fix weight decay configuration
The current default name used to detect LayerNorm (LN) layers is "LayerNorm.weight". This does not work for the following models:
- opt: uses "layer_norm"
- llama: uses "norm" and "layernorm"
- bloom: uses "layernorm" and "ln_f"

Therefore, modify the default names to accommodate the above. Also, compare names in lowercase to capture models with different capitalization.

Change-Id: I5b805df2663c62daf3d9c8a31a973742e344e76b
Signed-off-by: Moshe Island <[email protected]>
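Below is a minimal sketch of the grouping pattern this change describes: parameters whose lowercased names contain any of the no-decay substrings go into a zero-weight-decay optimizer group. The function name, signature, and defaults here are illustrative assumptions, not the exact deepspeed-chat code.

    # Hypothetical sketch; names and defaults are assumptions, not the exact
    # deepspeed-chat implementation.
    def get_optimizer_grouped_parameters(
            model,
            weight_decay,
            # Lowercase substrings covering the naming schemes above:
            # "layer_norm" (opt), "norm"/"layernorm" (llama),
            # "layernorm"/"ln_f" (bloom); biases are also commonly excluded.
            no_decay_name_list=("bias", "layer_norm", "layernorm", "norm", "ln_f")):
        def is_no_decay(name):
            # Compare in lowercase to capture models with different capitalization.
            name = name.lower()
            return any(nd in name for nd in no_decay_name_list)

        decay_params = [p for n, p in model.named_parameters()
                        if p.requires_grad and not is_no_decay(n)]
        no_decay_params = [p for n, p in model.named_parameters()
                           if p.requires_grad and is_no_decay(n)]
        return [
            {"params": decay_params, "weight_decay": weight_decay},
            {"params": no_decay_params, "weight_decay": 0.0},
        ]

The resulting groups can be passed directly to a torch optimizer, e.g. torch.optim.AdamW(get_optimizer_grouped_parameters(model, 0.01)).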