[Bug Report] hook_normalized is inconsistent between RMSNorm and LayerNorm #747
Labels: breaking-change, bug, complexity-moderate
In layer_norm.py, hook_normalized fires after the gain and bias weights are applied; in rms_norm.py it fires before. This is inconsistent and highly confusing IMO, and should be fixed.
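For reference, here's a minimal sketch of where the hook sits in each forward pass. This is simplified from the actual modules (the hook points are marked in comments rather than wired up as HookPoints, and names like `d_model` are just illustrative):

```python
import torch
import torch.nn as nn


class RMSNormSketch(nn.Module):
    """Simplified RMSNorm: hook_normalized fires before the gain."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.w = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        x = x / scale  # <-- hook_normalized fires here, BEFORE the gain
        return x * self.w


class LayerNormSketch(nn.Module):
    """Simplified LayerNorm: hook_normalized wraps the post-gain-and-bias value."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.w = nn.Parameter(torch.ones(d_model))
        self.b = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(-1, keepdim=True)
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        x = x / scale
        return x * self.w + self.b  # <-- hook_normalized wraps this, AFTER gain and bias
```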
This is an irritating situation. I think it's super bad to be inconsistent, since e.g. code won't transfer from an RMSNorm model to an LN model and will fail silently. However, making it consistent would (technically) be a breaking change, though I'd guess it's not widely relied-upon behaviour?
I personally think hook_normalized should be changed to fire before the gain and bias weights in LN, since that's what "normalized" intuitively means. That's what it originally meant; I changed it about two years ago in the early days of the library, I think because the post-gain-and-bias value was then "the thing that is input into the next layer". But now we have attn_input and mlp_input hooks, so who cares.
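Concretely, the proposal would make the LN forward look something like this (a sketch of the intended placement, not a patch against the real module):

```python
import torch
import torch.nn as nn


class LayerNormProposed(nn.Module):
    """Sketch of the proposed behaviour: hook_normalized before the gain and bias."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.w = nn.Parameter(torch.ones(d_model))
        self.b = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(-1, keepdim=True)
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        x = x / scale  # <-- hook_normalized would fire here, matching RMSNorm
        return x * self.w + self.b
```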
Note that this is not an issue if you fold the LN weights; the two placements are equivalent in that case.
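For anyone hitting this in the meantime, loading with folded LayerNorm sidesteps the ambiguity, since the gain and bias get folded into the following layer's weights. Something like (model name, prompt and cache key are just for illustration):

```python
from transformer_lens import HookedTransformer

# With fold_ln=True (I believe the default for from_pretrained), the LN gain
# and bias are folded into the next layer's weights, so hook_normalized is
# just the centred-and-scaled activation regardless of hook placement.
model = HookedTransformer.from_pretrained("gpt2", fold_ln=True)
_, cache = model.run_with_cache("Hello world")
normalized = cache["blocks.0.ln1.hook_normalized"]
print(normalized.shape)  # [batch, pos, d_model]
```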
cc @ArthurConmy @bryce13950