Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] hook_normalized is inconsistent between RMSNorm and LayerNorm #747

Open
neelnanda-io opened this issue Oct 6, 2024 · 1 comment
Labels
breaking-change bug Something isn't working complexity-moderate Moderately complicated issues for people who have intermediate experience with the code

Comments

@neelnanda-io
Copy link
Collaborator

In layer_norm.py hook_normalized is after the gain and bias weights are used, in rms_norm.py it's before. This is inconsistent and highly confusing IMO, and should be fixed

RMS

x = self.hook_normalized(x / scale).to(self.cfg.dtype)  # [batch, pos, length]
return x * self.w

LN

return self.hook_normalized(x * self.w + self.b).to(self.cfg.dtype)

This is an irritating situation, since I think it's super bad to be inconsistent as eg code won't transfer from an RMS Norm model to an LN model, and there'll be silent errors. However, making it consistent would be (technically) a breaking change, though I'd guess it's not widely used behaviour?

I personally think hook_normalized should be changed to be before the gain and bias weights in LN, since that's what normalized intuitively means. This is what it originally meant, I then changed it about two years ago in the early days of the library, I think because that was then "the thing that is input into the next layer". But now we have attn_input and mlp_input hooks so who cares.

Note that this is not an issue if you fold the LN weights, it's equivalent

cc @ArthurConmy @bryce13950

@bryce13950
Copy link
Collaborator

3.0 is coming sooner rather than later. We can definitely work this into that release.

@bryce13950 bryce13950 added bug Something isn't working complexity-moderate Moderately complicated issues for people who have intermediate experience with the code labels Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change bug Something isn't working complexity-moderate Moderately complicated issues for people who have intermediate experience with the code
Projects
None yet
Development

No branches or pull requests

2 participants