[Bug Report] hook_normalized is inconsistent between RMSNorm and LayerNorm #747

neelnanda-io · 2024-10-06T17:29:42Z

In layer_norm.py, hook_normalized is applied after the gain and bias weights are used; in rms_norm.py it's applied before. This is inconsistent and highly confusing IMO, and should be fixed.

RMS

x = self.hook_normalized(x / scale).to(self.cfg.dtype)  # [batch, pos, length]
return x * self.w

LN

return self.hook_normalized(x * self.w + self.b).to(self.cfg.dtype)

This is an irritating situation. I think it's super bad to be inconsistent, since e.g. code won't transfer from an RMSNorm model to an LN model, and there'll be silent errors. However, making it consistent would (technically) be a breaking change, though I'd guess it's not widely used behaviour?
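To make the silent error concrete, here is a minimal standalone sketch of the two orderings quoted above. It is plain Python, not the library's code: the functions rms_norm and layer_norm, the cache dict, and the EPS value are illustrative assumptions that only mimic where hook_normalized sits relative to the weights.

```python
import math

EPS = 1e-5  # assumed epsilon, for illustration only


def rms_norm(x, w, cache):
    # Mirrors the RMS ordering quoted above: hook first, then apply the gain.
    scale = math.sqrt(sum(v * v for v in x) / len(x) + EPS)
    normalized = [v / scale for v in x]
    cache["hook_normalized"] = normalized  # captured BEFORE the gain w
    return [n * wi for n, wi in zip(normalized, w)]


def layer_norm(x, w, b, cache):
    # Mirrors the LN ordering quoted above: apply gain and bias, then hook.
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    scale = math.sqrt(sum(v * v for v in centered) / len(x) + EPS)
    normalized = [v / scale for v in centered]
    out = [n * wi + bi for n, wi, bi in zip(normalized, w, b)]
    cache["hook_normalized"] = out  # captured AFTER the gain w and bias b
    return out


def mean_square(v):
    return sum(a * a for a in v) / len(v)


x = [1.0, -2.0, 3.0, 0.5]
w = [2.0, 2.0, 2.0, 2.0]
b = [0.1, 0.1, 0.1, 0.1]

rms_cache, ln_cache = {}, {}
rms_norm(x, w, rms_cache)
layer_norm(x, w, b, ln_cache)

rms_ms = mean_square(rms_cache["hook_normalized"])
ln_ms = mean_square(ln_cache["hook_normalized"])
print(rms_ms)  # close to 1: a unit-scale, weight-free activation
print(ln_ms)   # far from 1: the gain and bias are baked in
```

Analysis code that reads hook_normalized and assumes a unit-scale activation would run without error on both models, but would be looking at a different quantity in the LN case.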

I personally think hook_normalized should be changed to be before the gain and bias weights in LN, since that's what "normalized" intuitively means. This is what it originally meant. I changed it about two years ago, in the early days of the library, I think because the post-weights value was "the thing that is input into the next layer". But now we have attn_input and mlp_input hooks, so who cares.

Note that this is not an issue if you fold the LN weights; in that case the two hook positions are equivalent.
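The reason folding removes the discrepancy can be sketched as follows. This is a generic illustration of folding a gain and bias into the next linear layer, assuming that layer computes v @ W + c; the helper names linear and fold are hypothetical, and the library's actual folding routine may do more than this.

```python
def linear(v, W, c):
    # v: length d_in, W: d_in x d_out (list of rows), c: length d_out
    return [
        sum(v[i] * W[i][j] for i in range(len(v))) + c[j]
        for j in range(len(c))
    ]


def fold(w, b, W, c):
    # (n * w + b) @ W + c  ==  n @ (diag(w) @ W) + (b @ W + c)
    W_folded = [[w[i] * W[i][j] for j in range(len(c))] for i in range(len(w))]
    c_folded = [
        sum(b[i] * W[i][j] for i in range(len(b))) + c[j]
        for j in range(len(c))
    ]
    return W_folded, c_folded


n = [0.5, -1.5, 1.0]   # a normalized activation (before gain and bias)
w = [2.0, 0.5, -1.0]   # norm gain
b = [0.1, 0.0, -0.2]   # norm bias
W = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
c = [0.3, -0.3]

# Unfolded: the norm layer applies w and b, then the linear layer runs.
unfolded = linear([ni * wi + bi for ni, wi, bi in zip(n, w, b)], W, c)

# Folded: the norm layer outputs n unchanged, the linear layer absorbs w and b.
W_f, c_f = fold(w, b, W, c)
folded = linear(n, W_f, c_f)

max_diff = max(abs(u - f) for u, f in zip(unfolded, folded))
print(max_diff)  # tiny floating-point difference only
```

Once the weights are folded, the norm layer has no gain or bias left to apply, so hooking before or after the (now absent) weights yields the same value n.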

cc @ArthurConmy @bryce13950


bryce13950 · 2024-10-15T22:37:29Z

3.0 is coming sooner rather than later. We can definitely work this into that release.

bryce13950 added the breaking-change label Oct 15, 2024

bryce13950 added the bug and complexity-moderate labels Oct 15, 2024

degenfabian mentioned this issue Nov 1, 2024

Restore consistency of hook_normalized between LayerNorm and RMSNorm #770

Open

7 tasks
