fix: Always initialize bias to zero for ColumnParallelLinear. #1490
This PR should fix the problem described in #1411.
In #1181, @zhuohan123 refactored the code of `ColumnParallelLinear` and removed the `self.bias.zero_()` statement, which results in some unknown behavior in the model init phase. vLLM runs a forward pass in `profile_num_available_blocks`, and `hidden_state` contains `nan` values when executing the second `DecodeLayer` forward.
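For reference, the change is essentially to zero the bias right after it is allocated. A minimal sketch of the idea, with simplified names (the real `ColumnParallelLinear` in vLLM takes more arguments and handles tensor parallelism):

```python
import torch
import torch.nn as nn


class ColumnParallelLinearSketch(nn.Module):
    """Simplified stand-in for vLLM's ColumnParallelLinear (illustration only)."""

    def __init__(self, input_size: int, output_size: int, bias: bool = True):
        super().__init__()
        # The weight is created with torch.empty and later overwritten by the
        # checkpoint loader, so its initial contents do not matter.
        self.weight = nn.Parameter(torch.empty(output_size, input_size))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_size))
            # The fix: without this, the bias keeps whatever garbage values
            # were in the allocated memory unless a checkpoint provides a
            # bias tensor, which can poison the profiling forward pass.
            with torch.no_grad():
                self.bias.zero_()
        else:
            self.register_parameter("bias", None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, self.weight, self.bias)
```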
To be more precise, some weird things happen to the `self.bias` of `ColumnParallelLinear`:

- `self.bias` contains some large values from the second attention layer onward if we just initialize the whole model;
- these large values then produce `nan` values in `attn_output`.
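The large values are consistent with the bias being left as uninitialized memory: `torch.empty` does not clear the allocation, so the parameter can hold arbitrary floats. A quick way to see this in a plain PyTorch session:

```python
import torch

# torch.empty returns whatever bytes happen to be in the allocation, so the
# "bias" may contain arbitrarily large values; the result varies run to run.
bias = torch.empty(4096)
print(bias.abs().max())

# Zeroing it (what this PR restores) makes the init deterministic.
bias.zero_()
print(bias.abs().max())  # tensor(0.)
```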
If I add a breakpoint at this statement, step into its constructor, and run `__init__()` step by step, `self.bias` seems to be a zero tensor, but it still introduces some accuracy error when forwarding: `qkv, _ = self.qkv_proj(hidden_states)` gives a different result than in vLLM==0.2.0.

I'm not an expert in PyTorch and can't explain why these weird phenomena happen, but this PR fixes the bug.
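One way to confirm the symptom (and the fix) is to watch the intermediate activations during the profiling forward pass with a hook. This is only a debugging sketch; `model` and the hook wiring below are hypothetical, not vLLM APIs:

```python
import torch


def check_for_nan(name):
    """Forward hook that flags nan/inf in a module's output (debugging aid)."""
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        if torch.is_tensor(out) and (torch.isnan(out).any() or torch.isinf(out).any()):
            print(f"nan/inf detected in output of {name}")
    return hook


# Hypothetical usage: attach hooks to every submodule before the profiling
# forward pass, then run the model once and watch for the warning.
# for name, module in model.named_modules():
#     module.register_forward_hook(check_for_nan(name))
```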