
fix: Always initialize bias to zero for ColumnParallelLinear. #1490

Closed. gesanqiu wants to merge 2 commits from the bug-fix branch.

Conversation

gesanqiu (Contributor)

This PR should fix the problem described in #1411.

In #1181, @zhuohan123 refactored the code of ColumnParallelLinear and removed the self.bias.zero_() statement, which results in undefined behavior during the model init phase. vLLM runs a forward pass in profile_num_available_blocks, and hidden_states contains nan values when the second DecodeLayer executes its forward pass.
To be more precise, some weird phenomena happen to the self.bias of ColumnParallelLinear:

  1. self.bias contains some very large values from the second attention layer onward if we just initialize the whole model.
    [screenshot: self.bias containing large values]
    These large values produce nan values in attn_output.

  2. If I add a breakpoint at this statement:

        self.qkv_proj = ColumnParallelLinear(
            hidden_size,
            3 * self.total_num_heads * self.head_dim,
            bias=True,
            gather_output=False,
        )

and step into its constructor to run __init__() step by step, then self.bias appears to be a zero tensor, but it still introduces accuracy errors during the forward pass: qkv, _ = self.qkv_proj(hidden_states) produces different results than in vLLM==0.2.0.
[screenshots: qkv outputs differing between vLLM 0.2.0 and the current build]
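
A plausible explanation (I haven't verified this against PyTorch internals): torch.empty() allocates storage without initializing it, so a bias Parameter created this way can hold arbitrary leftover values until something explicitly zeroes or overwrites them. A minimal, self-contained sketch in plain PyTorch, not vLLM code:

    import torch
    import torch.nn as nn

    # torch.empty allocates storage without initializing it, so the
    # parameter may contain arbitrary leftover values.
    bias = nn.Parameter(torch.empty(4096))
    print(bias.abs().max())  # may print an arbitrarily large number

    # Zeroing at construction time makes the initial state deterministic.
    with torch.no_grad():
        bias.zero_()
    print(bias.abs().max())  # tensor(0.)

This would also explain why the values only look wrong for some layers: whether freshly allocated memory happens to contain garbage is nondeterministic.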

I'm not a PyTorch expert and can't explain why these weird phenomena happen, but this PR fixes the bug.
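
To make the intent of the fix concrete, here is a minimal single-process sketch; the class name and the stripped-down constructor are illustrative only, since the real ColumnParallelLinear in vLLM also shards the output dimension across tensor-parallel ranks:

    import torch
    from torch import nn
    from torch.nn.parameter import Parameter

    class ColumnParallelLinearSketch(nn.Module):
        # Illustrative stand-in for vLLM's ColumnParallelLinear, showing
        # only the bias-initialization fix; tensor-parallel sharding and
        # weight loading are omitted.
        def __init__(self, input_size, output_size, bias=True):
            super().__init__()
            self.weight = Parameter(torch.empty(output_size, input_size))
            if bias:
                self.bias = Parameter(torch.empty(output_size))
                # The fix: always zero the bias at construction so a
                # forward pass never reads uninitialized memory.
                with torch.no_grad():
                    self.bias.zero_()
            else:
                self.register_parameter('bias', None)

Zeroing at construction costs almost nothing and guarantees that any read of a not-yet-loaded bias sees zeros rather than garbage.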

@gesanqiu gesanqiu mentioned this pull request Oct 27, 2023
@gesanqiu gesanqiu closed this Oct 28, 2023
@gesanqiu gesanqiu deleted the bug-fix branch October 28, 2023 09:27
@gesanqiu gesanqiu restored the bug-fix branch October 28, 2023 09:29
@gesanqiu gesanqiu deleted the bug-fix branch October 28, 2023 09:29