Commit

bugfix with beta value in attention computation
Ofir Press committed Sep 16, 2021
1 parent 06dc2d7 commit 5b5afb2
Showing 1 changed file with 1 addition and 1 deletion.
megatron/model/transformer.py  (1 addition, 1 deletion)

@@ -304,7 +304,7 @@ def forward(self, hidden_states, attention_mask, layer_past=None,
            matmul_result,
            query_layer.transpose(0, 1),  # [b * np, sq, hn]
            key_layer.transpose(0, 1).transpose(1, 2),  # [b * np, hn, sk]
-           beta=0.0, alpha=(1.0/self.norm_factor))
+           beta=0.0 if alibi is None else 1.0, alpha=(1.0/self.norm_factor))

        # change view to [b, np, sq, sk]
        attention_scores = matmul_result.view(*output_size)
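
For context on why the beta value matters: torch.baddbmm(input, batch1, batch2, beta=..., alpha=...) returns beta * input + alpha * (batch1 @ batch2). When ALiBi is enabled, the position biases are preloaded into matmul_result, so beta must be 1.0 to keep them; with beta=0.0 the biases are multiplied away and the scores reduce to plain scaled dot-product attention, which is the bug this commit fixes. Below is a minimal sketch of the intended semantics; the tensor shapes and the standalone alibi and norm_factor values are illustrative stand-ins, not the actual megatron variables.

import torch

# Illustrative sizes only: b*np = 4 batched heads, sq = sk = 8 tokens, hn = 16 head dim.
bnp, sq, sk, hn = 4, 8, 8, 16
norm_factor = hn ** 0.5

query_layer = torch.randn(bnp, sq, hn)   # [b * np, sq, hn]
key_layer_t = torch.randn(bnp, hn, sk)   # [b * np, hn, sk]

# Stand-in for the ALiBi bias that the real code preloads into matmul_result;
# set alibi = None to model the non-ALiBi path.
alibi = torch.randn(bnp, sq, sk)
matmul_result = alibi.clone() if alibi is not None else torch.empty(bnp, sq, sk)

# baddbmm computes beta * matmul_result + alpha * (query_layer @ key_layer_t).
# beta=1.0 keeps the preloaded ALiBi biases; beta=0.0 would discard them.
attention_scores = torch.baddbmm(
    matmul_result,
    query_layer,
    key_layer_t,
    beta=0.0 if alibi is None else 1.0,
    alpha=(1.0 / norm_factor))

# Reference computation for the ALiBi case.
expected = alibi + (query_layer @ key_layer_t) / norm_factor
assert torch.allclose(attention_scores, expected, atol=1e-5)

Before this commit, beta was hard-coded to 0.0, so the ALiBi biases written into matmul_result never reached the attention scores.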
