Force MHA QKV onto fp32 (NVIDIA#5391) (NVIDIA#5395)
Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>
2 people authored and Hainan Xu committed Nov 29, 2022
1 parent dbc368a commit 919bd68
Showing 1 changed file with 5 additions and 0 deletions.
nemo/collections/asr/parts/submodules/multi_head_attention.py (5 additions, 0 deletions)
@@ -142,6 +142,8 @@ def forward(self, query, key, value, mask, pos_emb=None, cache=None, cache_next=

+        if torch.is_autocast_enabled():
+            query, key, value = query.to(torch.float32), key.to(torch.float32), value.to(torch.float32)

         # temporary until we solve this more gracefully
         with avoid_float16_autocast_context():
             q, k, v = self.forward_qkv(query, key, value)
             scores = torch.matmul(q, k.transpose(-2, -1)) / self.s_d_k
@@ -218,6 +220,9 @@ def forward(self, query, key, value, mask, pos_emb, cache=None, cache_next=None)
         """
         key, value, query = self.update_cache(key=key, value=value, query=query, cache=cache, cache_next=cache_next)

+        if torch.is_autocast_enabled():
+            query, key, value = query.to(torch.float32), key.to(torch.float32), value.to(torch.float32)
+
         # temporary until we solve this more gracefully
         with avoid_float16_autocast_context():
             q, k, v = self.forward_qkv(query, key, value)
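
For readers who want the gist of the change outside the diff: under mixed precision, the commit casts the attention inputs to float32 whenever autocast is enabled and keeps the QKV projection and score matmul out of float16 autocast. Below is a minimal, self-contained sketch of that pattern. The _no_fp16_autocast helper and attention_scores function are illustrative stand-ins (assumptions), not NeMo's actual avoid_float16_autocast_context or MultiHeadAttention.forward.

import torch
from contextlib import contextmanager

@contextmanager
def _no_fp16_autocast():
    # Hypothetical stand-in for NeMo's avoid_float16_autocast_context():
    # temporarily disable autocast so the matmul below runs in the tensors'
    # own dtype (float32 after the cast) instead of being downcast to float16.
    if torch.is_autocast_enabled():
        with torch.cuda.amp.autocast(enabled=False):
            yield
    else:
        yield

def attention_scores(query, key, d_k):
    # Mirrors the commit's pattern: force the inputs onto float32 when
    # autocast is enabled, then compute the scores outside float16 autocast.
    if torch.is_autocast_enabled():
        query, key = query.to(torch.float32), key.to(torch.float32)
    with _no_fp16_autocast():
        return torch.matmul(query, key.transpose(-2, -1)) / (d_k ** 0.5)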
