
Self-Attention Mask Expansion Issue #235

Open
fanoprcs opened this issue Oct 1, 2024 · 0 comments

Comments


fanoprcs commented Oct 1, 2024

Assume a mask of [F, F, F, T, T]. In the encoder, this mask is expanded as follows:
```python
slf_attn_mask = mask.unsqueeze(1).expand(-1, max_len, -1)
```
This results in the following mask:
```
[F, F, F, T, T]
[F, F, F, T, T]
[F, F, F, T, T]
[F, F, F, T, T]
[F, F, F, T, T]
```
The expanded mask is then passed into the scaled dot-product attention module. However, I think this may not be correct, since the fourth and fifth positions are padding and should not be computing attention at all.

I think the correct version should be:
```
[F, F, F, T, T]
[F, F, F, T, T]
[F, F, F, T, T]
[T, T, T, T, T]
[T, T, T, T, T]
```
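For reference, here is a minimal sketch of the expansion I have in mind (PyTorch; `symmetric_mask` is just a name I made up for illustration, not something from the repo):

```python
import torch

# Padding mask for one sequence of length 5: True marks padded positions.
mask = torch.tensor([[False, False, False, True, True]])  # shape (batch=1, max_len=5)
max_len = mask.size(1)

# Current expansion: every query row gets the same key mask, so the rows
# belonging to padded queries (rows 4 and 5) still attend to the real tokens.
slf_attn_mask = mask.unsqueeze(1).expand(-1, max_len, -1)  # (batch, max_len, max_len)

# Proposed symmetric expansion: also mask along the query dimension, so the
# rows belonging to padded positions become fully masked.
symmetric_mask = slf_attn_mask | mask.unsqueeze(2)  # (batch, max_len, 1) broadcasts over keys

print(slf_attn_mask[0].int())   # all five rows identical
print(symmetric_mask[0].int())  # last two rows fully masked
```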
Could someone clarify whether this is an actual issue or just a misunderstanding on my part?

@fanoprcs fanoprcs closed this as completed Oct 1, 2024
@fanoprcs fanoprcs reopened this Oct 1, 2024