[Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (vllm-project#9626)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: Erkin Sagiroglu <[email protected]>
mgoin authored and Erkin Sagiroglu committed Oct 26, 2024
1 parent cb869eb commit e81a041
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions vllm/model_executor/models/mllama.py
@@ -795,17 +795,19 @@ def attention_with_mask(
         kv_len = k.shape[0]
         q = q.transpose(0, 1).view(self.num_local_key_value_heads,
                                    self.num_key_value_groups, q_len,
-                                   self.head_dim)
+                                   self.head_dim).contiguous()
         k = k.transpose(0,
                         1)[:,
                            None, :, :].expand(self.num_local_key_value_heads,
                                               self.num_key_value_groups,
-                                              kv_len, self.head_dim)
+                                              kv_len,
+                                              self.head_dim).contiguous()
         v = v.transpose(0,
                         1)[:,
                            None, :, :].expand(self.num_local_key_value_heads,
                                               self.num_key_value_groups,
-                                              kv_len, self.head_dim)
+                                              kv_len,
+                                              self.head_dim).contiguous()
         attention_mask = attention_mask.view(1, 1, q_len, kv_len)
         output = F.scaled_dot_product_attention(q,
                                                 k,
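
The commit does not say why the added .contiguous() calls avoid the illegal memory access. What is certain is that transpose and expand return views that reuse the original storage, with expand giving the broadcast dimension a stride of 0, and that .contiguous() copies such a view into a densely packed tensor. The sketch below is a pure-Python model of that stride bookkeeping, not PyTorch code and not part of the commit. The sizes and the helper names (row_major_strides, is_contiguous, storage_span) are hypothetical and chosen only for illustration.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when strides match the dense row-major layout (size-1 dims ignored)."""
    dense = row_major_strides(shape)
    return all(size == 1 or s == d for size, s, d in zip(shape, strides, dense))


def storage_span(shape, strides):
    """Storage elements covered from the first to the last addressed element."""
    return 1 + sum((size - 1) * stride for size, stride in zip(shape, strides))


# Hypothetical sizes, for illustration only.
kv_len, kv_heads, groups, head_dim = 3, 2, 4, 8

# k starts dense with shape (kv_len, kv_heads, head_dim).
shape = (kv_len, kv_heads, head_dim)
strides = row_major_strides(shape)

# transpose(0, 1): swap the first two sizes and strides; no data moves.
shape = (shape[1], shape[0], shape[2])
strides = (strides[1], strides[0], strides[2])

# [:, None, :, :] followed by expand(...): the new dimension of size
# `groups` gets stride 0, so every group reads the same memory.
expanded_shape = (shape[0], groups, shape[1], shape[2])
expanded_strides = (strides[0], 0, strides[1], strides[2])

# .contiguous(): same shape, but densely packed strides over a fresh copy.
dense_shape = expanded_shape
dense_strides = row_major_strides(dense_shape)

print(expanded_strides, is_contiguous(expanded_shape, expanded_strides))
print(dense_strides, is_contiguous(dense_shape, dense_strides))
print(storage_span(expanded_shape, expanded_strides),
      storage_span(dense_shape, dense_strides))
```

In this model the expanded view describes 192 logical elements but addresses only 48 storage elements, while the dense copy addresses all 192. A kernel that assumes dense strides would therefore index past the end of the view's storage. Whether that is the exact failure fixed here is an assumption, not something the commit states.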