[Llama3.2-11b-vision] Add support for text-only inference through generator api #17105
Ticket
tenstorrent/vllm#53
Problem description
What's changed
- Added `text_only_inference` to the cross attention transformer to skip the cross attention layers entirely (possible since prefill is done with 1 user at a time); a simplified sketch of this control flow follows the list
- Used `full_text_mask` in `TtLlamaCrossAttentionTransformerBlock::forward`, propagating the `full_text_mask_expand_11SD` input
- Updated `simple_vision_demo.py` for processing a batch with mixed text-only and text-image prompts (see the batching sketch below)
- Updated `llama_vision_model.py`
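For context, here is a minimal PyTorch-style sketch of the control flow described above: a `text_only_inference` flag that bypasses the cross attention path, and a full-text mask that zeroes the cross attention contribution for text-only rows. The class, tensor shapes, gate, and mask convention are illustrative assumptions, not the actual `TtLlamaCrossAttentionTransformerBlock` / ttnn implementation.

```python
# Simplified stand-in (plain PyTorch, not ttnn) showing how a text_only_inference
# flag can skip cross attention and how a full-text mask can zero its output.
from typing import Optional

import torch
import torch.nn as nn


class CrossAttentionBlockSketch(nn.Module):
    """Illustrative cross attention transformer block (hypothetical)."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # learned gate on the cross-attn path

    def forward(
        self,
        x: torch.Tensor,                         # [batch, seq, dim] text hidden states
        vision_tokens: Optional[torch.Tensor],   # [batch, vis_seq, dim] or None
        full_text_mask: Optional[torch.Tensor],  # [batch, seq, 1]; 0.0 at text-only rows (assumed convention)
        text_only_inference: bool = False,
    ) -> torch.Tensor:
        # Self attention always runs.
        attn_out, _ = self.self_attn(x, x, x)
        x = x + attn_out

        # With text_only_inference set (e.g. prefilling a single text-only user),
        # the cross attention path is skipped entirely.
        if text_only_inference or vision_tokens is None:
            return x

        # Otherwise run cross attention against the vision tokens, zeroing its
        # contribution wherever the mask marks rows as text-only.
        xattn_out, _ = self.cross_attn(x, vision_tokens, vision_tokens)
        if full_text_mask is not None:
            xattn_out = xattn_out * full_text_mask
        return x + self.gate.tanh() * xattn_out
```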
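Similarly, a rough sketch of how a demo loop could prefill a mixed batch one user at a time, passing `text_only_inference` for users without images. The `Prompt` container and the `encode_image` / `tokenize` / `prefill_forward` helpers are hypothetical stand-ins, not the actual `simple_vision_demo.py` interface.

```python
# Hypothetical sketch of handling a mixed batch of text-only and text+image
# prompts during per-user prefill (names below are illustrative).
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Prompt:
    text: str
    image: Optional[Any] = None  # None marks a text-only prompt


def prefill_mixed_batch(model, prompts: list[Prompt]):
    """Prefill each user separately, skipping vision preprocessing and
    cross attention for text-only prompts."""
    outputs = []
    for user_id, prompt in enumerate(prompts):
        text_only = prompt.image is None
        vision_tokens = None if text_only else model.encode_image(prompt.image)
        tokens = model.tokenize(prompt.text)
        # text_only_inference tells the transformer to bypass cross attention.
        logits = model.prefill_forward(
            tokens,
            vision_tokens=vision_tokens,
            user_id=user_id,
            text_only_inference=text_only,
        )
        outputs.append(logits)
    return outputs
```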
Checklist