#13204: adjust matmul program config selection for some sharded output scenarios #13819

bbradelTT · 2024-10-15T17:20:40Z

Ticket

Link to Github Issue #13204

Problem description

On a single core, when Kt is not a multiple of 2 and the output is sharded, matmul program config selection chooses a program config that cannot handle sharded outputs.

What's changed

Add checks for whether the output is sharded, and choose an appropriate in0_block_w and mcast config.
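
As a rough illustration of the in0_block_w part of this change, the sketch below isolates the fallback rule. It is not the tt-metal code: `pick_in0_block_w` and `run_pick_examples` are hypothetical names, and the default of 2 is an assumption inferred from the "not a multiple of 2" wording in the problem description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the in0_block_w selection step (not the tt-metal function).
// Assumption: the default in0_block_w is 2.
constexpr std::uint32_t pick_in0_block_w(std::uint32_t Kt, bool output_is_sharded) {
    std::uint32_t in0_block_w = 2;  // assumed default
    // For sharded output, fall back to 1 so that Kt % in0_block_w == 0 always
    // holds and the config check that follows is not skipped.
    if (output_is_sharded && Kt % in0_block_w != 0) {
        in0_block_w = 1;
    }
    return in0_block_w;
}

// A few spot checks of the rule above.
inline void run_pick_examples() {
    assert(pick_in0_block_w(4, true) == 2);   // even Kt: default is kept
    assert(pick_in0_block_w(5, true) == 1);   // odd Kt, sharded output: reduced to 1
    assert(pick_in0_block_w(5, false) == 2);  // odd Kt, non-sharded output: unchanged
}
```

Under these assumptions, only the odd-Kt sharded case changes; every other input keeps the value it had before the fix.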

Checklist

Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/11351181128
Blackhole Post commit (if applicable) N/A
Model regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/11351188736
Device performance regression CI testing passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/11351192404 no worse than main https://github.com/tenstorrent/tt-metal/actions/runs/11350856932
New/Existing tests provide coverage for changes

T3k frequent tests passes https://github.com/tenstorrent/tt-metal/actions/runs/11351278842

TT-BrianLiu

Looks good! One suggestion about in0_block_w in create_simple_matmul_program_config.

TT-BrianLiu · 2024-10-15T21:03:52Z

ttnn/cpp/ttnn/operations/matmul/device/matmul_op.cpp

+    // MatmulMultiCoreProgramConfig does not support sharded output.
+    // Reduce in0_block_w if necessary to choose other configs.
+    if (mem_config.is_sharded() and Kt % in0_block_w != 0) {
+        in0_block_w = 1;
+    }
+
    if (num_blocks_x * num_blocks_y <= num_cores_x * num_cores_y and Kt % in0_block_w == 0) {
        CoreCoord core_range = get_core_range(num_blocks_y, num_blocks_x, num_cores_y, num_cores_x);
-        if (core_range.y == 1) {
+        bool use_mcast_config = mem_config.is_sharded() and core_range.y == 0;
+        if (core_range.y == 1 or (use_mcast_config and mem_config.memory_layout == TensorMemoryLayout::WIDTH_SHARDED)) {


Would it be better to do something like this for in0_block_w:

uint32_t in0_block_w = (Kt % 2 == 0) ? 2 : 1;

and remove the code from lines 366-371 and the check for Kt % in0_block_w == 0 on line 372?

Yes, it would. Unfortunately, that would require updating the PCCs for models/tests. I'm hoping to refactor this code and will look at it then.
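
To make the trade-off in this exchange concrete, here is a minimal sketch comparing the rule in the diff with the rule suggested in review. The function names are hypothetical and the default of 2 is an assumption. Under that assumption the two rules disagree only for odd Kt with non-sharded output, which may be the kind of behavior change that would affect existing PCCs; the thread does not say so explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Rule from the diff, assuming the default in0_block_w is 2 (hypothetical helper).
constexpr std::uint32_t current_in0_block_w(std::uint32_t Kt, bool output_is_sharded) {
    return (output_is_sharded && Kt % 2 != 0) ? 1u : 2u;
}

// Rule suggested in review: always pick a value that divides Kt.
constexpr std::uint32_t suggested_in0_block_w(std::uint32_t Kt) {
    return (Kt % 2 == 0) ? 2u : 1u;
}

// True when the two rules pick different values for the same input.
constexpr bool rules_differ(std::uint32_t Kt, bool output_is_sharded) {
    return current_in0_block_w(Kt, output_is_sharded) != suggested_in0_block_w(Kt);
}

inline void run_comparison_examples() {
    assert(!rules_differ(4, false));
    assert(!rules_differ(4, true));
    assert(!rules_differ(5, true));
    assert(rules_differ(5, false));  // odd Kt, non-sharded output: the only case that differs
    // The suggested value always divides Kt, so a later
    // Kt % in0_block_w == 0 check would be redundant.
    assert(5u % suggested_in0_block_w(5) == 0);
}
```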

…arded output scenarios (tenstorrent#13819) * tenstorrent#13204: adjust matmul program config selection for some sharded output scenarios * tenstorrent#13204: adjust matmul program config selection for some height/block sharded output scenarios * tenstorrent#13204: add interleaved input sharded output matmul test

@odjuricicTT

Workaround introduced in #894 is not needed anymore. The issue was fixed in metal tenstorrent/tt-metal#13819. Closes #891 FYI @odjuricicTT

bbradelTT requested review from ayerofieiev-tt, dmakoviichuk-tt, rfurko-tt, cfjchu, TT-BrianLiu, razorback3, dongjin-na and yugaoTT as code owners October 15, 2024 17:20

TT-BrianLiu approved these changes Oct 15, 2024

bbradelTT added 3 commits October 15, 2024 21:11

#13204: adjust matmul program config selection for some sharded output scenarios (deeb18d)

#13204: adjust matmul program config selection for some height/block sharded output scenarios (87eb8cf)

#13204: add interleaved input sharded output matmul test (fe55daf)

bbradelTT force-pushed the bbradel-13204_out branch from 4cc5332 to fe55daf Compare October 15, 2024 21:12

bbradelTT merged commit b5ea74f into main Oct 15, 2024
6 of 7 checks passed

bbradelTT deleted the bbradel-13204_out branch October 15, 2024 21:19

bbradelTT mentioned this pull request Oct 15, 2024

[Bug Report] ttnn.matmul fails to chose program config #13204

Closed

azecevicTT added a commit to tenstorrent/tt-mlir that referenced this pull request Nov 13, 2024

Matmul1DProgramConfig workaround removal

697653b

Workaround introduced in #894 is not needed anymore. The issue was fixed in metal tenstorrent/tt-metal#13819. Closes #891 FYI @odjuricicTT

azecevicTT mentioned this pull request Nov 13, 2024

Matmul1DProgramConfig workaround removal tenstorrent/tt-mlir#1248

Merged

azecevicTT added a commit to tenstorrent/tt-mlir that referenced this pull request Nov 13, 2024

Matmul1DProgramConfig workaround removal

b2c0315

Workaround introduced in #894 is not needed anymore. The issue was fixed in metal tenstorrent/tt-metal#13819. Closes #891 FYI @odjuricicTT

azecevicTT added a commit to tenstorrent/tt-mlir that referenced this pull request Nov 14, 2024

Matmul1DProgramConfig workaround removal

a000cc7

Workaround introduced in #894 is not needed anymore. The issue was fixed in metal tenstorrent/tt-metal#13819. Closes #891 FYI @odjuricicTT

azecevicTT added a commit to tenstorrent/tt-mlir that referenced this pull request Nov 14, 2024

Matmul1DProgramConfig workaround removal (#1248)

6dd5eba

Workaround introduced in #894 is not needed anymore. The issue was fixed in metal tenstorrent/tt-metal#13819. Closes #891 FYI @odjuricicTT
