
Improve parallelisation strategy for conv2d #15171

Closed
pavlejosipovic opened this issue Nov 18, 2024 · 2 comments
Assignees
Labels
bug (Something isn't working) · CNNs · op_cat: conv2D (2D convolution for CNNs) · P1

Comments

@pavlejosipovic
Contributor

Conv2d determines the number of work items along the height and width dimensions and then tries to map them onto an array of Tensix cores in 1D or 2D, depending on the selected sharding strategy. If this number of work items is a prime number (or close to one), conv2d ends up mapping the work onto a single core or a handful of cores.
Besides poor performance, this also leads to out-of-memory issues: conv2d may use only a small number of cores to store the sharded tensors, and a single core has to process a large chunk of the work.

To improve on this, we need to pad the number of work items so that the workload can be distributed effectively over the grid of Tensix cores.
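To illustrate the problem and the proposed fix, here is a minimal sketch (the function names are illustrative only, not the actual tt-metal API): without padding, the per-core split must divide the work-item count exactly, so a prime count like 127 can only use 1 core on an 8x8 grid; padding the count up to a multiple of the core count restores full utilisation.

```python
import math

def cores_used_without_padding(num_work_items: int, grid_size: int) -> int:
    # Without padding, every core must get the same whole number of work
    # items, so the op can only use a divisor of the work-item count.
    for cores in range(min(num_work_items, grid_size), 0, -1):
        if num_work_items % cores == 0:
            return cores
    return 1

def pad_work_items(num_work_items: int, grid_size: int) -> int:
    # Pad the work-item count up to the nearest multiple of the core count
    # so the workload divides evenly across the whole grid.
    cores = min(num_work_items, grid_size)
    per_core = math.ceil(num_work_items / cores)
    return per_core * cores

# 127 is prime: unpadded it maps to a single core of an 8x8 (64-core) grid,
# while padding to 128 lets all 64 cores take 2 work items each.
print(cores_used_without_padding(127, 64))  # -> 1
print(pad_work_items(127, 64))              # -> 128
```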

@pavlejosipovic pavlejosipovic added bug Something isn't working CNNs op_cat: conv2D 2D convolution for CNNs P1 labels Nov 18, 2024
@pavlejosipovic pavlejosipovic self-assigned this Nov 18, 2024
pavlejosipovic pushed a commit that referenced this issue Nov 18, 2024
@ayerofieiev-tt ayerofieiev-tt moved this from Todo to In Progress in PyTorch 2.0 TT-NN Compiler Nov 21, 2024
pavlejosipovic pushed a commit that referenced this issue Nov 25, 2024
pavlejosipovic pushed a commit that referenced this issue Nov 27, 2024
@kmabeeTT
Contributor

Hi @pavlejosipovic - We hit a compile error in tt-mlir after uplifting to the latest tt-metal that includes this change, because the new field enable_channels_padding on determine_parallel_config() was added but we don't set it. What is the guidance for setting this field? FYI @LPanosTT

kmabeeTT added a commit to tenstorrent/tt-mlir that referenced this issue Nov 28, 2024
@pavlejosipovic
Contributor Author

Hi @pavlejosipovic - We hit a compile error in tt-mlir after uplifting to the latest tt-metal that includes this change, because the new field enable_channels_padding on determine_parallel_config() was added but we don't set it. What is the guidance for setting this field? FYI @LPanosTT

For conv ops it should be set to true, and for max_pool it should be set to false.
Hopefully we will enable this feature for max_pool too, and add auto-sharding support, so you won't have to call this function at all.
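Per that guidance, a downstream caller can select the flag based on the op kind. A hypothetical helper (the function name and op-kind strings are assumptions for illustration, not tt-mlir or tt-metal API):

```python
def channels_padding_for_op(op_kind: str) -> bool:
    # Guidance from this thread: conv ops should enable channels padding,
    # max_pool should not (the feature is not yet enabled for it).
    return op_kind in ("conv2d", "conv_transpose2d")

print(channels_padding_for_op("conv2d"))      # -> True
print(channels_padding_for_op("max_pool2d"))  # -> False
```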

kmabeeTT added a commit to tenstorrent/tt-mlir that referenced this issue Nov 28, 2024
 - due to tt-metal changes from tenstorrent/tt-metal#15171

(cherry picked from commit 14cbe04)
@github-project-automation github-project-automation bot moved this from In Progress to Done in PyTorch 2.0 TT-NN Compiler Dec 2, 2024
Projects
Status: Done