
Use auto-tuner to improve conv2d_gemm performance #6117

Merged

Merged 4 commits into apache:master from the autotuner_improvements branch on Aug 26, 2020

Conversation

giuseros (Contributor)
High-level description of this contribution

The following tuning entities have been introduced (an illustrative sketch follows the file list below):

  • Unrolling and vectorizing input matrix transform
  • Reordering gemm to exploit parallel threads
  • Unrolling gemm_quantized intrinsic
  • Interleaving gemm_quantized intrinsic

Main files touched:

  • topi/python/topi/arm_cpu/tensor_intrin.py
  • topi/python/topi/arm_cpu/conv2d_gemm.py
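
To make the list above concrete, here is a minimal sketch (not the exact code from this PR) of how tuning entities like these are typically declared with AutoTVM's `define_knob`/`define_annotate` API. The function name, the stage names `A_interleaved` and `C_interleaved`, the assumed axis layout, and the knob names are all illustrative assumptions.

```python
# Illustrative sketch only: exposing the four tuning entities as AutoTVM knobs
# inside an arm_cpu conv2d_gemm-style schedule.
from tvm import autotvm


def _schedule_gemm_sketch(s, A_interleaved, C_interleaved, cfg=None):
    # `cfg` is the AutoTVM configuration; during tuning each candidate in the
    # knob/annotation spaces below is measured on the target.
    cfg = cfg or autotvm.get_config()

    # 1) Unrolling and vectorizing the input matrix transform: let the tuner
    #    try unroll/vectorize combinations on the two innermost axes.
    xi, yi = s[A_interleaved].op.axis[-2:]
    cfg.define_annotate("A_interleaved_unroll_vec", [xi, yi],
                        policy="try_unroll_vec")
    cfg["A_interleaved_unroll_vec"].apply(s, A_interleaved, [xi, yi])

    # 2) Reordering the gemm loops to exploit parallel threads: the knob picks
    #    which outer tile loop is moved outermost and parallelized.
    #    (Assumes a 5-axis [batch, xo, yo, xi, yi] output layout.)
    b, xo, yo, xi, yi = s[C_interleaved].op.axis
    cfg.define_knob("reorder_gemm", ["xo_yo", "yo_xo"])
    if cfg["reorder_gemm"].val == "xo_yo":
        s[C_interleaved].reorder(b, xo, yo, xi, yi)
        s[C_interleaved].parallel(xo)
    else:
        s[C_interleaved].reorder(b, yo, xo, xi, yi)
        s[C_interleaved].parallel(yo)

    # 3) / 4) Unrolling and interleaving of the quantized gemm intrinsic:
    #    boolean knobs whose values would select which variant of the
    #    hand-written intrinsic (in tensor_intrin.py) gets generated.
    cfg.define_knob("gemm_quantized_unroll", [True, False])
    cfg.define_knob("gemm_quantized_interleave", [True, False])
    return s
```

During tuning, AutoTVM sweeps the annotation space and the knob values and keeps the best-measured configuration, so the same schedule template covers all unroll/vectorize/reorder/interleave variants.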

RFC

The RFC for this submission is available here

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401

@giuseros (Contributor, Author)

cc @FrozenGene @anijain2305 @u99127

Please note that I will be off from Friday (for 15 days), so I might turn this into a draft and pick it up when I come back; since it is self-contained, this should not be an issue.

@giuseros giuseros marked this pull request as draft July 23, 2020 17:07
@tqchen tqchen added the status: need update need update based on feedbacks label Aug 10, 2020
@giuseros giuseros marked this pull request as ready for review August 12, 2020 09:35
@giuseros (Contributor, Author)

Hi @anijain2305, I just got back from holidays and this is ready for review!

Giuseppe Rossini added 2 commits August 12, 2020 12:06
The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401
Change-Id: Id27b6de705b16b93df8e885868961fa0321497be
@giuseros giuseros force-pushed the autotuner_improvements branch from f3565f2 to 07a330a Compare August 12, 2020 11:18
Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79
@giuseros (Contributor, Author)

Hi @FrozenGene , @anijain2305 ,
Any update on this?

Thanks a lot,
Giuseppe

@anijain2305 (Contributor) left a comment

LGTM! @FrozenGene Please review and merge if it looks good.

@FrozenGene (Member)

I cannot spare time to review this right now; I will do it tomorrow.

@FrozenGene (Member) left a comment

Last comment:

Review comment on python/tvm/topi/arm_cpu/conv2d_int8.py (outdated, resolved)
Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
@giuseros (Contributor, Author)

Hi @FrozenGene ,
Any update on this?

Thanks again,
Giuseppe

@FrozenGene FrozenGene merged commit 617949d into apache:master Aug 26, 2020
@FrozenGene (Member)

Thanks @giuseros @anijain2305. It is merged now.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Use auto-tuner to improve conv2d_gemm performance

The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401

* Rebasing

Change-Id: Id27b6de705b16b93df8e885868961fa0321497be

* Fix python linting

Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79

* Fusing batch into inner dimensions before parallelizing

Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020 (same commit message as above)
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020 (same commit message as above)
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020 (same commit message as above)
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020 (same commit message as above)
Labels: status: need review, status: need update

4 participants