
Use auto-tuner to improve conv2d_gemm performance #6117

Merged

Merged 4 commits into apache:master from the autotuner_improvements branch on Aug 26, 2020

Conversation

giuseros (Contributor)
High-level description of this contribution

The following tuning entities have been introduced (an illustrative sketch follows the file list below):

  • Unrolling and vectorizing input matrix transform
  • Reordering gemm to exploit parallel threads
  • Unrolling gemm_quantized intrinsic
  • Interleaving gemm_quantized intrinsic

Main files touched:

  • topi/python/topi/arm_cpu/tensor_intrin.py
  • topi/python/topi/arm_cpu/conv2d_gemm.py
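
To make the list above concrete, here is a minimal sketch (not the exact code from this PR) of how tuning entities like these are typically declared with AutoTVM's `define_knob`/`define_annotate` API. The function name, the stage names `A_interleaved` and `C_interleaved`, the assumed axis layout, and the knob names are all illustrative assumptions.

```python
# Illustrative sketch only: exposing the four tuning entities as AutoTVM knobs
# inside an arm_cpu conv2d_gemm-style schedule.
from tvm import autotvm


def _schedule_gemm_sketch(s, A_interleaved, C_interleaved, cfg=None):
    # `cfg` is the AutoTVM configuration; during tuning each candidate in the
    # knob/annotation spaces below is measured on the target.
    cfg = cfg or autotvm.get_config()

    # 1) Unrolling and vectorizing the input matrix transform: let the tuner
    #    try unroll/vectorize combinations on the two innermost axes.
    xi, yi = s[A_interleaved].op.axis[-2:]
    cfg.define_annotate("A_interleaved_unroll_vec", [xi, yi],
                        policy="try_unroll_vec")
    cfg["A_interleaved_unroll_vec"].apply(s, A_interleaved, [xi, yi])

    # 2) Reordering the gemm loops to exploit parallel threads: the knob picks
    #    which outer tile loop is moved outermost and parallelized.
    #    (Assumes a 5-axis [batch, xo, yo, xi, yi] output layout.)
    b, xo, yo, xi, yi = s[C_interleaved].op.axis
    cfg.define_knob("reorder_gemm", ["xo_yo", "yo_xo"])
    if cfg["reorder_gemm"].val == "xo_yo":
        s[C_interleaved].reorder(b, xo, yo, xi, yi)
        s[C_interleaved].parallel(xo)
    else:
        s[C_interleaved].reorder(b, yo, xo, xi, yi)
        s[C_interleaved].parallel(yo)

    # 3) / 4) Unrolling and interleaving of the quantized gemm intrinsic:
    #    boolean knobs whose values would select which variant of the
    #    hand-written intrinsic (in tensor_intrin.py) gets generated.
    cfg.define_knob("gemm_quantized_unroll", [True, False])
    cfg.define_knob("gemm_quantized_interleave", [True, False])
    return s
```

During tuning, AutoTVM sweeps the annotation space and the knob values and keeps the best-measured configuration, so the same schedule template covers all unroll/vectorize/reorder/interleave variants.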

RFC

The RFC for this submission is available here

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401

@giuseros (Contributor, Author)

cc @FrozenGene @anijain2305 @u99127

Please note that I will be off from Friday (for 15 days), so I might turn this into a draft and pick it up when I come back; since it is self-contained, this should not be an issue.

@giuseros giuseros marked this pull request as draft July 23, 2020 17:07
@tqchen tqchen added the status: need update need update based on feedbacks label Aug 10, 2020
@giuseros giuseros marked this pull request as ready for review August 12, 2020 09:35
@giuseros (Contributor, Author)

Hi @anijain2305, I just got back from holidays and this is ready for review!

Giuseppe Rossini added 2 commits August 12, 2020 12:06
The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401
Change-Id: Id27b6de705b16b93df8e885868961fa0321497be
@giuseros giuseros force-pushed the autotuner_improvements branch from f3565f2 to 07a330a Compare August 12, 2020 11:18
Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79
@giuseros (Contributor, Author)

Hi @FrozenGene , @anijain2305 ,
Any update on this?

Thanks a lot,
Giuseppe

@anijain2305 (Contributor) left a comment

LGTM! @FrozenGene Please review and merge if it looks good.

@FrozenGene (Member)

I cannot spare time to review this right now; I will do it tomorrow.

@FrozenGene (Member) left a comment

Last comment:

Review comment on python/tvm/topi/arm_cpu/conv2d_int8.py (outdated, resolved)
Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
@giuseros (Contributor, Author)

Hi @FrozenGene ,
Any update on this?

Thanks again,
Giuseppe

@FrozenGene FrozenGene merged commit 617949d into apache:master Aug 26, 2020
@FrozenGene (Member)

Thanks @giuseros @anijain2305. It is merged now.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
* Use auto-tuner to improve conv2d_gemm performance

The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401

* Rebasing

Change-Id: Id27b6de705b16b93df8e885868961fa0321497be

* Fix python linting

Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79

* Fusing batch into inner dimensions before parallelizing

Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020 (same commit message as above)
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020 (same commit message as above)
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020 (same commit message as above)
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020 (same commit message as above)
Labels: status: need review, status: need update

4 participants