Use auto-tuner to improve conv2d_gemm performance #6117
cc @FrozenGene @anijain2305 @u99127 Please note that since I will be off from Friday (for 15 days), I might turn this into a draft and pick it up when I come back (since it is self-contained, that should not be an issue).
Hi @anijain2305, I just got back from holidays and am ready for reviewing this!
The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401
Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79
Hi @FrozenGene, @anijain2305, Thanks a lot,
LGTM! @FrozenGene Please review and merge if it looks good.
I cannot spare the time to review this now. I will do it tomorrow.
last comment
Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
Hi @FrozenGene, Thanks again,
Thanks @giuseros @anijain2305. It is merged now.
* Use auto-tuner to improve conv2d_gemm performance

  The following tuning entities have been introduced:
  - Unrolling and vectorizing input matrix transform
  - Reordering gemm to exploit parallel threads
  - Unrolling `gemm_quantized` intrinsic
  - Interleaving `gemm_quantized` intrinsic

  Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401

* Rebasing

  Change-Id: Id27b6de705b16b93df8e885868961fa0321497be

* Fix python linting

  Change-Id: I77d880424c3e7ce9de67c970ddb2cf2a92b52f79

* Fusing batch into inner dimensions before parallelizing

  Change-Id: Ic58d1138ab96d58d12f5855f0e1044f10d9e6e9b
High level description of this contribution
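The following tuning entities have been introduced:
- Unrolling and vectorizing input matrix transform
- Reordering gemm to exploit parallel threads
- Unrolling `gemm_quantized` intrinsic
- Interleaving `gemm_quantized` intrinsic

Main files touched: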
topi/python/topi/arm_cpu/tensor_intrin.py
topi/python/topi/arm_cpu/conv2d_gemm.py
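
To illustrate what exposing these four tuning entities to an auto-tuner means, here is a minimal sketch in plain Python that does not use TVM at all. Each entity becomes a knob with a few candidate values, and a tuner searches the resulting space for the cheapest configuration. The knob names, the candidate values, and the `toy_cost` function are all hypothetical and are not taken from this PR; a real tuner would compile and time each candidate on the target device rather than use a made-up cost.

```python
import itertools

# Hypothetical knob names and candidate values. They mirror the four tuning
# entities in spirit only and are not taken from the TVM sources.
SEARCH_SPACE = {
    "transform_annotation": ["none", "unroll", "vectorize"],
    "gemm_loop_order": ["xy", "yx"],
    "unroll_gemm_intrinsic": [False, True],
    "interleave_gemm_intrinsic": [False, True],
}


def enumerate_configs(space):
    """Yield every combination of knob values as a dict."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def toy_cost(config):
    """Made-up stand-in for a hardware measurement (lower is better)."""
    cost = 10.0
    if config["transform_annotation"] == "vectorize":
        cost -= 2.0
    elif config["transform_annotation"] == "unroll":
        cost -= 1.0
    if config["gemm_loop_order"] == "yx":
        cost -= 1.5
    if config["unroll_gemm_intrinsic"]:
        cost -= 1.0
    if config["interleave_gemm_intrinsic"]:
        cost -= 0.5
    return cost


def tune(space, cost_fn):
    """Exhaustively search the space and return the cheapest config."""
    return min(enumerate_configs(space), key=cost_fn)


best = tune(SEARCH_SPACE, toy_cost)
print(best)
```

The search here is exhaustive because the toy space has only 24 points. Larger spaces are usually explored with a guided search instead, but the idea of defining knobs and then measuring candidates stays the same.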
RFC
The RFC for this submission is available here
Change-Id: Icd3ab005663f78a80672e71ef368f6d0efa4a401