
summer-ospp 2022: PaddlePaddle Sparse Conv development and optimization: gather-gemm-scatter fuse #46679

Merged: 88 commits into PaddlePaddle:develop on Oct 31, 2022

Conversation

@umiswing (Member) commented Sep 30, 2022

PR types

Performance optimization

PR changes

APIs, Others

Describe

  1. Include CUTLASS in Paddle: add cutlass.cmake to download CUTLASS v2.9.1 automatically.
  2. Fuse gather, gemm and scatter in Conv3dCOOGpuKernel with CUTLASS. The fusion supports fp16, fp32 and fp64, but only for the channel counts CUTLASS supports; the old code is kept as a fallback for other channels.
  3. Add gather_gemm_scatter.h and gather_gemm_scatter.cu; these two files define the details of the fused kernel (a sketch of the fused operation's semantics follows this list).
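
To illustrate what the fusion computes, here is a minimal CPU reference sketch (not the PR's CUDA kernel; the function and parameter names are hypothetical). For each kernel offset, rows of the input feature matrix selected by a gather index are multiplied by that offset's weight matrix, and each product row is scatter-added into the output feature matrix. Fusing the three steps into a single CUTLASS GEMM avoids materializing the gathered input and the pre-scatter output in global memory.

```cpp
#include <cstdint>
#include <vector>

// Reference semantics of gather-gemm-scatter for one kernel offset:
//   out[scatter_idx[i], :] += in[gather_idx[i], :] * weight
// in:     [n_in,  C_in]  row-major input features
// weight: [C_in,  C_out] row-major weights for this kernel offset
// out:    [n_out, C_out] row-major output features (accumulated in place)
void GatherGemmScatterRef(const std::vector<float>& in,
                          const std::vector<float>& weight,
                          const std::vector<int32_t>& gather_idx,
                          const std::vector<int32_t>& scatter_idx,
                          int C_in, int C_out,
                          std::vector<float>& out) {
  for (size_t i = 0; i < gather_idx.size(); ++i) {
    const float* x = &in[gather_idx[i] * C_in];   // gather one input row
    float* y = &out[scatter_idx[i] * C_out];      // destination output row
    for (int j = 0; j < C_out; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < C_in; ++k) {
        acc += x[k] * weight[k * C_out + j];      // gemm on the gathered row
      }
      y[j] += acc;                                // scatter-add into output
    }
  }
}
```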

Following is a comparison of running 500 forward passes with and without the gather-gemm-scatter fusion on a GeForce RTX 3060 Mobile GPU; spconv 2.1.21 is included as an external baseline, and a hypothetical sketch of such a timing loop follows the table. The fusion achieves a speedup of up to 1.80x.

| time (s) \ shape | 0 (subm) | 1 (subm) | 2 (conv) | 3 (subm) | 4 (conv) | 5 (subm) | 6 (conv) | 7 (subm) | 8 (subm) | 9 (conv) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **fp16** | | | | | | | | | | |
| fused | 0.948059366 | 0.948617946 | 0.918128433 | 0.654550546 | 1.247602738 | 0.643608378 | 3.765412761 | 0.467916381 | 0.469078923 | 0.166027756 |
| old | 0.963316033 | 0.978123623 | 1.100062921 | 0.85829027 | 1.841123472 | 1.073228994 | 6.164761058 | 0.813147195 | 0.815460634 | 0.211744664 |
| spconv 2.1.21 | 0.370279025 | 0.356675464 | 1.255860301 | 0.67564418 | 1.989892343 | 0.646101865 | 4.651363121 | 0.397059219 | 0.399020844 | 0.274251656 |
| old/fused | 1.016092523 | 1.031103857 | 1.198157993 | 1.311266602 | 1.475728945 | 1.667518682 | 1.637207246 | 1.737804505 | 1.738429492 | 1.275357019 |
| spconv/fused | 0.390565231 | 0.375994852 | 1.367848174 | 1.032226135 | 1.594972728 | 1.00387423 | 1.235286386 | 0.848568751 | 0.850647566 | 1.651842214 |
| **fp32** | | | | | | | | | | |
| fused | 1.074077862 | 1.076842442 | 1.068928335 | 0.834602646 | 1.765838748 | 1.058910525 | 5.879794197 | 0.851036824 | 0.851216344 | 0.211274696 |
| old | 1.017575127 | 1.055912888 | 1.444613468 | 1.186929089 | 2.804123872 | 1.722058823 | 10.58918662 | 1.401214913 | 1.40421477 | 0.327167335 |
| spconv 2.1.21 | 0.367197311 | 0.379458345 | 1.375788394 | 0.765276265 | 2.479875851 | 1.204778016 | 7.822352023 | 0.972676926 | 0.973764882 | 0.298347718 |
| old/fused | 0.947394191 | 0.980563959 | 1.351459607 | 1.422148725 | 1.58798411 | 1.626255271 | 1.800945112 | 1.646479769 | 1.649656729 | 1.548540082 |
| spconv/fused | 0.341872153 | 0.352380562 | 1.287072621 | 0.916934866 | 1.404361442 | 1.137752423 | 1.330378541 | 1.142931655 | 1.143968732 | 1.412131806 |
| **fp64** | | | | | | | | | | |
| fused | 1.420543399 | 1.208440223 | 2.657113268 | 2.489402152 | 9.968613399 | 9.779180488 | 65.87212427 | 9.246852072 | 9.22867435 | 1.489004803 |
| old | 1.544003572 | 1.587914612 | 3.242279789 | 3.420265195 | 11.49426685 | 11.72775161 | OOM | 11.08054339 | 11.07393647 | 1.80335558 |
| spconv 2.1.21 | not supported | | | | | | | | | |
| old/fused | 1.086910525 | 1.314019992 | 1.220226412 | 1.37393036 | 1.153045703 | 1.199257098 | n/a (OOM) | 1.198304386 | 1.199948774 | 1.211114683 |
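
The benchmark script is not part of this PR; as a rough, hypothetical sketch, a 500-iteration forward timing like the above can be collected with CUDA events along these lines (`conv_forward()` is a stand-in for the sparse conv launch under test):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the sparse conv forward launch being timed.
void conv_forward() { /* launch Conv3dCOOGpuKernel here */ }

int main() {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  conv_forward();                  // warm-up, excluded from the measurement

  cudaEventRecord(start);
  for (int i = 0; i < 500; ++i) {
    conv_forward();                // 500 timed forward passes
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);      // wait until all launches have finished

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  std::printf("500 forward passes: %.3f s\n", ms / 1000.0f);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return 0;
}
```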

Shape details:

| shape | (x, y, z) | in_channels | out_channels | kernel_size | strides | paddings | nnz |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | (41, 1600, 1408) | 4 | 16 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 136000 |
| 1 | (41, 1600, 1408) | 16 | 16 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 136000 |
| 2 | (41, 1600, 1408) | 16 | 32 | (3, 3, 3) | (2, 2, 2) | (1, 1, 1) | 136000 |
| 3 | (21, 800, 704) | 32 | 32 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 220939 |
| 4 | (21, 800, 704) | 32 | 64 | (3, 3, 3) | (2, 2, 2) | (1, 1, 1) | 220939 |
| 5 | (11, 400, 352) | 64 | 64 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 146376 |
| 6 | (11, 400, 352) | 64 | 64 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 146376 |
| 7 | (5, 200, 176) | 64 | 64 | (3, 3, 3) | (2, 2, 2) | (0, 1, 1) | 65421 |
| 8 | (5, 200, 176) | 64 | 64 | (3, 3, 3) | (1, 1, 1) | (0, 0, 0) | 65421 |
| 9 | (5, 200, 176) | 64 | 64 | (3, 1, 1) | (2, 1, 1) | (0, 0, 0) | 65421 |

umiswing and others added 30 commits June 30, 2022 15:34
* Convert slice+grad oneDNN fluid kernels to PHI

* Change mutable_data to Alloc

* Refactor licences
…t in dy2static (PaddlePaddle#46128)

* add numpy api err msg

* fix bug

* fix unittest

* add set_state_dict err

* rewrite numpy_api_check

* add define

* change err msg

* fix test

* move import statement
@umiswing changed the title from "summer-ospp 2022: PaddlePaddle Sparse Conv development and optimization - group-gather-gemm fuse" to "summer-ospp 2022: PaddlePaddle Sparse Conv development and optimization: gather-gemm-scatter fuse" on Oct 31, 2022
@@ -1,11 +1,8 @@
/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

Contributor comment on the diff: "Shouldn't there be a blank line here?"

zkh2016 previously approved these changes Oct 31, 2022

@zkh2016 (Contributor) left a comment:

LGTM. Please follow up later with the implementation of the backward-pass fusion.

@zkh2016 merged commit 5158fa4 into PaddlePaddle:develop Oct 31, 2022
lanxianghit pushed a commit that referenced this pull request Feb 2, 2023:
Cherry-pick some PRs that optimize sparse kernels and fix some bugs:
#47736 #47703 #47604 #46679 #48439 #49009 #49734