[TOPI] Add proper scheduling for dense on CUDA #3923
Conversation
@comaniac please look into the CI error.
Could you also add a fallback config for AutoTVM?
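For reference, a minimal sketch of what such a fallback might look like with AutoTVM's config API (the helper name and the split factors here are assumptions for illustration, not the values this PR ends up using):

```python
from tvm.autotvm.task.space import SplitEntity

def _define_tile_k(cfg, in_dim):
    # declare the knob AutoTVM is allowed to tune
    cfg.define_split("tile_k", in_dim, num_outputs=2)
    # cfg.is_fallback is True when no tuned record exists for this workload;
    # supply a default split so the schedule is still valid without tuning
    if cfg.is_fallback:
        # assumed default: inner reduction factor of 64 when it divides evenly
        cfg["tile_k"] = SplitEntity([-1, 64] if in_dim % 64 == 0 else [-1, 1])
```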
@vinx13 It seems a test case from NNVM failed. Will fix it soon. Thanks!
The above commit:
lgtm
Force-pushed from 28113b9 to c5ddc78
Force-pushed from c5ddc78 to a7b605f
Force-pushed from 38cdeb6 to f934a48
The CI error is unrelated to this PR, and the tests pass locally. Re-running without changes.
@comaniac This is merged, thanks!
* add proper scheduling for dense on CUDA
* add fallback config and fix unit test
* fix corner cases
* refactoring
* fix bias and add testcase
* let fusion happen
@icemelon9 please review this PR, which adds proper scheduling for the dense op on CUDA; the original scheduling was too simple to achieve reasonable performance.
The added scheduling was adapted from topi/recipe/gemm/cuda_gemm_square.py and achieves high performance (6 TFlop/s on a 2048x2048 dense matrix after AutoTVM tuning). For small-batch (<64) dense, we still use the original scheduling but add a parameter (tile_k) for AutoTVM to tune (~370 GFlop/s for batch-size-1 dense).
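To make the tile_k knob concrete, here is a minimal sketch of a small-batch dense schedule in the style of topi/cuda (a paraphrase under my own assumptions about names and structure, not necessarily the exact code in this PR):

```python
import tvm
from topi.util import get_const_tuple

def _schedule_dense_small_batch(cfg, s, C):
    """Sketch: split the reduction axis by the tunable tile_k knob and
    reduce across threads with rfactor (suitable when batch < 64)."""
    A, _ = C.op.input_tensors
    _, in_dim = get_const_tuple(A.shape)

    # the knob AutoTVM tunes: how to split the reduction (k) axis
    cfg.define_split("tile_k", in_dim, num_outputs=2)

    ko, _ = cfg["tile_k"].apply(s, C, C.op.reduce_axis[0])
    CF = s.rfactor(C, ko)

    # one thread block per output element; threads cooperate on the reduction
    s[C].bind(s[C].op.axis[0], tvm.thread_axis("blockIdx.y"))
    s[C].bind(s[C].op.axis[1], tvm.thread_axis("blockIdx.x"))
    tx = s[C].op.reduce_axis[0]
    thread_x = tvm.thread_axis("threadIdx.x")
    s[C].bind(tx, thread_x)
    s[CF].compute_at(s[C], tx)
    # only the first thread writes out the final result
    s[C].set_store_predicate(thread_x.var.equal(0))
```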
One reason to have separate scheduling for different batch sizes is that I encountered invalid CUDA kernel errors when applying the high-performance scheduling to small batches. I suspect this is caused by invalid splits, but I am not entirely sure. Comments and suggestions for improvement are welcome.
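For illustration, dispatching between the two schedules could look roughly like this (the threshold value and the helper names are assumptions, not necessarily what the PR uses):

```python
from topi.util import get_const_tuple

def _schedule_dense(cfg, s, C):
    # pick a schedule based on the batch dimension of the output
    batch, _ = get_const_tuple(C.shape)
    if batch < 64:
        # rfactor-based schedule with the tile_k knob (small batch)
        _schedule_dense_small_batch(cfg, s, C)
    else:
        # tiled GEMM schedule adapted from cuda_gemm_square.py (large batch);
        # hypothetical helper name, shown only to illustrate the dispatch
        _schedule_dense_large_batch(cfg, s, C)
```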