[CUTLASS] Support conv2d activation fusion #9746

Merged 2 commits on Dec 16, 2021

masahi (Member) commented Dec 15, 2021

@comaniac @Laurawly @hwu36 @manishucsd @junrushao1994 @vinx13

This adds conv2d + activation fusion support for bias_add, relu, and fp32 sigmoid. I have pending PRs at the cutlass repo NVIDIA/cutlass#378, NVIDIA/cutlass#379 which will enable fp16 sigmoid, silu, and hardswish fusion as well.
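To make "activation fusion" concrete, here is a minimal pure-Python sketch (not the CUTLASS kernel or the TVM codegen) of a 1x1 convolution followed by bias_add and relu. The unfused path materializes an intermediate tensor after every op, while the fused path applies bias and relu in the epilogue, right after each accumulator is computed. All function names below are hypothetical and only serve the illustration.

```python
def conv1x1(pixels, weight):
    """1x1 conv: pixels is a list of input-channel vectors, weight is [out_ch][in_ch]."""
    return [[sum(w * x for w, x in zip(row, px)) for row in weight] for px in pixels]


def bias_add(tensor, bias):
    """Add a per-output-channel bias; produces a new intermediate tensor."""
    return [[v + b for v, b in zip(px, bias)] for px in tensor]


def relu(tensor):
    """Element-wise relu; produces another intermediate tensor."""
    return [[max(v, 0.0) for v in px] for px in tensor]


def conv1x1_bias_relu_fused(pixels, weight, bias):
    """Same math as relu(bias_add(conv1x1(...))), but in a single pass."""
    out = []
    for px in pixels:
        out_px = []
        for row, b in zip(weight, bias):
            acc = sum(w * x for w, x in zip(row, px))  # main loop (accumulator)
            out_px.append(max(acc + b, 0.0))           # epilogue: bias + relu
        out.append(out_px)
    return out


pixels = [[1.0, -2.0], [0.5, 3.0]]    # two pixels, two input channels
weight = [[1.0, 1.0], [-1.0, 0.5]]    # two output channels
bias = [0.5, -0.25]

unfused = relu(bias_add(conv1x1(pixels, weight), bias))
fused = conv1x1_bias_relu_fused(pixels, weight, bias)
print(unfused == fused, fused)
```

The fused version never writes the conv2d or bias_add result back to memory, which is where the saving comes from on a GPU.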

End to end results

All code, models, and nvprof dump etc are available at https://github.com/masahi/tvm-cutlass-eval

All numbers in milli sec, using fp16 models with fp16 accumulation running on tensorcore, measured on RTX 3070

| Model name | Input size | CUTLASS (no fusion) | CUTLASS with fusion | cuDNN | AutoTVM TensorCore | TensorRT fp16 |
|---|---|---|---|---|---|---|
| resnet50 | (8, 3, 224, 224) | 3.47 | 3.16 | 4.08 | 4.14 | 2.53 |
| efficientnet_v2 | (8, 3, 224, 224) | 8.18 | 8.26 (7.80 if use fast_math) | 14.0 | 13.2 | 5.25 |
| DETR-R50 | (8, 3, 800, 750) | 61.06 | 57.19 | 68.4 | 80.5 | N/A |
| deeplabv3_mobilenet_v3_large | (8, 3, 512, 512) | 10.3 | 9.16 | 17.5 | 15.9 | 19.2 (?) |
| YOLOv5l | (8, 3, 512, 512) | 33.6 | 25.4 (24.2 if use fast_math) | 34.8 | N/A | N/A |
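For a quick read of the numbers above, the following small script computes the speedup from fusion and the speedup over cuDNN for each model. The latencies are copied from the table (without the fast_math variants); the helper name `speedup` is just for illustration.

```python
# Latencies in ms, copied from the table above (fp16, RTX 3070).
# Tuple order: (CUTLASS no fusion, CUTLASS with fusion, cuDNN).
latency_ms = {
    "resnet50": (3.47, 3.16, 4.08),
    "efficientnet_v2": (8.18, 8.26, 14.0),
    "DETR-R50": (61.06, 57.19, 68.4),
    "deeplabv3_mobilenet_v3_large": (10.3, 9.16, 17.5),
    "YOLOv5l": (33.6, 25.4, 34.8),
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Fused CUTLASS relative to unfused CUTLASS, and relative to cuDNN.
fusion_speedup = {m: speedup(nf, f) for m, (nf, f, _) in latency_ms.items()}
cudnn_speedup = {m: speedup(cd, f) for m, (_, f, cd) in latency_ms.items()}

for model in latency_ms:
    print(f"{model:30s} fusion x{fusion_speedup[model]:.2f}   vs cuDNN x{cudnn_speedup[model]:.2f}")
```

A ratio below 1.0 means fusion made things slower, which is the case only for efficientnet_v2 without fast_math.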

Observations

  ...
  %4 = @tvmgen_default_cutlass_main_3(%3, meta[relay.Constant][2] /* ty=Tensor[(64, 1, 1, 64), float16] */, meta[relay.Constant][3] /* ty=Tensor[(64), float16] */) /* ty=Tensor[(8, 56, 56, 64), float16] */;
  %5 = @tvmgen_default_cutlass_main_6(%4, meta[relay.Constant][4] /* ty=Tensor[(64, 3, 3, 64), float16] */, meta[relay.Constant][5] /* ty=Tensor[(64), float16] */) /* ty=Tensor[(8, 56, 56, 64), float16] */;
  %6 = @tvmgen_default_cutlass_main_9(%5, meta[relay.Constant][6] /* ty=Tensor[(256, 1, 1, 64), float16] */, meta[relay.Constant][7] /* ty=Tensor[(256), float16] */) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %7 = @tvmgen_default_cutlass_main_12(%3, meta[relay.Constant][8] /* ty=Tensor[(256, 1, 1, 64), float16] */, meta[relay.Constant][9] /* ty=Tensor[(256), float16] */) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %8 = add(%6, %7) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %9 = nn.relu(%8) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %10 = @tvmgen_default_cutlass_main_15(%9, meta[relay.Constant][10] /* ty=Tensor[(64, 1, 1, 256), float16] */, meta[relay.Constant][11] /* ty=Tensor[(64), float16] */) /* ty=Tensor[(8, 56, 56, 64), float16] */;
  %11 = @tvmgen_default_cutlass_main_18(%10, meta[relay.Constant][12] /* ty=Tensor[(64, 3, 3, 64), float16] */, meta[relay.Constant][13] /* ty=Tensor[(64), float16] */) /* ty=Tensor[(8, 56, 56, 64), float16] */;
  %12 = @tvmgen_default_cutlass_main_21(%11, meta[relay.Constant][14] /* ty=Tensor[(256, 1, 1, 64), float16] */, meta[relay.Constant][15] /* ty=Tensor[(256), float16] */) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %13 = add(%12, %9) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  %14 = nn.relu(%13) /* ty=Tensor[(8, 56, 56, 256), float16] */;
  ...

The intermediate adds are the element-wise addition in the residual block, which can in principle be fused with the preceding conv2d. nvprof output shows that these unfused ops are taking more than 20% of e2e time. If we fuse them, I believe we could approach TRT-level performance.
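A rough back-of-the-envelope check of that claim, as a sketch only: it assumes the 20% figure applies to the fused CUTLASS resnet50 run (3.16 ms in the table) and that fusing the residual add + relu removes their cost entirely, which is an optimistic upper bound rather than a measurement.

```python
def best_case_latency(total_ms, removable_fraction):
    """Latency if a given fraction of the end-to-end time disappeared completely."""
    return total_ms * (1.0 - removable_fraction)


cutlass_fused_ms = 3.16   # resnet50, CUTLASS with fusion (from the table)
tensorrt_ms = 2.53        # resnet50, TensorRT fp16 (from the table)
unfused_fraction = 0.20   # "more than 20%" per nvprof; 0.20 is the conservative end

estimate_ms = best_case_latency(cutlass_fused_ms, unfused_fraction)
print(f"best case: {estimate_ms:.2f} ms, TensorRT: {tensorrt_ms:.2f} ms")
```

Under these assumptions the estimate lands at roughly the TensorRT number for resnet50, consistent with the expectation stated above.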

  • Despite its suboptimality, enabling activation fusion still results in a good speedup.
  • CUTLASS results are always better than cuDNN ones, even with fusion disabled. The difference is especially large for deeplabv3 and efficientnet_v2, both of which use a lot of depthwise conv2d. The nvprof output shows that cuDNN spends a lot of time on layout transforms (see for example). Maybe our use of cuDNN is not optimal in terms of API usage and kernel selection. nvprof dumps for the cuDNN runs are available in the repository, if someone wants to compare them against the cutlass nvprof dumps.
  • Fusing sigmoid activation into cutlass conv2d results in worse runtime (see efficientnet_v2 row), which I found odd since TVM should be generating essentially the same sigmoid kernel in the unfused case. The fast-math option, discussed in Support half precision sigmoid activation NVIDIA/cutlass#378 (comment), makes it a lot better at the cost of slight accuracy loss.
  • efficientnet_v2 and deeplabv3 use a lot of depthwise conv2d, which is not currently offloaded to cutlass. These depthwise convolutions are actually the bottleneck in the results above, which can be observed by looking at the nvprof dumps. Using AutoTVM for them should help bring down e2e time further.
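On the sigmoid point: a fused sigmoid epilogue has to evaluate an exponential and a division per output element. As a hedged illustration (not taken from the CUTLASS source), one standard way to restructure sigmoid is the identity sigmoid(x) = 0.5 * tanh(0.5 * x) + 0.5, which lets an implementation substitute a cheaper approximate tanh. Whether the fast-math option discussed in NVIDIA/cutlass#378 does exactly this is an assumption here; the sketch only checks that the identity holds numerically.

```python
import math


def sigmoid_reference(x):
    """Textbook sigmoid: one exp and one division."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_via_tanh(x):
    """Same function written in terms of tanh (exact identity, not an approximation)."""
    return 0.5 * math.tanh(0.5 * x) + 0.5


# Compare the two forms on a grid of inputs in [-8, 8].
xs = [i / 10.0 for i in range(-80, 81)]
max_abs_diff = max(abs(sigmoid_reference(x) - sigmoid_via_tanh(x)) for x in xs)
print(max_abs_diff)
```

Any accuracy loss from a fast-math path would come from approximating tanh (or exp), not from the rewrite itself.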

Known issues

  • Kernel profiling is extremely slow. For DETR-R50, I had to wait about 20 min. More discussion in [CUTLASS] Add conv2d profiler #9737 (comment)
  • Trying to fuse silu activation in YOLOv5l results in a strange type inference error during MergeComposite. So the YOLOv5l result above doesn't use silu fusion. (Fixed a bug in type relation)
  • I also tried MaskRCNN, but currently it results in a strange output (no detection) when cutlass BYOC is used. The same issue also applies to cuDNN offload, but TVM-native kernels (target = "cuda") work correctly.

commit e4e273ae74a8e54ab1ae1414ce9b6bfcc2b3d530
Merge: 0489d14 77c9385
Author: Masahiro Masuda <[email protected]>
Date:   Mon Dec 13 11:58:54 2021 +0900

    Merge branch 'partition-constant-unbind' into cutlass-conv2d-fusion

commit 77c9385
Author: Masahiro Masuda <[email protected]>
Date:   Mon Dec 13 11:58:18 2021 +0900

    add test

commit ab01b3a
Author: Masahiro Masuda <[email protected]>
Date:   Mon Dec 13 11:55:06 2021 +0900

    make constant binding in PartitionGraph optional

commit 0489d14
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 21:52:29 2021 +0900

    support sigmoid fusion (only fp32 accum for now)

commit 3705bbd
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 20:50:58 2021 +0900

    conv2d fusion test worked

commit 05b51c9
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 20:34:10 2021 +0900

    fix bias stride

commit 7cf40e7
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 20:01:21 2021 +0900

    use nobetascaling

commit 274ec02
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 19:12:58 2021 +0900

    adding fusion support to codegen

commit 0de5ebd
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 18:39:08 2021 +0900

    partition working

commit c08bb38
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 17:24:42 2021 +0900

    update test

commit 81bf9e6
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 13:23:39 2021 +0900

    add fused conv2d pattern

commit 1c0bbb2
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 18:29:03 2021 +0900

    fix lint

commit 463574c
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 17:28:38 2021 +0900

    fixed conv2d check

commit 588c5ab
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 15:05:27 2021 +0900

    update test

commit a447b57
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 14:54:52 2021 +0900

    speed up profiling by removing initialization

commit 93cd039
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 08:26:29 2021 +0900

    fixed nhwc cudnn depthwise conv

commit 6db7172
Author: Masahiro Masuda <[email protected]>
Date:   Sat Dec 11 15:39:05 2021 +0900

    add cache

commit f7d17a1
Author: Masahiro Masuda <[email protected]>
Date:   Sat Dec 11 15:05:38 2021 +0900

    removed im2col profiling for conv2d

commit b724f44
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 22:57:54 2021 +0900

    black

commit fe4687b
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 22:49:13 2021 +0900

    fixed cmd arguement

commit ab114f5
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 22:22:19 2021 +0900

    conv2d profiler working

commit 49ee61f
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 20:26:15 2021 +0900

    add conv2d profiler

commit 49e2c89
Author: Masahiro Masuda <[email protected]>
Date:   Sun Dec 12 08:03:36 2021 +0900

    do not offload depthwise conv2d

commit cd83677
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 13:20:01 2021 +0900

    lint fix

commit 870823c
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:54:38 2021 +0900

    add comment on IC == 3 case

commit 6b780db
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:48:33 2021 +0900

    check align on N dim

commit 308c4da
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:34:42 2021 +0900

    fixed check functions for fused cases, run infer type before mergecomposite

commit 8d6a1bf
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:10:59 2021 +0900

    test IC=3 convolution

commit ffce47d
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:10:16 2021 +0900

    use align1 kernel for unusual channel cases (IC = 3 etc)

commit 6cdf205
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 12:06:56 2021 +0900

    add dtype and layout check in parttern match

commit 7743cc6
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 10:40:53 2021 +0900

    add sm75 kernels to sm80 profilings

commit efceccb
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 10:40:42 2021 +0900

    skip legalize when batch size is dynamic

commit 65fbc0a
Author: Masahiro Masuda <[email protected]>
Date:   Fri Dec 10 10:36:36 2021 +0900

    bug fix in im2col encoding

masahi (Member, Author) commented Dec 16, 2021

@comaniac @Laurawly Ready to merge?

comaniac (Contributor) left a comment

LGTM. Thanks.

@comaniac comaniac merged commit aa86dc0 into apache:main Dec 16, 2021

comaniac (Contributor) commented

Thanks @masahi

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)

* support batch norm fusion
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)

* support batch norm fusion
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)

* support batch norm fusion