[CUTLASS] Support conv2d activation fusion #9746
Merged
commit e4e273ae74a8e54ab1ae1414ce9b6bfcc2b3d530 Merge: 0489d14 77c9385 Author: Masahiro Masuda <[email protected]> Date: Mon Dec 13 11:58:54 2021 +0900 Merge branch 'partition-constant-unbind' into cutlass-conv2d-fusion commit 77c9385 Author: Masahiro Masuda <[email protected]> Date: Mon Dec 13 11:58:18 2021 +0900 add test commit ab01b3a Author: Masahiro Masuda <[email protected]> Date: Mon Dec 13 11:55:06 2021 +0900 make constant binding in PartitionGraph optional commit 0489d14 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 21:52:29 2021 +0900 support sigmoid fusion (only fp32 accum for now) commit 3705bbd Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 20:50:58 2021 +0900 conv2d fusion test worked commit 05b51c9 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 20:34:10 2021 +0900 fix bias stride commit 7cf40e7 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 20:01:21 2021 +0900 use nobetascaling commit 274ec02 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 19:12:58 2021 +0900 adding fusion support to codegen commit 0de5ebd Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 18:39:08 2021 +0900 partition working commit c08bb38 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 17:24:42 2021 +0900 update test commit 81bf9e6 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 13:23:39 2021 +0900 add fused conv2d pattern commit 1c0bbb2 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 18:29:03 2021 +0900 fix lint commit 463574c Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 17:28:38 2021 +0900 fixed conv2d check commit 588c5ab Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 15:05:27 2021 +0900 update test commit a447b57 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 14:54:52 2021 +0900 speed up profiling by removing initialization commit 93cd039 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 08:26:29 2021 
+0900 fixed nhwc cudnn depthwise conv commit 6db7172 Author: Masahiro Masuda <[email protected]> Date: Sat Dec 11 15:39:05 2021 +0900 add cache commit f7d17a1 Author: Masahiro Masuda <[email protected]> Date: Sat Dec 11 15:05:38 2021 +0900 removed im2col profiling for conv2d commit b724f44 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 22:57:54 2021 +0900 black commit fe4687b Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 22:49:13 2021 +0900 fixed cmd arguement commit ab114f5 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 22:22:19 2021 +0900 conv2d profiler working commit 49ee61f Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 20:26:15 2021 +0900 add conv2d profiler commit 49e2c89 Author: Masahiro Masuda <[email protected]> Date: Sun Dec 12 08:03:36 2021 +0900 do not offload depthwise conv2d commit cd83677 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 13:20:01 2021 +0900 lint fix commit 870823c Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:54:38 2021 +0900 add comment on IC == 3 case commit 6b780db Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:48:33 2021 +0900 check align on N dim commit 308c4da Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:34:42 2021 +0900 fixed check functions for fused cases, run infer type before mergecomposite commit 8d6a1bf Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:10:59 2021 +0900 test IC=3 convolution commit ffce47d Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:10:16 2021 +0900 use align1 kernel for unusual channel cases (IC = 3 etc) commit 6cdf205 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 12:06:56 2021 +0900 add dtype and layout check in parttern match commit 7743cc6 Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 10:40:53 2021 +0900 add sm75 kernels to sm80 profilings commit efceccb Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 
10:40:42 2021 +0900 skip legalize when batch size is dynamic commit 65fbc0a Author: Masahiro Masuda <[email protected]> Date: Fri Dec 10 10:36:36 2021 +0900 bug fix in im2col encoding
masahi
requested review from
anijain2305,
areusch,
comaniac,
icemelon,
jroesch,
junrushao,
jwfromm,
manupak,
MarisaKirisame,
mbaret,
mbrookhart,
merrymercy,
slyubomirsky,
tqchen,
trevor-m,
vinx13,
wweic,
yzhliu,
zhiics and
ZihengJiang
as code owners
December 15, 2021 07:09
comaniac
approved these changes
Dec 16, 2021
LGTM. Thanks.
Thanks @masahi
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 7, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)
* support batch norm fusion
ylc
pushed a commit
to ylc/tvm
that referenced
this pull request
Jan 13, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)
* support batch norm fusion
qsqqsqqsq-intellif
pushed a commit
to qsqqsqqsq-intellif/tvm
that referenced
this pull request
Apr 29, 2022
* Add cutlass conv2d activation (bias, relu, sigmoid)
* support batch norm fusion
@comaniac @Laurawly @hwu36 @manishucsd @junrushao1994 @vinx13
This adds conv2d + activation fusion support for bias_add, relu, and fp32 sigmoid. I have pending PRs in the cutlass repo (NVIDIA/cutlass#378, NVIDIA/cutlass#379) that will enable fp16 sigmoid, silu, and hardswish fusion as well.
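As a rough illustration of what "conv2d + activation fusion" means here, the following pure-Python sketch compares an unfused pipeline (conv2d, then bias_add, then relu as separate passes) with a single conv2d whose epilogue applies bias and relu before each output element is written. It is not the TVM or CUTLASS code from this PR: `conv2d_nhwc`, `bias_add`, `relu`, and the tiny input tensors are hypothetical names and data chosen for the example, and only stride 1 with no padding is handled.

```python
def conv2d_nhwc(x, w, epilogue=None):
    """Direct conv2d on nested lists (stride 1, no padding).

    x is [N][H][W][IC], w is [OC][KH][KW][IC]. If given, epilogue(acc, oc)
    is applied to each accumulator before it is stored (the "fused" path).
    """
    n, h, wd, ic = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    oc, kh, kw = len(w), len(w[0]), len(w[0][0])
    out = []
    for b in range(n):
        rows = []
        for i in range(h - kh + 1):
            cols = []
            for j in range(wd - kw + 1):
                pixel = []
                for o in range(oc):
                    acc = 0.0
                    for di in range(kh):
                        for dj in range(kw):
                            for c in range(ic):
                                acc += x[b][i + di][j + dj][c] * w[o][di][dj][c]
                    pixel.append(epilogue(acc, o) if epilogue else acc)
                cols.append(pixel)
            rows.append(cols)
        out.append(rows)
    return out


def bias_add(y, bias):
    """Separate pass: add a per-output-channel bias."""
    return [[[[v + bias[o] for o, v in enumerate(px)] for px in row]
             for row in img] for img in y]


def relu(y):
    """Separate pass: element-wise max(v, 0)."""
    return [[[[max(v, 0.0) for v in px] for px in row] for row in img] for img in y]


# Hypothetical toy data: one 3x3 image with 2 input channels, 2 output channels.
x = [[[[float((i * 3 + j + c) % 5 - 2) for c in range(2)]
       for j in range(3)] for i in range(3)]]
w = [
    [[[1.0, -1.0], [0.5, 2.0]], [[-1.0, 0.0], [1.0, 1.0]]],
    [[[0.0, 1.0], [-2.0, 1.0]], [[1.0, -1.0], [0.5, 0.0]]],
]
bias = [0.5, -1.0]

# Unfused: three passes over the output tensor.
unfused = relu(bias_add(conv2d_nhwc(x, w), bias))

# Fused: bias and relu are applied in the epilogue, so the output is written once.
fused = conv2d_nhwc(x, w, epilogue=lambda acc, o: max(acc + bias[o], 0.0))

print(fused == unfused)
```

The fused path performs the same arithmetic in the same order, so the results match; the benefit on a GPU comes from avoiding the extra reads and writes of the intermediate tensors, which this sketch does not measure.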
End to end results
All code, models, nvprof dumps, etc. are available at https://github.com/masahi/tvm-cutlass-eval
All numbers are in milliseconds, using fp16 models with fp16 accumulation running on Tensor Cores, measured on an RTX 3070.
Observations
* [...] `main` function looks like this (see the whole model at https://github.com/masahi/tvm-cutlass-eval/blob/master/resnet50/resnet50_partition.txt). The intermediate `add`s are the element-wise additions in the residual block, which can in principle be fused with the preceding conv2d. nvprof output shows that these unfused ops are taking more than 20% of e2e time. If we fuse them, I believe we could approach TRT-level performance.
* [...] `deeplabv3` and `efficientnetv2`, both of which use a lot of depthwise conv2d. The nvprof output shows that cuDNN is spending a lot of time doing layout transform (see for example). Maybe our use of cuDNN is not optimal in terms of API usage and kernel selection. nvprof dumps for cuDNN runs are available in the repository, if someone wants to compare against the cutlass nvprof dumps.
* [...] `efficientnet_v2` row), which I found odd since TVM should be generating essentially the same sigmoid kernel in the unfused case. The `fast-math` option, discussed in "Support half precision sigmoid activation" NVIDIA/cutlass#378 (comment), makes it a lot better at the cost of slight accuracy loss.
* `efficientnetv2` and `deeplabv3` use a lot of depthwise conv2d, which is not currently offloaded to cutlass. They are actually the bottleneck in the results above, which can be observed by looking at the nvprof dumps. Using AutoTVM for them should help bring down e2e time further.

Known issues
* Trying to fuse `silu` activation in YOLOv5l results in a strange type inference error during `MergeComposite`, so the YOLOv5l result above doesn't use silu fusion. (Fixed a bug in type relation)
* [...] `target = "cuda"`) work correctly.
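On the fast-math sigmoid mentioned in the observations: one common way to avoid the exponential and division in a sigmoid is the identity sigmoid(x) = 0.5 * tanh(0.5 * x) + 0.5, combined with a hardware-approximated tanh. Whether the CUTLASS fast-math path uses exactly this form is an assumption here, not something this PR states. The sketch below only checks the identity in double precision; `sigmoid_exact` and `sigmoid_via_tanh` are hypothetical helper names, and the accuracy loss mentioned above would come from the approximate tanh, which is not modelled.

```python
import math


def sigmoid_exact(x):
    """Reference sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_via_tanh(x):
    """Same function rewritten with tanh: 0.5 * tanh(x / 2) + 0.5."""
    return 0.5 * math.tanh(0.5 * x) + 0.5


# Compare the two forms on a grid of inputs in [-8, 8].
xs = [k / 10.0 for k in range(-80, 81)]
max_err = max(abs(sigmoid_exact(v) - sigmoid_via_tanh(v)) for v in xs)
print(max_err < 1e-12)
```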