1053879
Lux Benchmarks
layernorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/2 thread(s)
3875
ns3791
ns1.02
layernorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/4 thread(s)
4292
ns4500
ns0.95
layernorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/8 thread(s)
4958
ns4875
ns1.02
layernorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/1 thread(s)
3708
ns3666
ns1.01
layernorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/2 thread(s)
10750
ns10167
ns1.06
layernorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/4 thread(s)
10416
ns10458
ns1.00
layernorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/8 thread(s)
10833
ns10750
ns1.01
layernorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/1 thread(s)
10500
ns10625
ns0.99
bias_activation(32, act=relu)(32 x 128)/forward/CPU/2 thread(s)
1250
ns1062.5
ns1.18
bias_activation(32, act=relu)(32 x 128)/forward/CPU/4 thread(s)
1042
ns1167
ns0.89
bias_activation(32, act=relu)(32 x 128)/forward/CPU/8 thread(s)
1417
ns1500
ns0.94
bias_activation(32, act=relu)(32 x 128)/forward/CPU/1 thread(s)
1208
ns1125
ns1.07
bias_activation(32, act=relu)(32 x 128)/zygote/CPU/2 thread(s)
4125
ns4083
ns1.01
bias_activation(32, act=relu)(32 x 128)/zygote/CPU/4 thread(s)
3792
ns4042
ns0.94
bias_activation(32, act=relu)(32 x 128)/zygote/CPU/8 thread(s)
4208
ns4208
ns1
bias_activation(32, act=relu)(32 x 128)/zygote/CPU/1 thread(s)
4166
ns3958
ns1.05
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
57458
ns57542
ns1.00
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
46709
ns46416
ns1.01
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
38291.5
ns47125
ns0.81
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
82166
ns80875
ns1.02
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
2036084
ns2035395.5
ns1.00
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
2088000
ns2078396
ns1.00
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
2101833.5
ns2078708
ns1.01
batchnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
1996395.5
ns1998584
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
171187
ns144250
ns1.19
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
141166
ns144166.5
ns0.98
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
145416.5
ns145125
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
143604
ns153104.5
ns0.94
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
1123959
ns1120291.5
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
1117541.5
ns1113167
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
1153479.5
ns832708.5
ns1.39
layernorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
1120542
ns1117084
ns1.00
layernorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/2 thread(s)
3250
ns3375
ns0.96
layernorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/4 thread(s)
3542
ns3542
ns1
layernorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/8 thread(s)
4083
ns4166
ns0.98
layernorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/1 thread(s)
3042
ns3125
ns0.97
layernorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/2 thread(s)
9145.5
ns9042
ns1.01
layernorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/4 thread(s)
8833
ns8750
ns1.01
layernorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/8 thread(s)
10333
ns10208
ns1.01
layernorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/1 thread(s)
9292
ns8833
ns1.05
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
15250
ns17041
ns0.89
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
17354.5
ns15834
ns1.10
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
16208
ns16604.5
ns0.98
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
15187.5
ns16791
ns0.90
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
216750
ns213750
ns1.01
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
211208
ns214875
ns0.98
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
212166.5
ns215667
ns0.98
groupnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
227042
ns226125
ns1.00
bias_activation(2, act=relu)(2 x 128)/forward/CPU/2 thread(s)
667
ns542
ns1.23
bias_activation(2, act=relu)(2 x 128)/forward/CPU/4 thread(s)
583
ns708
ns0.82
bias_activation(2, act=relu)(2 x 128)/forward/CPU/8 thread(s)
770.5
ns709
ns1.09
bias_activation(2, act=relu)(2 x 128)/forward/CPU/1 thread(s)
500
ns541
ns0.92
bias_activation(2, act=relu)(2 x 128)/zygote/CPU/2 thread(s)
1459
ns1375
ns1.06
bias_activation(2, act=relu)(2 x 128)/zygote/CPU/4 thread(s)
1417
ns1375
ns1.03
bias_activation(2, act=relu)(2 x 128)/zygote/CPU/8 thread(s)
1417
ns1500
ns0.94
bias_activation(2, act=relu)(2 x 128)/zygote/CPU/1 thread(s)
1458
ns1458
ns1
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
7166
ns7000
ns1.02
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
5875
ns5750
ns1.02
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
5250
ns6042
ns0.87
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
10041
ns9750
ns1.03
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
221000
ns222021
ns1.00
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
227229.5
ns228542
ns0.99
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
228708
ns229292
ns1.00
batchnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
213792
ns213937.5
ns1.00
dense(32, bias=false, act=relu)(32 x 128)/forward/CPU/2 thread(s)
3834
ns3875
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/forward/CPU/4 thread(s)
3875
ns3917
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/forward/CPU/8 thread(s)
3917
ns3959
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/forward/CPU/1 thread(s)
3875
ns3917
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/zygote/CPU/2 thread(s)
16750
ns16917
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/zygote/CPU/4 thread(s)
16708
ns16792
ns0.99
dense(32, bias=false, act=relu)(32 x 128)/zygote/CPU/8 thread(s)
16542
ns17250
ns0.96
dense(32, bias=false, act=relu)(32 x 128)/zygote/CPU/1 thread(s)
17042
ns16750
ns1.02
dense(512, bias=false, act=gelu)(512 x 128)/forward/CPU/2 thread(s)
580104.5
ns568792
ns1.02
dense(512, bias=false, act=gelu)(512 x 128)/forward/CPU/4 thread(s)
575958
ns578645.5
ns1.00
dense(512, bias=false, act=gelu)(512 x 128)/forward/CPU/8 thread(s)
579375
ns578083
ns1.00
dense(512, bias=false, act=gelu)(512 x 128)/forward/CPU/1 thread(s)
580708
ns575625
ns1.01
dense(512, bias=false, act=gelu)(512 x 128)/zygote/CPU/2 thread(s)
1416791
ns1422625
ns1.00
dense(512, bias=false, act=gelu)(512 x 128)/zygote/CPU/4 thread(s)
1424167
ns1420000
ns1.00
dense(512, bias=false, act=gelu)(512 x 128)/zygote/CPU/8 thread(s)
1423042
ns1422375
ns1.00
dense(512, bias=false, act=gelu)(512 x 128)/zygote/CPU/1 thread(s)
1425000
ns1426708
ns1.00
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s)
1079063
ns1077687.5
ns1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s)
963917
ns960917
ns1.00
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s)
1334458
ns1353229.5
ns0.99
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s)
1297667
ns1315312
ns0.99
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s)
5943395.5
ns5961958
ns1.00
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s)
4600125
ns4633250
ns0.99
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s)
4951395.5
ns4975188
ns1.00
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s)
5560500
ns5557125
ns1.00
dense(2, bias=true, act=relu)(2 x 128)/forward/CPU/2 thread(s)
500
ns542
ns0.92
dense(2, bias=true, act=relu)(2 x 128)/forward/CPU/4 thread(s)
500
ns583
ns0.86
dense(2, bias=true, act=relu)(2 x 128)/forward/CPU/8 thread(s)
500
ns542
ns0.92
dense(2, bias=true, act=relu)(2 x 128)/forward/CPU/1 thread(s)
542
ns542
ns1
dense(2, bias=true, act=relu)(2 x 128)/zygote/CPU/2 thread(s)
2166
ns2208
ns0.98
dense(2, bias=true, act=relu)(2 x 128)/zygote/CPU/4 thread(s)
2042
ns2250
ns0.91
dense(2, bias=true, act=relu)(2 x 128)/zygote/CPU/8 thread(s)
2125
ns2167
ns0.98
dense(2, bias=true, act=relu)(2 x 128)/zygote/CPU/1 thread(s)
2208
ns2125
ns1.04
layernorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/2 thread(s)
3687.5
ns4125
ns0.89
layernorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/4 thread(s)
3791
ns4375
ns0.87
layernorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/8 thread(s)
4792
ns5167
ns0.93
layernorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/1 thread(s)
3667
ns4250
ns0.86
layernorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/2 thread(s)
10875
ns11875
ns0.92
layernorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/4 thread(s)
11084
ns11000
ns1.01
layernorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/8 thread(s)
11500
ns11917
ns0.97
layernorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/1 thread(s)
11250
ns11500
ns0.98
groupnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/2 thread(s)
6125
ns7000
ns0.88
groupnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/4 thread(s)
6834
ns6958
ns0.98
groupnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/8 thread(s)
7542
ns8250
ns0.91
groupnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/1 thread(s)
6250
ns6125
ns1.02
groupnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/2 thread(s)
17625
ns18708.5
ns0.94
groupnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/4 thread(s)
17542
ns18625
ns0.94
groupnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/8 thread(s)
18834
ns18375
ns1.02
groupnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/1 thread(s)
17416
ns16708
ns1.04
batchnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/2 thread(s)
542
ns625
ns0.87
batchnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/4 thread(s)
666
ns708
ns0.94
batchnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/8 thread(s)
625
ns667
ns0.94
batchnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/1 thread(s)
625
ns584
ns1.07
batchnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/2 thread(s)
8500
ns8834
ns0.96
batchnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/4 thread(s)
8750
ns8875
ns0.99
batchnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/8 thread(s)
9125
ns9334
ns0.98
batchnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/1 thread(s)
9208
ns8354.5
ns1.10
dense(512, bias=false, act=identity)(512 x 128)/forward/CPU/2 thread(s)
64375
ns64459
ns1.00
dense(512, bias=false, act=identity)(512 x 128)/forward/CPU/4 thread(s)
64542
ns64750
ns1.00
dense(512, bias=false, act=identity)(512 x 128)/forward/CPU/8 thread(s)
64667
ns64916
ns1.00
dense(512, bias=false, act=identity)(512 x 128)/forward/CPU/1 thread(s)
64500
ns64625
ns1.00
dense(512, bias=false, act=identity)(512 x 128)/zygote/CPU/2 thread(s)
277667
ns279250
ns0.99
dense(512, bias=false, act=identity)(512 x 128)/zygote/CPU/4 thread(s)
287083
ns282167
ns1.02
dense(512, bias=false, act=identity)(512 x 128)/zygote/CPU/8 thread(s)
291375
ns284125
ns1.03
dense(512, bias=false, act=identity)(512 x 128)/zygote/CPU/1 thread(s)
284145.5
ns278708
ns1.02
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s)
3306333
ns3278417
ns1.01
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s)
3031917
ns3081000
ns0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s)
2796833
ns3021792
ns0.93
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s)
3935125
ns4040979.5
ns0.97
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s)
7260770.5
ns7620208
ns0.95
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s)
7411416
ns7449187.5
ns0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s)
7367271
ns7493708.5
ns0.98
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s)
8191583.5
ns8208791
ns1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s)
17581104
ns18366417
ns0.96
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s)
17521584
ns17522312.5
ns1.00
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s)
17682146
ns17580834
ns1.01
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s)
14123875
ns14093354.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
23725208
ns23631333
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
34375583
ns33504604
ns1.03
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
40913375
ns37034667
ns1.10
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
34801458
ns34967583.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
189578375
ns189693000
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
164456312.5
ns165014875
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
155623541
ns152416688
ns1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
434187396
ns434850958
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
289496083
ns289105312.5
ns1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
262462166
ns250867083
ns1.05
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
305828042
ns296775875
ns1.03
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
474493916.5
ns473537562.5
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
23604
ns22083
ns1.07
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
24250
ns22459
ns1.08
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
23979
ns25375
ns0.94
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
21291
ns24083
ns0.88
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
104687.5
ns103083
ns1.02
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
104875
ns103250
ns1.02
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
104125
ns104542
ns1.00
layernorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
103292
ns103041
ns1.00
layernorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/2 thread(s)
6749.5
ns5917
ns1.14
layernorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/4 thread(s)
5416
ns5958
ns0.91
layernorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/8 thread(s)
7000
ns6708
ns1.04
layernorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/1 thread(s)
5333
ns5791.5
ns0.92
layernorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/2 thread(s)
14833
ns14792
ns1.00
layernorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/4 thread(s)
14709
ns15000
ns0.98
layernorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/8 thread(s)
16166
ns16542
ns0.98
layernorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/1 thread(s)
14770.5
ns14875
ns0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s)
3018000
ns3002625
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s)
2066604.5
ns2079375
ns0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s)
2280541.5
ns2272333
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s)
4577917
ns4882708
ns0.94
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s)
23533375
ns23536000
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s)
18022709
ns18038562.5
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s)
17334750
ns16972167
ns1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s)
34837750
ns34545146
ns1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s)
33300333
ns33221458
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s)
27629000
ns27561792
ns1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s)
27822584
ns27327000
ns1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s)
41187708
ns42034750
ns0.98
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
74520.5
ns71417
ns1.04
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
74875
ns71854.5
ns1.04
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
82167
ns75708
ns1.09
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
74583
ns74708
ns1.00
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
308437.5
ns205250.5
ns1.50
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
225749.5
ns206750
ns1.09
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
320208.5
ns208958
ns1.53
layernorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
218542
ns217416
ns1.01
layernorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/2 thread(s)
11583
ns11875
ns0.98
layernorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/4 thread(s)
11583
ns11416
ns1.01
layernorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/8 thread(s)
13208
ns12958
ns1.02
layernorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/1 thread(s)
11458
ns11708
ns0.98
layernorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/2 thread(s)
28167
ns25667
ns1.10
layernorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/4 thread(s)
28375
ns26541.5
ns1.07
layernorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/8 thread(s)
29709
ns27729.5
ns1.07
layernorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/1 thread(s)
28917
ns26667
ns1.08
groupnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/2 thread(s)
12000
ns12812.5
ns0.94
groupnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/4 thread(s)
12292
ns12209
ns1.01
groupnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/8 thread(s)
13958
ns14208
ns0.98
groupnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/1 thread(s)
12333
ns12291.5
ns1.00
groupnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/2 thread(s)
25666
ns25625
ns1.00
groupnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/4 thread(s)
25959
ns25916.5
ns1.00
groupnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/8 thread(s)
26500
ns26250
ns1.01
groupnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/1 thread(s)
26459
ns26604
ns0.99
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
180521
ns178792
ns1.01
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
179354.5
ns180750
ns0.99
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
183458
ns181917
ns1.01
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
180375
ns179166
ns1.01
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
590375
ns593333
ns1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
594250
ns582708
ns1.02
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
594916
ns583667
ns1.02
groupnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
583541
ns584542
ns1.00
layernorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/2 thread(s)
6084
ns6167
ns0.99
layernorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/4 thread(s)
5854.5
ns5875
ns1.00
layernorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/8 thread(s)
7104.5
ns6875
ns1.03
layernorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/1 thread(s)
5917
ns5708.5
ns1.04
layernorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/2 thread(s)
14208
ns13791
ns1.03
layernorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/4 thread(s)
13500
ns13917
ns0.97
layernorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/8 thread(s)
15625
ns15667
ns1.00
layernorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/1 thread(s)
13834
ns14458
ns0.96
batchedmm(512, Bsize=4)/forward/CPU/2 thread(s)
1217312.5
ns1225312.5
ns0.99
batchedmm(512, Bsize=4)/forward/CPU/4 thread(s)
1268500
ns1241959
ns1.02
batchedmm(512, Bsize=4)/forward/CPU/8 thread(s)
1281209
ns1289958.5
ns0.99
batchedmm(512, Bsize=4)/forward/CPU/1 thread(s)
998541.5
ns1011625
ns0.99
batchedmm(512, Bsize=4)/zygote/CPU/2 thread(s)
4105042
ns4103042
ns1.00
batchedmm(512, Bsize=4)/zygote/CPU/4 thread(s)
4410083.5
ns4403333
ns1.00
batchedmm(512, Bsize=4)/zygote/CPU/8 thread(s)
4905208.5
ns4523854.5
ns1.08
batchedmm(512, Bsize=4)/zygote/CPU/1 thread(s)
3703875
ns3709771
ns1.00
dense(2, bias=true, act=gelu)(2 x 128)/forward/CPU/2 thread(s)
1792
ns1875
ns0.96
dense(2, bias=true, act=gelu)(2 x 128)/forward/CPU/4 thread(s)
1792
ns1875
ns0.96
dense(2, bias=true, act=gelu)(2 x 128)/forward/CPU/8 thread(s)
1791
ns1916
ns0.93
dense(2, bias=true, act=gelu)(2 x 128)/forward/CPU/1 thread(s)
1875
ns1875
ns1
dense(2, bias=true, act=gelu)(2 x 128)/zygote/CPU/2 thread(s)
4833
ns4958
ns0.97
dense(2, bias=true, act=gelu)(2 x 128)/zygote/CPU/4 thread(s)
4833
ns5000
ns0.97
dense(2, bias=true, act=gelu)(2 x 128)/zygote/CPU/8 thread(s)
4833
ns4958
ns0.97
dense(2, bias=true, act=gelu)(2 x 128)/zygote/CPU/1 thread(s)
4875
ns4875
ns1
groupnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/2 thread(s)
5375
ns5833
ns0.92
groupnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/4 thread(s)
5958
ns5917
ns1.01
groupnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/8 thread(s)
7166.5
ns6667
ns1.07
groupnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/1 thread(s)
5333.5
ns5209
ns1.02
groupnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/2 thread(s)
10500
ns11125
ns0.94
groupnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/4 thread(s)
11042
ns11500
ns0.96
groupnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/8 thread(s)
11125
ns11458
ns0.97
groupnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/1 thread(s)
11542
ns10500
ns1.10
dense(2, bias=false, act=relu)(2 x 128)/forward/CPU/2 thread(s)
292
ns375
ns0.78
dense(2, bias=false, act=relu)(2 x 128)/forward/CPU/4 thread(s)
333
ns375
ns0.89
dense(2, bias=false, act=relu)(2 x 128)/forward/CPU/8 thread(s)
291
ns333
ns0.87
dense(2, bias=false, act=relu)(2 x 128)/forward/CPU/1 thread(s)
333
ns375
ns0.89
dense(2, bias=false, act=relu)(2 x 128)/zygote/CPU/2 thread(s)
2750
ns2792
ns0.98
dense(2, bias=false, act=relu)(2 x 128)/zygote/CPU/4 thread(s)
2708
ns2833
ns0.96
dense(2, bias=false, act=relu)(2 x 128)/zygote/CPU/8 thread(s)
2750
ns3083
ns0.89
dense(2, bias=false, act=relu)(2 x 128)/zygote/CPU/1 thread(s)
3083
ns2750
ns1.12
groupnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/2 thread(s)
10875
ns11459
ns0.95
groupnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/4 thread(s)
11125
ns11625
ns0.96
groupnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/8 thread(s)
12958.5
ns12875
ns1.01
groupnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/1 thread(s)
11229.5
ns10958
ns1.02
groupnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/2 thread(s)
24604.5
ns25020.5
ns0.98
groupnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/4 thread(s)
24834
ns25292
ns0.98
groupnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/8 thread(s)
25333
ns25125
ns1.01
groupnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/1 thread(s)
25333
ns24875
ns1.02
dense(32, bias=true, act=relu)(32 x 128)/forward/CPU/2 thread(s)
4166
ns4250
ns0.98
dense(32, bias=true, act=relu)(32 x 128)/forward/CPU/4 thread(s)
4167
ns4250
ns0.98
dense(32, bias=true, act=relu)(32 x 128)/forward/CPU/8 thread(s)
4208
ns4250
ns0.99
dense(32, bias=true, act=relu)(32 x 128)/forward/CPU/1 thread(s)
4208
ns4208
ns1
dense(32, bias=true, act=relu)(32 x 128)/zygote/CPU/2 thread(s)
16375
ns16333
ns1.00
dense(32, bias=true, act=relu)(32 x 128)/zygote/CPU/4 thread(s)
16500
ns16375
ns1.01
dense(32, bias=true, act=relu)(32 x 128)/zygote/CPU/8 thread(s)
16167
ns16520.5
ns0.98
dense(32, bias=true, act=relu)(32 x 128)/zygote/CPU/1 thread(s)
16291
ns16208
ns1.01
batchnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/2 thread(s)
5834
ns5833
ns1.00
batchnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/4 thread(s)
5834
ns5833
ns1.00
batchnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/8 thread(s)
5792
ns6042
ns0.96
batchnorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/1 thread(s)
5875
ns5833
ns1.01
batchnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/2 thread(s)
20792
ns21000
ns0.99
batchnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/4 thread(s)
21000
ns21000
ns1
batchnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/8 thread(s)
21166
ns21417
ns0.99
batchnorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/1 thread(s)
21167
ns20709
ns1.02
batchedmm(16, Bsize=512)/forward/CPU/2 thread(s)
423895.5
ns422124.5
ns1.00
batchedmm(16, Bsize=512)/forward/CPU/4 thread(s)
380479
ns387791
ns0.98
batchedmm(16, Bsize=512)/forward/CPU/8 thread(s)
485125
ns477333
ns1.02
batchedmm(16, Bsize=512)/forward/CPU/1 thread(s)
106958
ns103125
ns1.04
batchedmm(16, Bsize=512)/zygote/CPU/2 thread(s)
937833
ns921333
ns1.02
batchedmm(16, Bsize=512)/zygote/CPU/4 thread(s)
963250
ns974250
ns0.99
batchedmm(16, Bsize=512)/zygote/CPU/8 thread(s)
1216083
ns1186458
ns1.02
batchedmm(16, Bsize=512)/zygote/CPU/1 thread(s)
428542
ns457479.5
ns0.94
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
80291.5
ns80542
ns1.00
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
79458
ns80709
ns0.98
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
87042
ns84896
ns1.03
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
80375
ns79833
ns1.01
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
1917916.5
ns1919250
ns1.00
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
1918437.5
ns1876583
ns1.02
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
1950812.5
ns1946041
ns1.00
groupnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
1915188
ns1921396
ns1.00
dense(2, bias=true, act=identity)(2 x 128)/forward/CPU/2 thread(s)
291
ns292
ns1.00
dense(2, bias=true, act=identity)(2 x 128)/forward/CPU/4 thread(s)
292
ns292
ns1
dense(2, bias=true, act=identity)(2 x 128)/forward/CPU/8 thread(s)
333
ns333
ns1
dense(2, bias=true, act=identity)(2 x 128)/forward/CPU/1 thread(s)
292
ns292
ns1
dense(2, bias=true, act=identity)(2 x 128)/zygote/CPU/2 thread(s)
1792
ns1917
ns0.93
dense(2, bias=true, act=identity)(2 x 128)/zygote/CPU/4 thread(s)
1834
ns1917
ns0.96
dense(2, bias=true, act=identity)(2 x 128)/zygote/CPU/8 thread(s)
1875
ns1875
ns1
dense(2, bias=true, act=identity)(2 x 128)/zygote/CPU/1 thread(s)
1875
ns1792
ns1.05
groupnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/2 thread(s)
6000
ns6417
ns0.94
groupnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/4 thread(s)
6167
ns6666
ns0.93
groupnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/8 thread(s)
7834
ns7771
ns1.01
groupnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/1 thread(s)
6125
ns6145.5
ns1.00
groupnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/2 thread(s)
9041
ns9604.5
ns0.94
groupnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/4 thread(s)
9125
ns9459
ns0.96
groupnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/8 thread(s)
9333
ns9500
ns0.98
groupnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/1 thread(s)
9625
ns9041
ns1.06
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s)
120446062.5
ns120459792
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s)
174298416.5
ns173682208
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s)
155622396
ns147804000
ns1.05
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s)
104910437
ns105720875
ns0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s)
613470583
ns610206729.5
ns1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s)
555889999.5
ns555562500
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s)
467916666
ns452099291.5
ns1.03
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s)
629979541
ns626409896
ns1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s)
717129562
ns657253583
ns1.09
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s)
665448791
ns665008062.5
ns1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s)
597201792
ns581676208.5
ns1.03
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s)
855951979.5
ns857648458
ns1.00
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
58542
ns57875
ns1.01
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
48208
ns47791
ns1.01
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
39083
ns47500
ns0.82
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
80167
ns83395.5
ns0.96
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
1918312.5
ns1915500
ns1.00
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
1976771
ns1932792
ns1.02
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
1793729
ns1995084
ns0.90
batchnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
1888625
ns1890500
ns1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s)
268666.5
ns267854.5
ns1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s)
268458
ns267708
ns1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s)
269271
ns269750
ns1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s)
265875
ns268166
ns0.99
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s)
676000
ns594417
ns1.14
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s)
587417
ns681291
ns0.86
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s)
601499.5
ns604895.5
ns0.99
layernorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s)
700333
ns689917
ns1.02
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 2212542 ns | 2176375 ns | 1.02
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 2211416 ns | 2222812.5 ns | 0.99
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 2103833 ns | 2205042 ns | 0.95
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 2216500 ns | 2093562.5 ns | 1.06
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 5504541 ns | 5514416 ns | 1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 5488625 ns | 5508500 ns | 1.00
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 5582375 ns | 5535958 ns | 1.01
layernorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 5490917 ns | 5491750 ns | 1.00
dense(512, bias=true, act=gelu)(512 x 128)/forward/CPU/2 thread(s) | 647417 ns | 638167 ns | 1.01
dense(512, bias=true, act=gelu)(512 x 128)/forward/CPU/4 thread(s) | 641916.5 ns | 647708 ns | 0.99
dense(512, bias=true, act=gelu)(512 x 128)/forward/CPU/8 thread(s) | 650125 ns | 659416 ns | 0.99
dense(512, bias=true, act=gelu)(512 x 128)/forward/CPU/1 thread(s) | 642917 ns | 643750 ns | 1.00
dense(512, bias=true, act=gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1821291 ns | 1822167 ns | 1.00
dense(512, bias=true, act=gelu)(512 x 128)/zygote/CPU/4 thread(s) | 1717958 ns | 1723042 ns | 1.00
dense(512, bias=true, act=gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1666375 ns | 1727833 ns | 0.96
dense(512, bias=true, act=gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2103666.5 ns | 2106333 ns | 1.00
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 58292 ns | 58458 ns | 1.00
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 47209 ns | 46917 ns | 1.01
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 37250 ns | 47292 ns | 0.79
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 80791 ns | 84125 ns | 0.96
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 2017916.5 ns | 2030041 ns | 0.99
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 2086583 ns | 2004250 ns | 1.04
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 1901083 ns | 2122125 ns | 0.90
batchnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 1990750 ns | 1985979.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 13371875 ns | 13357770.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 12426458 ns | 12440000 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 12666062 ns | 12492250 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 15204979 ns | 15108458 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 47257417 ns | 47178791.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 41744209 ns | 41760334 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 41179062.5 ns | 40950875 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 58639833 ns | 58205437.5 ns | 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 73940917 ns | 97014458.5 ns | 0.76
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 90904041 ns | 91152834 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 91001000 ns | 90701604.5 ns | 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 98448625 ns | 98541521.5 ns | 1.00
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 58833 ns | 58959 ns | 1.00
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 47958 ns | 47375 ns | 1.01
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 38542 ns | 47750 ns | 0.81
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 84292 ns | 79958 ns | 1.05
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 1904750 ns | 1918645.5 ns | 0.99
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 1969542 ns | 1971000 ns | 1.00
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 1800875 ns | 1997667 ns | 0.90
batchnorm(4, act=identity, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 1895917 ns | 1889750 ns | 1.00
batchnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 292 ns | 416 ns | 0.70
batchnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 416 ns | 375 ns | 1.11
batchnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 375 ns | 375 ns | 1
batchnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 375 ns | 333 ns | 1.13
batchnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 6145.5 ns | 6292 ns | 0.98
batchnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 6458 ns | 6542 ns | 0.99
batchnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 6375 ns | 6834 ns | 0.93
batchnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 6625 ns | 6125 ns | 1.08
dense(2, bias=false, act=identity)(2 x 128)/forward/CPU/2 thread(s) | 250 ns | 292 ns | 0.86
dense(2, bias=false, act=identity)(2 x 128)/forward/CPU/4 thread(s) | 292 ns | 333 ns | 0.88
dense(2, bias=false, act=identity)(2 x 128)/forward/CPU/8 thread(s) | 292 ns | 292 ns | 1
dense(2, bias=false, act=identity)(2 x 128)/forward/CPU/1 thread(s) | 292 ns | 333 ns | 0.88
dense(2, bias=false, act=identity)(2 x 128)/zygote/CPU/2 thread(s) | 2666 ns | 2833 ns | 0.94
dense(2, bias=false, act=identity)(2 x 128)/zygote/CPU/4 thread(s) | 2875 ns | 2917 ns | 0.99
dense(2, bias=false, act=identity)(2 x 128)/zygote/CPU/8 thread(s) | 2833 ns | 2917 ns | 0.97
dense(2, bias=false, act=identity)(2 x 128)/zygote/CPU/1 thread(s) | 2875 ns | 2708 ns | 1.06
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 284556437.5 ns | 289426812.5 ns | 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 340224270.5 ns | 339624334 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 320916166 ns | 315284104.5 ns | 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 270718833 ns | 274668667 ns | 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 998965333.5 ns | 1014634416 ns | 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 956359521 ns | 953687125 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 868085334 ns | 857733312.5 ns | 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1210263479.5 ns | 1265357333 ns | 0.96
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1439494000 ns | 1675373667 ns | 0.86
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1675455020.5 ns | 1668941291 ns | 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1623450375 ns | 1606744000 ns | 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1781275542 ns | 1787636084 ns | 1.00
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 1402500 ns | 1409499.5 ns | 1.00
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 1406416 ns | 1413833 ns | 0.99
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 1410125 ns | 1419895.5 ns | 0.99
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 1406875 ns | 1458541.5 ns | 0.96
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 5015125 ns | 5016749.5 ns | 1.00
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 5021375 ns | 4651917 ns | 1.08
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 5065333 ns | 5058791 ns | 1.00
groupnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 5030104.5 ns | 5012792 ns | 1.00
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 178918125 ns | 171852250 ns | 1.04
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 137633791 ns | 129831062.5 ns | 1.06
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 137284041 ns | 115995771 ns | 1.18
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 169122750 ns | 168839667 ns | 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 824093375 ns | 629070333 ns | 1.31
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 493391208 ns | 493488792 ns | 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 544904625 ns | 456364583 ns | 1.19
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 646424584 ns | 675660292 ns | 0.96
batchedmm(512, Bsize=32)/forward/CPU/2 thread(s) | 8944417 ns | 8950646 ns | 1.00
batchedmm(512, Bsize=32)/forward/CPU/4 thread(s) | 8930333 ns | 8924625 ns | 1.00
batchedmm(512, Bsize=32)/forward/CPU/8 thread(s) | 8002583 ns | 7865125 ns | 1.02
batchedmm(512, Bsize=32)/forward/CPU/1 thread(s) | 9740458 ns | 9701750 ns | 1.00
batchedmm(512, Bsize=32)/zygote/CPU/2 thread(s) | 37148750 ns | 36024125 ns | 1.03
batchedmm(512, Bsize=32)/zygote/CPU/4 thread(s) | 36964208 ns | 37000208.5 ns | 1.00
batchedmm(512, Bsize=32)/zygote/CPU/8 thread(s) | 34465958 ns | 33425875 ns | 1.03
batchedmm(512, Bsize=32)/zygote/CPU/1 thread(s) | 38308250 ns | 37661542 ns | 1.02
bias_activation(32, act=tanh)(32 x 128)/forward/CPU/2 thread(s) | 47458 ns | 47562.5 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/forward/CPU/4 thread(s) | 47334 ns | 47416 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/forward/CPU/8 thread(s) | 47542 ns | 47666 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/forward/CPU/1 thread(s) | 47584 ns | 47375 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/zygote/CPU/2 thread(s) | 50542 ns | 50542 ns | 1
bias_activation(32, act=tanh)(32 x 128)/zygote/CPU/4 thread(s) | 50542 ns | 50375 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/zygote/CPU/8 thread(s) | 50625 ns | 50584 ns | 1.00
bias_activation(32, act=tanh)(32 x 128)/zygote/CPU/1 thread(s) | 50500 ns | 50583 ns | 1.00
groupnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 6292 ns | 6958.5 ns | 0.90
groupnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 6625 ns | 6500 ns | 1.02
groupnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 8479 ns | 8042 ns | 1.05
groupnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 6792 ns | 6542 ns | 1.04
groupnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 9584 ns | 10042 ns | 0.95
groupnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 10625 ns | 10437.5 ns | 1.02
groupnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 10375 ns | 10500 ns | 0.99
groupnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 10458 ns | 10375 ns | 1.01
groupnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/2 thread(s) | 5250 ns | 5666 ns | 0.93
groupnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/4 thread(s) | 5917 ns | 5958 ns | 0.99
groupnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/8 thread(s) | 7917 ns | 7417 ns | 1.07
groupnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/1 thread(s) | 5750 ns | 5458 ns | 1.05
groupnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/2 thread(s) | 18291.5 ns | 13125 ns | 1.39
groupnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/4 thread(s) | 15958 ns | 13250 ns | 1.20
groupnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/8 thread(s) | 16500 ns | 13375 ns | 1.23
groupnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/1 thread(s) | 16583 ns | 13208 ns | 1.26
batchnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 1083 ns | 1083 ns | 1
batchnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 1083 ns | 1083 ns | 1
batchnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 1083 ns | 1084 ns | 1.00
batchnorm(2, act=gelu, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 1084 ns | 1084 ns | 1
batchnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 8104.5 ns | 8000 ns | 1.01
batchnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 8084 ns | 8292 ns | 0.97
batchnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 8125 ns | 8500 ns | 0.96
batchnorm(2, act=gelu, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 8458 ns | 8125 ns | 1.04
bias_activation(32, act=gelu)(32 x 128)/forward/CPU/2 thread(s) | 23125 ns | 23354.5 ns | 0.99
bias_activation(32, act=gelu)(32 x 128)/forward/CPU/4 thread(s) | 23167 ns | 23250 ns | 1.00
bias_activation(32, act=gelu)(32 x 128)/forward/CPU/8 thread(s) | 23167 ns | 23542 ns | 0.98
bias_activation(32, act=gelu)(32 x 128)/forward/CPU/1 thread(s) | 23541 ns | 23125 ns | 1.02
bias_activation(32, act=gelu)(32 x 128)/zygote/CPU/2 thread(s) | 52500 ns | 52667 ns | 1.00
bias_activation(32, act=gelu)(32 x 128)/zygote/CPU/4 thread(s) | 52417 ns | 52584 ns | 1.00
bias_activation(32, act=gelu)(32 x 128)/zygote/CPU/8 thread(s) | 52645.5 ns | 52750 ns | 1.00
bias_activation(32, act=gelu)(32 x 128)/zygote/CPU/1 thread(s) | 52458 ns | 52417 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 1405062.5 ns | 1398084 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 1402583.5 ns | 1402791 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 1406875 ns | 1401792 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 1403729.5 ns | 1402875 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 5007708 ns | 5010813 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 5013292 ns | 5016584 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 5046271 ns | 5062708 ns | 1.00
groupnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 5005125 ns | 5013500 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3074708 ns | 3040417 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2091499.5 ns | 2105083 ns | 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2290083.5 ns | 2280208 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 4915708.5 ns | 4865521 ns | 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24422083 ns | 24414604.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18926750 ns | 18876208.5 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 18059792 ns | 17652979 ns | 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 35835500.5 ns | 35825688 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 34039292 ns | 34006188 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 28325625 ns | 28283750 ns | 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 28468583 ns | 27926083.5 ns | 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 41461250 ns | 41742416.5 ns | 0.99
batchedmm(512, Bsize=512)/forward/CPU/2 thread(s) | 144570938 ns | 144750166 ns | 1.00
batchedmm(512, Bsize=512)/forward/CPU/4 thread(s) | 147768250 ns | 146949375 ns | 1.01
batchedmm(512, Bsize=512)/forward/CPU/8 thread(s) | 127812375 ns | 126208208.5 ns | 1.01
batchedmm(512, Bsize=512)/forward/CPU/1 thread(s) | 173201708 ns | 173205292 ns | 1.00
batchedmm(512, Bsize=512)/zygote/CPU/2 thread(s) | 952803959 ns | 1847080125 ns | 0.52
batchedmm(512, Bsize=512)/zygote/CPU/4 thread(s) | 1880403417 ns | 809911709 ns | 2.32
batchedmm(512, Bsize=512)/zygote/CPU/8 thread(s) | 721103250 ns | 755677291 ns | 0.95
batchedmm(512, Bsize=512)/zygote/CPU/1 thread(s) | 665759084 ns | 667449084 ns | 1.00
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 77270.5 ns | 76791 ns | 1.01
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 72541 ns | 76042 ns | 0.95
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 76166 ns | 76417 ns | 1.00
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 72646 ns | 72541 ns | 1.00
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 291833 ns | 277229 ns | 1.05
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 193625 ns | 193583 ns | 1.00
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 275146 ns | 205417 ns | 1.34
layernorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 289604.5 ns | 303083.5 ns | 0.96
batchedmm(512, Bsize=128)/forward/CPU/2 thread(s) | 35435979 ns | 35472875 ns | 1.00
batchedmm(512, Bsize=128)/forward/CPU/4 thread(s) | 36430959 ns | 36379896 ns | 1.00
batchedmm(512, Bsize=128)/forward/CPU/8 thread(s) | 32728396 ns | 32315333.5 ns | 1.01
batchedmm(512, Bsize=128)/forward/CPU/1 thread(s) | 40524416 ns | 40618416.5 ns | 1.00
batchedmm(512, Bsize=128)/zygote/CPU/2 thread(s) | 148443209 ns | 146765250 ns | 1.01
batchedmm(512, Bsize=128)/zygote/CPU/4 thread(s) | 153839875 ns | 153200125 ns | 1.00
batchedmm(512, Bsize=128)/zygote/CPU/8 thread(s) | 142207500 ns | 137307792 ns | 1.04
batchedmm(512, Bsize=128)/zygote/CPU/1 thread(s) | 286559208 ns | 285301125 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 121670542 ns | 120518062.5 ns | 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 174360666.5 ns | 174031666 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 155087062.5 ns | 148283312.5 ns | 1.05
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 106968083 ns | 106552271 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 468237229 ns | 469918416 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 467305229 ns | 466837917 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 457270500 ns | 437920916.5 ns | 1.04
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 742197000 ns | 739774042 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 775778042 ns | 711087896 ns | 1.09
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 639059458 ns | 640897313 ns | 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 642570667 ns | 630411896 ns | 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 849532312.5 ns | 849787625 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1345916 ns | 1302125 ns | 1.03
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 984292 ns | 905958 ns | 1.09
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 764770.5 ns | 938334 ns | 0.82
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2095229.5 ns | 1987437 ns | 1.05
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 2954875 ns | 2951687.5 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 2619000 ns | 2611020.5 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2499292 ns | 2639896 ns | 0.95
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 3688708.5 ns | 3702396 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 5790208 ns | 5801417 ns | 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 5791792 ns | 5727666.5 ns | 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 5888041 ns | 5818916 ns | 1.01
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 2887459 ns | 2913834 ns | 0.99
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 7208 ns | 7417 ns | 0.97
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 5833 ns | 6166 ns | 0.95
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 5250 ns | 6209 ns | 0.85
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 10125 ns | 10083 ns | 1.00
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 223354 ns | 212792 ns | 1.05
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 232209 ns | 220834 ns | 1.05
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 220729.5 ns | 221166 ns | 1.00
batchnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 219292 ns | 215459 ns | 1.02
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 303148916.5 ns | 300445333 ns | 1.01
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 220759541.5 ns | 214002042 ns | 1.03
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 221905479 ns | 196386541 ns | 1.13
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 309164583 ns | 307720792 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1233285583 ns | 1232629833 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 899326000 ns | 899311645.5 ns | 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 858911520.5 ns | 825300584 ns | 1.04
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 1144926250 ns | 1150330250 ns | 1.00
groupnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 4959 ns | 5458 ns | 0.91
groupnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 5209 ns | 5416 ns | 0.96
groupnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 6875 ns | 6750.5 ns | 1.02
groupnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 5125 ns | 5084 ns | 1.01
groupnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 10333 ns | 7667 ns | 1.35
groupnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 10209 ns | 7333 ns | 1.39
groupnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 10375 ns | 7500 ns | 1.38
groupnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 10583 ns | 7250 ns | 1.46
batchnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/2 thread(s) | 500 ns | 583 ns | 0.86
batchnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/4 thread(s) | 625 ns | 625 ns | 1
batchnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/8 thread(s) | 625 ns | 625 ns | 1
batchnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/1 thread(s) | 625 ns | 542 ns | 1.15
batchnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/2 thread(s) | 9125 ns | 9542 ns | 0.96
batchnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/4 thread(s) | 9208 ns | 9833 ns | 0.94
batchnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/8 thread(s) | 9209 ns | 9667 ns | 0.95
batchnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/1 thread(s) | 9417 ns | 9041 ns | 1.04
bias_activation(512, act=gelu)(512 x 128)/forward/CPU/2 thread(s) | 352041 ns | 352562.5 ns | 1.00
bias_activation(512, act=gelu)(512 x 128)/forward/CPU/4 thread(s) | 352167 ns | 351833 ns | 1.00
bias_activation(512, act=gelu)(512 x 128)/forward/CPU/8 thread(s) | 352833 ns | 353416.5 ns | 1.00
bias_activation(512, act=gelu)(512 x 128)/forward/CPU/1 thread(s) | 352250 ns | 366166 ns | 0.96
bias_activation(512, act=gelu)(512 x 128)/zygote/CPU/2 thread(s) | 810042 ns | 826208 ns | 0.98
bias_activation(512, act=gelu)(512 x 128)/zygote/CPU/4 thread(s) | 832334 ns | 775333.5 ns | 1.07
bias_activation(512, act=gelu)(512 x 128)/zygote/CPU/8 thread(s) | 777896 ns | 808520.5 ns | 0.96
bias_activation(512, act=gelu)(512 x 128)/zygote/CPU/1 thread(s) | 833959 ns | 828833 ns | 1.01
batchedmm(16, Bsize=32)/forward/CPU/2 thread(s) | 339375 ns | 340917 ns | 1.00
batchedmm(16, Bsize=32)/forward/CPU/4 thread(s) | 345208.5 ns | 342729.5 ns | 1.01
batchedmm(16, Bsize=32)/forward/CPU/8 thread(s) | 443583 ns | 453708 ns | 0.98
batchedmm(16, Bsize=32)/forward/CPU/1 thread(s) | 10500 ns | 10687.5 ns | 0.98
batchedmm(16, Bsize=32)/zygote/CPU/2 thread(s) | 720437.5 ns | 709875 ns | 1.01
batchedmm(16, Bsize=32)/zygote/CPU/4 thread(s) | 730000 ns | 728042 ns | 1.00
batchedmm(16, Bsize=32)/zygote/CPU/8 thread(s) | 1036000 ns | 1005792 ns | 1.03
batchedmm(16, Bsize=32)/zygote/CPU/1 thread(s) | 26584 ns | 26667 ns | 1.00
batchedmm(16, Bsize=128)/forward/CPU/2 thread(s) | 378750 ns | 380187.5 ns | 1.00
batchedmm(16, Bsize=128)/forward/CPU/4 thread(s) | 347042 ns | 355542 ns | 0.98
batchedmm(16, Bsize=128)/forward/CPU/8 thread(s) | 446167 ns | 442146 ns | 1.01
batchedmm(16, Bsize=128)/forward/CPU/1 thread(s) | 30208 ns | 30959 ns | 0.98
batchedmm(16, Bsize=128)/zygote/CPU/2 thread(s) | 736541 ns | 726667 ns | 1.01
batchedmm(16, Bsize=128)/zygote/CPU/4 thread(s) | 781270.5 ns | 778791.5 ns | 1.00
batchedmm(16, Bsize=128)/zygote/CPU/8 thread(s) | 1066792 ns | 1034042 ns | 1.03
batchedmm(16, Bsize=128)/zygote/CPU/1 thread(s) | 104812.5 ns | 105042 ns | 1.00
bias_activation(2, act=tanh)(2 x 128)/forward/CPU/2 thread(s) | 3375 ns | 3583 ns | 0.94
bias_activation(2, act=tanh)(2 x 128)/forward/CPU/4 thread(s) | 3458 ns | 3542 ns | 0.98
bias_activation(2, act=tanh)(2 x 128)/forward/CPU/8 thread(s) | 3709 ns | 3708 ns | 1.00
bias_activation(2, act=tanh)(2 x 128)/forward/CPU/1 thread(s) | 3625 ns | 3542 ns | 1.02
bias_activation(2, act=tanh)(2 x 128)/zygote/CPU/2 thread(s) | 4167 ns | 4583 ns | 0.91
bias_activation(2, act=tanh)(2 x 128)/zygote/CPU/4 thread(s) | 4208 ns | 4333 ns | 0.97
bias_activation(2, act=tanh)(2 x 128)/zygote/CPU/8 thread(s) | 4250 ns | 4375 ns | 0.97
bias_activation(2, act=tanh)(2 x 128)/zygote/CPU/1 thread(s) | 4291 ns | 4167 ns | 1.03
layernorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 3625 ns | 3833 ns | 0.95
layernorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 3375 ns | 3542 ns | 0.95
layernorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 4437.5 ns | 4292 ns | 1.03
layernorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 3708 ns | 3500 ns | 1.06
layernorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 8375 ns | 8334 ns | 1.00
layernorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 8208 ns | 8334 ns | 0.98
layernorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 8583 ns | 8708 ns | 0.99
layernorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 8542 ns | 8625 ns | 0.99
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 205167 ns | 203709 ns | 1.01
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 209208 ns | 209833 ns | 1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 208833 ns | 213750 ns | 0.98
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 199083 ns | 200750 ns | 0.99
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 606958 ns | 611979.5 ns | 0.99
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 671708 ns | 623084 ns | 1.08
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 624000 ns | 633542 ns | 0.98
batchnorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 633208 ns | 630833 ns | 1.00
batchedmm(128, Bsize=128)/forward/CPU/2 thread(s) | 996958.5 ns | 991250 ns | 1.01
batchedmm(128, Bsize=128)/forward/CPU/4 thread(s) | 1038063 ns | 1017458.5 ns | 1.02
batchedmm(128, Bsize=128)/forward/CPU/8 thread(s) | 970916.5 ns | 954833 ns | 1.02
batchedmm(128, Bsize=128)/forward/CPU/1 thread(s) | 870270.5 ns | 864916.5 ns | 1.01
batchedmm(128, Bsize=128)/zygote/CPU/2 thread(s) | 4514312 ns | 4517208 ns | 1.00
batchedmm(128, Bsize=128)/zygote/CPU/4 thread(s) | 4740687.5 ns | 4768041 ns | 0.99
batchedmm(128, Bsize=128)/zygote/CPU/8 thread(s) | 4626625 ns | 4459667 ns | 1.04
batchedmm(128, Bsize=128)/zygote/CPU/1 thread(s) | 4278333 ns | 4281312 ns | 1.00
layernorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 3083 ns | 3625 ns | 0.85
layernorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 3209 ns | 3291 ns | 0.98
layernorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 4417 ns | 4250 ns | 1.04
layernorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 3458 ns | 3166 ns | 1.09
layernorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 7250 ns | 7500 ns | 0.97
layernorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 7167 ns | 7458 ns | 0.96
layernorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 7333 ns | 7687.5 ns | 0.95
layernorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 7541 ns | 7084 ns | 1.06
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1650062.5 ns | 1644333 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1162479.5 ns | 1183209 ns | 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1343562.5 ns | 1370292 ns | 0.98
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2474584 ns | 2475167 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12306500 ns | 12346958.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9576334 ns | 9593646 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9347167 ns | 9292209 ns | 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18004520.5 ns | 17963583.5 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17357042 ns | 17361375 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14404458 ns | 14393542 ns | 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14505083.5 ns | 14339750 ns | 1.01
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21117625 ns | 21095083 ns | 1.00
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 88584 ns | 88167 ns | 1.00
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 89416.5 ns | 88875 ns | 1.01
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 91000 ns | 91875 ns | 0.99
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 116312.5 ns | 134020.5 ns | 0.87
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 2027750 ns | 2027813 ns | 1.00
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 2156354 ns | 2027000.5 ns | 1.06
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 1755083 ns | 2054000 ns | 0.85
groupnorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 2022583 ns | 2028125 ns | 1.00
batchedmm(2, Bsize=4)/forward/CPU/2 thread(s) | 3416 ns | 2792 ns | 1.22
batchedmm(2, Bsize=4)/forward/CPU/4 thread(s) | 2792 ns | 2583 ns | 1.08
batchedmm(2, Bsize=4)/forward/CPU/8 thread(s) | 2021 ns | 3458 ns | 0.58
batchedmm(2, Bsize=4)/forward/CPU/1 thread(s) | 3459 ns | 1917 ns | 1.80
batchedmm(2, Bsize=4)/zygote/CPU/2 thread(s) | 2750 ns | 2709 ns | 1.02
batchedmm(2, Bsize=4)/zygote/CPU/4 thread(s) | 3042 ns | 2792 ns | 1.09
batchedmm(2, Bsize=4)/zygote/CPU/8 thread(s) | 3083 ns | 2792 ns | 1.10
batchedmm(2, Bsize=4)/zygote/CPU/1 thread(s) | 3084 ns | 2833.5 ns | 1.09
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 7209 ns | 7375 ns | 0.98
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 6041 ns | 6041 ns | 1
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 5333 ns | 6167 ns | 0.86
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 10083 ns | 10125 ns | 1.00
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 214125 ns | 242958 ns | 0.88
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 229084 ns | 220917 ns | 1.04
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 223791.5 ns | 220417 ns | 1.02
batchnorm(4, act=identity, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 221708 ns | 240375 ns | 0.92
dense(32, bias=true, act=identity)(32 x 128)/forward/CPU/2 thread(s) | 3708 ns | 3709 ns | 1.00
dense(32, bias=true, act=identity)(32 x 128)/forward/CPU/4 thread(s) | 3792 ns | 3791 ns | 1.00
dense(32, bias=true, act=identity)(32 x 128)/forward/CPU/8 thread(s) | 3791 ns | 3750 ns | 1.01
dense(32, bias=true, act=identity)(32 x 128)/forward/CPU/1 thread(s) | 3708 ns | 3708 ns | 1
dense(32, bias=true, act=identity)(32 x 128)/zygote/CPU/2 thread(s) | 14584 ns | 14584 ns | 1
dense(32, bias=true, act=identity)(32 x 128)/zygote/CPU/4 thread(s) | 14458 ns | 14542 ns | 0.99
dense(32, bias=true, act=identity)(32 x 128)/zygote/CPU/8 thread(s) | 14292 ns | 14584 ns | 0.98
dense(32, bias=true, act=identity)(32 x 128)/zygote/CPU/1 thread(s) | 14583 ns | 14417 ns | 1.01
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 96000 ns | 92125 ns | 1.04
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 91334 ns | 92458 ns | 0.99
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 94166.5 ns | 98562.5 ns | 0.96
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 137583 ns | 118229 ns | 1.16
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 1927479 ns | 1913333 ns | 1.01
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 1933333 ns | 1909771 ns | 1.01
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 1671542 ns | 1956333 ns | 0.85
groupnorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 1929000 ns | 1924333 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 880583 ns | 879000 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 820750 ns | 818395.5 ns | 1.00
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1161125 ns | 1219520.5 ns | 0.95
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 964042 ns | 966459 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2817062.5 ns | 2822917 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 2505978.5 ns | 2496917 ns | 1.00
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3333708 ns | 3359000 ns | 0.99
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3424937.5 ns | 3411333 ns | 1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 17166 ns | 17000 ns | 1.01
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 15292 ns | 15458.5 ns | 0.99
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 16937.5 ns | 19041 ns | 0.89
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 16792 ns | 16875 ns | 1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 227729.5 ns | 258834 ns | 0.88
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 260125 ns | 215125 ns | 1.21
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 216458 ns | 215792 ns | 1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 259708 ns | 227875 ns | 1.14
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 221208.5 ns | 219062.5 ns | 1.01 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 221937 ns | 221375 ns | 1.00 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 221042 ns | 222875 ns | 0.99 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 221958.5 ns | 220791 ns | 1.01 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 495666 ns | 497625 ns | 1.00 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 561062.5 ns | 535916 ns | 1.05 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 501250 ns | 499208 ns | 1.00 |
| layernorm(4, act=gelu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 572917 ns | 511125 ns | 1.12 |
| batchedmm(16, Bsize=4)/forward/CPU/2 thread(s) | 4167 ns | 3833.5 ns | 1.09 |
| batchedmm(16, Bsize=4)/forward/CPU/4 thread(s) | 3625 ns | 4250 ns | 0.85 |
| batchedmm(16, Bsize=4)/forward/CPU/8 thread(s) | 5417 ns | 5166.5 ns | 1.05 |
| batchedmm(16, Bsize=4)/forward/CPU/1 thread(s) | 3750 ns | 3792 ns | 0.99 |
| batchedmm(16, Bsize=4)/zygote/CPU/2 thread(s) | 7500 ns | 7542 ns | 0.99 |
| batchedmm(16, Bsize=4)/zygote/CPU/4 thread(s) | 7458 ns | 7167 ns | 1.04 |
| batchedmm(16, Bsize=4)/zygote/CPU/8 thread(s) | 7458 ns | 7542 ns | 0.99 |
| batchedmm(16, Bsize=4)/zygote/CPU/1 thread(s) | 7917 ns | 7667 ns | 1.03 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 18625 ns | 18667 ns | 1.00 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 17500 ns | 16708 ns | 1.05 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 19375 ns | 20584 ns | 0.94 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 18292 ns | 18084 ns | 1.01 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 223917 ns | 224209 ns | 1.00 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 229208.5 ns | 212687 ns | 1.08 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 218333 ns | 213167 ns | 1.02 |
| groupnorm(4, act=identity, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 228667 ns | 222979.5 ns | 1.03 |
| layernorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/2 thread(s) | 4166 ns | 4250 ns | 0.98 |
| layernorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/4 thread(s) | 4166 ns | 4333.5 ns | 0.96 |
| layernorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/8 thread(s) | 5375 ns | 5125 ns | 1.05 |
| layernorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/1 thread(s) | 4416 ns | 3875 ns | 1.14 |
| layernorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/2 thread(s) | 10042 ns | 10542 ns | 0.95 |
| layernorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/4 thread(s) | 9750 ns | 10791 ns | 0.90 |
| layernorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/8 thread(s) | 10417 ns | 10959 ns | 0.95 |
| layernorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/1 thread(s) | 10334 ns | 10333 ns | 1.00 |
| layernorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 3375 ns | 3375 ns | 1 |
| layernorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 2833 ns | 3333 ns | 0.85 |
| layernorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 4375 ns | 4042 ns | 1.08 |
| layernorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 2792 ns | 2958 ns | 0.94 |
| layernorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 7083 ns | 7500 ns | 0.94 |
| layernorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 7333 ns | 7750 ns | 0.95 |
| layernorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 7417 ns | 7625 ns | 0.97 |
| layernorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 7375 ns | 7208 ns | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 23307041.5 ns | 23498333.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 33839458 ns | 34789375 ns | 0.97 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 40745646 ns | 37689958 ns | 1.08 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 34862708 ns | 34909542 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 184254354 ns | 184647292 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 169428437.5 ns | 163834583 ns | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 150235166.5 ns | 146363541.5 ns | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 273092750 ns | 274565083 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 284314042 ns | 278243563 ns | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 259222834 ns | 245760791.5 ns | 1.05 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 233454625 ns | 231789354 ns | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 323194834 ns | 324000854.5 ns | 1.00 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 183354.5 ns | 182625 ns | 1.00 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 182083 ns | 184458 ns | 0.99 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 185375 ns | 186250 ns | 1.00 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 183166.5 ns | 181875 ns | 1.01 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 598042 ns | 628291.5 ns | 0.95 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 638604 ns | 608229.5 ns | 1.05 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 590042 ns | 598250 ns | 0.99 |
| groupnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 639625 ns | 637791 ns | 1.00 |
| batchedmm(128, Bsize=512)/forward/CPU/2 thread(s) | 3814396 ns | 3874375 ns | 0.98 |
| batchedmm(128, Bsize=512)/forward/CPU/4 thread(s) | 3917959 ns | 3917042 ns | 1.00 |
| batchedmm(128, Bsize=512)/forward/CPU/8 thread(s) | 3558667 ns | 3534687.5 ns | 1.01 |
| batchedmm(128, Bsize=512)/forward/CPU/1 thread(s) | 4558792 ns | 4554291 ns | 1.00 |
| batchedmm(128, Bsize=512)/zygote/CPU/2 thread(s) | 17242875 ns | 17461354.5 ns | 0.99 |
| batchedmm(128, Bsize=512)/zygote/CPU/4 thread(s) | 17847895.5 ns | 17833459 ns | 1.00 |
| batchedmm(128, Bsize=512)/zygote/CPU/8 thread(s) | 16851208 ns | 16559937.5 ns | 1.02 |
| batchedmm(128, Bsize=512)/zygote/CPU/1 thread(s) | 19971167 ns | 19938750 ns | 1.00 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/2 thread(s) | 500 ns | 625 ns | 0.80 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/4 thread(s) | 625 ns | 500 ns | 1.25 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/8 thread(s) | 542 ns | 666 ns | 0.81 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/forward/CPU/1 thread(s) | 667 ns | 583 ns | 1.14 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/2 thread(s) | 9333 ns | 9292 ns | 1.00 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/4 thread(s) | 8917 ns | 9458 ns | 0.94 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/8 thread(s) | 9792 ns | 9375 ns | 1.04 |
| batchnorm(2, act=relu, affine=false)(32 x 32)/zygote/CPU/1 thread(s) | 9750 ns | 9187.5 ns | 1.06 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 652733938 ns | 651812167 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 393383500 ns | 390086667 ns | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 395122417 ns | 327502625 ns | 1.21 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 624702084 ns | 747314333 ns | 0.84 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 1882307625 ns | 1879705041.5 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 1638716333.5 ns | 1650371917 ns | 0.99 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1551357292 ns | 1514378771 ns | 1.02 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 2292499417 ns | 2204966313 ns | 1.04 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1649417 ns | 1651458 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1198625 ns | 1196083 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1369208 ns | 1387103.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2494208 ns | 2353958 ns | 1.06 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12699979.5 ns | 12704667 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9947354 ns | 9935187.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9680125.5 ns | 9671333.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18361875 ns | 18432334 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17714687.5 ns | 17670625 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14723938 ns | 14743791.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14690791 ns | 14593292 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21421188 ns | 21437146 ns | 1.00 |
| dense(32, bias=false, act=gelu)(32 x 128)/forward/CPU/2 thread(s) | 26250 ns | 26250 ns | 1 |
| dense(32, bias=false, act=gelu)(32 x 128)/forward/CPU/4 thread(s) | 26209 ns | 26292 ns | 1.00 |
| dense(32, bias=false, act=gelu)(32 x 128)/forward/CPU/8 thread(s) | 26209 ns | 26333 ns | 1.00 |
| dense(32, bias=false, act=gelu)(32 x 128)/forward/CPU/1 thread(s) | 26250 ns | 26292 ns | 1.00 |
| dense(32, bias=false, act=gelu)(32 x 128)/zygote/CPU/2 thread(s) | 67292 ns | 67166 ns | 1.00 |
| dense(32, bias=false, act=gelu)(32 x 128)/zygote/CPU/4 thread(s) | 67625 ns | 67208 ns | 1.01 |
| dense(32, bias=false, act=gelu)(32 x 128)/zygote/CPU/8 thread(s) | 67000 ns | 67917 ns | 0.99 |
| dense(32, bias=false, act=gelu)(32 x 128)/zygote/CPU/1 thread(s) | 67167 ns | 66958 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 204208 ns | 202875 ns | 1.01 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 209583 ns | 210375 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 209542 ns | 209916 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 199166 ns | 198750 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 602458 ns | 645354 ns | 0.93 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 626542 ns | 637500.5 ns | 0.98 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 624687.5 ns | 634542 ns | 0.98 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 632958 ns | 634250 ns | 1.00 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 656125 ns | 672209 ns | 0.98 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 646104 ns | 637917 ns | 1.01 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 546958 ns | 665042 ns | 0.82 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 679042 ns | 664917 ns | 1.02 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 2259375 ns | 2224563 ns | 1.02 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 2247416.5 ns | 2248771 ns | 1.00 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 2013146 ns | 2241125 ns | 0.90 |
| layernorm(4, act=identity, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 2262166.5 ns | 2237000 ns | 1.01 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 18354.5 ns | 17417 ns | 1.05 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 17375 ns | 17333 ns | 1.00 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 19625 ns | 19500 ns | 1.01 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 18542 ns | 16875 ns | 1.10 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 259959 ns | 260770.5 ns | 1.00 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 263500 ns | 219458.5 ns | 1.20 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 221375 ns | 229000 ns | 0.97 |
| groupnorm(4, act=relu, affine=true)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 261334 ns | 263334 ns | 0.99 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/2 thread(s) | 584 ns | 625 ns | 0.93 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/4 thread(s) | 625 ns | 666 ns | 0.94 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/8 thread(s) | 625 ns | 667 ns | 0.94 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/forward/CPU/1 thread(s) | 708 ns | 584 ns | 1.21 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/2 thread(s) | 10125 ns | 10000 ns | 1.01 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/4 thread(s) | 9709 ns | 9750 ns | 1.00 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/8 thread(s) | 10458 ns | 10125 ns | 1.03 |
| batchnorm(2, act=relu, affine=true)(32 x 32)/zygote/CPU/1 thread(s) | 10250 ns | 9750 ns | 1.05 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/2 thread(s) | 5500 ns | 5375 ns | 1.02 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/4 thread(s) | 5375 ns | 5625 ns | 0.96 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/8 thread(s) | 7041.5 ns | 6604.5 ns | 1.07 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/forward/CPU/1 thread(s) | 5167 ns | 5000 ns | 1.03 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/2 thread(s) | 7875 ns | 7875 ns | 1 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/4 thread(s) | 7750 ns | 7292 ns | 1.06 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/8 thread(s) | 7542 ns | 7687.5 ns | 0.98 |
| groupnorm(2, act=identity, affine=false)(4 x 32)/zygote/CPU/1 thread(s) | 7791 ns | 7334 ns | 1.06 |
| bias_activation(2, act=gelu)(2 x 128)/forward/CPU/2 thread(s) | 2041 ns | 2041 ns | 1 |
| bias_activation(2, act=gelu)(2 x 128)/forward/CPU/4 thread(s) | 1958 ns | 2250 ns | 0.87 |
| bias_activation(2, act=gelu)(2 x 128)/forward/CPU/8 thread(s) | 2209 ns | 2458 ns | 0.90 |
| bias_activation(2, act=gelu)(2 x 128)/forward/CPU/1 thread(s) | 2167 ns | 2084 ns | 1.04 |
| bias_activation(2, act=gelu)(2 x 128)/zygote/CPU/2 thread(s) | 6333 ns | 6542 ns | 0.97 |
| bias_activation(2, act=gelu)(2 x 128)/zygote/CPU/4 thread(s) | 6542 ns | 6458 ns | 1.01 |
| bias_activation(2, act=gelu)(2 x 128)/zygote/CPU/8 thread(s) | 6416 ns | 6708 ns | 0.96 |
| bias_activation(2, act=gelu)(2 x 128)/zygote/CPU/1 thread(s) | 6666 ns | 6541 ns | 1.02 |
| bias_activation(512, act=tanh)(512 x 128)/forward/CPU/2 thread(s) | 749417 ns | 747125 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/forward/CPU/4 thread(s) | 746625 ns | 749958.5 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/forward/CPU/8 thread(s) | 749166.5 ns | 747167 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/forward/CPU/1 thread(s) | 772625 ns | 771333.5 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/zygote/CPU/2 thread(s) | 792667 ns | 791000 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/zygote/CPU/4 thread(s) | 792625 ns | 780041.5 ns | 1.02 |
| bias_activation(512, act=tanh)(512 x 128)/zygote/CPU/8 thread(s) | 775750 ns | 775416 ns | 1.00 |
| bias_activation(512, act=tanh)(512 x 128)/zygote/CPU/1 thread(s) | 808562.5 ns | 794812.5 ns | 1.02 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 7334 ns | 6959 ns | 1.05 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 5959 ns | 6000 ns | 0.99 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 5333 ns | 6125 ns | 0.87 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 10125 ns | 10167 ns | 1.00 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 220166 ns | 259750 ns | 0.85 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 239292 ns | 238854 ns | 1.00 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 229167 ns | 231104 ns | 0.99 |
| batchnorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 254959 ns | 250208 ns | 1.02 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/2 thread(s) | 9792 ns | 10125 ns | 0.97 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/4 thread(s) | 10000 ns | 10312.5 ns | 0.97 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/8 thread(s) | 11166 ns | 10875 ns | 1.03 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/forward/CPU/1 thread(s) | 9750 ns | 10167 ns | 0.96 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/2 thread(s) | 24541 ns | 24167 ns | 1.02 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/4 thread(s) | 24291 ns | 24583 ns | 0.99 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/8 thread(s) | 24917 ns | 25333 ns | 0.98 |
| layernorm(2, act=gelu, affine=false)(32 x 32)/zygote/CPU/1 thread(s) | 24625 ns | 24584 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 105924583 ns | 106104729.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 116546459 ns | 117502187.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 124211854 ns | 120758625 ns | 1.03 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 117471395.5 ns | 117423500 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 393647209 ns | 392280708 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 356631062.5 ns | 358697709 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 357758708 ns | 357440917 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 619205000 ns | 540821208.5 ns | 1.14 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 612150166 ns | 781416292 ns | 0.78 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 766180166.5 ns | 760831458 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 749713459 ns | 750885583.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 785793916 ns | 784554021 ns | 1.00 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/2 thread(s) | 7000 ns | 7583 ns | 0.92 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/4 thread(s) | 6875 ns | 6875 ns | 1 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/8 thread(s) | 8625 ns | 8208 ns | 1.05 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/forward/CPU/1 thread(s) | 6542 ns | 7917 ns | 0.83 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/2 thread(s) | 13500 ns | 14542 ns | 0.93 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/4 thread(s) | 13625 ns | 13667 ns | 1.00 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/8 thread(s) | 14375 ns | 14125 ns | 1.02 |
| groupnorm(2, act=identity, affine=true)(32 x 32)/zygote/CPU/1 thread(s) | 14584 ns | 14375 ns | 1.01 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/2 thread(s) | 5917 ns | 5750 ns | 1.03 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/4 thread(s) | 5770.5 ns | 6125 ns | 0.94 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/8 thread(s) | 7875 ns | 7500 ns | 1.05 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/forward/CPU/1 thread(s) | 5583 ns | 5500 ns | 1.02 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/2 thread(s) | 13000 ns | 12875 ns | 1.01 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/4 thread(s) | 12625 ns | 12417 ns | 1.02 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/8 thread(s) | 12834 ns | 12687.5 ns | 1.01 |
| groupnorm(2, act=identity, affine=false)(32 x 32)/zygote/CPU/1 thread(s) | 12895.5 ns | 13042 ns | 0.99 |
| batchedmm(2, Bsize=128)/forward/CPU/2 thread(s) | 5895.5 ns | 5250 ns | 1.12 |
| batchedmm(2, Bsize=128)/forward/CPU/4 thread(s) | 5292 ns | 5709 ns | 0.93 |
| batchedmm(2, Bsize=128)/forward/CPU/8 thread(s) | 5916 ns | 6542 ns | 0.90 |
| batchedmm(2, Bsize=128)/forward/CPU/1 thread(s) | 5417 ns | 5375 ns | 1.01 |
| batchedmm(2, Bsize=128)/zygote/CPU/2 thread(s) | 15667 ns | 15750 ns | 0.99 |
| batchedmm(2, Bsize=128)/zygote/CPU/4 thread(s) | 15895.5 ns | 15375 ns | 1.03 |
| batchedmm(2, Bsize=128)/zygote/CPU/8 thread(s) | 15916 ns | 15584 ns | 1.02 |
| batchedmm(2, Bsize=128)/zygote/CPU/1 thread(s) | 16041 ns | 15916 ns | 1.01 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 292 ns | 375 ns | 0.78 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 375 ns | 417 ns | 0.90 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 375 ns | 417 ns | 0.90 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 417 ns | 334 ns | 1.25 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 6292 ns | 6583 ns | 0.96 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 6667 ns | 6625 ns | 1.01 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 6667 ns | 6625 ns | 1.01 |
| batchnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 6666 ns | 6375 ns | 1.05 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/2 thread(s) | 5916 ns | 5958 ns | 0.99 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/4 thread(s) | 5875 ns | 6041 ns | 0.97 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/8 thread(s) | 5917 ns | 5959 ns | 0.99 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/forward/CPU/1 thread(s) | 6041 ns | 5875 ns | 1.03 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/2 thread(s) | 21667 ns | 21520.5 ns | 1.01 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/4 thread(s) | 21208 ns | 21209 ns | 1.00 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/8 thread(s) | 21750 ns | 21667 ns | 1.00 |
| batchnorm(2, act=gelu, affine=true)(32 x 32)/zygote/CPU/1 thread(s) | 21875 ns | 21334 ns | 1.03 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 144583 ns | 144062.5 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 162416 ns | 143042 ns | 1.14 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 146625 ns | 146334 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 187542 ns | 188146 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 1319875 ns | 1317583 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 1320770.5 ns | 1321709 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 957604 ns | 1365791.5 ns | 0.70 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 1324833 ns | 1318666 ns | 1.00 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/2 thread(s) | 23125 ns | 24708 ns | 0.94 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/4 thread(s) | 22437.5 ns | 24375 ns | 0.92 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/8 thread(s) | 23854.5 ns | 24375 ns | 0.98 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/forward/CPU/1 thread(s) | 24396 ns | 22374.5 ns | 1.09 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/2 thread(s) | 129875 ns | 134750 ns | 0.96 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/4 thread(s) | 138125 ns | 181250 ns | 0.76 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/8 thread(s) | 118937.5 ns | 130000 ns | 0.91 |
| layernorm(4, act=relu, affine=false)(16 x 16 x 4 x 32)/zygote/CPU/1 thread(s) | 176083 ns | 130958 ns | 1.34 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 333 ns | 375 ns | 0.89 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 375 ns | 375 ns | 1 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 375 ns | 417 ns | 0.90 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 375 ns | 333 ns | 1.13 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 6833.5 ns | 6625 ns | 1.03 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 6708 ns | 6500 ns | 1.03 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 6667 ns | 6708 ns | 0.99 |
| batchnorm(2, act=relu, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 6917 ns | 6792 ns | 1.02 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 4333.5 ns | 4625 ns | 0.94 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 4292 ns | 4541.5 ns | 0.95 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 5292 ns | 5333 ns | 0.99 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 4042 ns | 4583 ns | 0.88 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 11542 ns | 9875 ns | 1.17 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 11958 ns | 9916.5 ns | 1.21 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 11708 ns | 10417 ns | 1.12 |
| layernorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 12625 ns | 10375 ns | 1.22 |
| dense(2, bias=false, act=gelu)(2 x 128)/forward/CPU/2 thread(s) | 1584 ns | 1625 ns | 0.97 |
| dense(2, bias=false, act=gelu)(2 x 128)/forward/CPU/4 thread(s) | 1583 ns | 1625 ns | 0.97 |
| dense(2, bias=false, act=gelu)(2 x 128)/forward/CPU/8 thread(s) | 1583 ns | 1625 ns | 0.97 |
| dense(2, bias=false, act=gelu)(2 x 128)/forward/CPU/1 thread(s) | 1667 ns | 1667 ns | 1 |
| dense(2, bias=false, act=gelu)(2 x 128)/zygote/CPU/2 thread(s) | 5667 ns | 5750 ns | 0.99 |
| dense(2, bias=false, act=gelu)(2 x 128)/zygote/CPU/4 thread(s) | 5625 ns | 5750 ns | 0.98 |
| dense(2, bias=false, act=gelu)(2 x 128)/zygote/CPU/8 thread(s) | 5791 ns | 6083 ns | 0.95 |
| dense(2, bias=false, act=gelu)(2 x 128)/zygote/CPU/1 thread(s) | 5791 ns | 5709 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 6893499.5 ns | 6814041 ns | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 6374750 ns | 6367459 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 6500541.5 ns | 6578812.5 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 7628458 ns | 7695958 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 24057854 ns | 24052709 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 21255853.5 ns | 21310875 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 21045937.5 ns | 21123834 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 29752958 ns | 29855166.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 37194104 ns | 48838979.5 ns | 0.76 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 45565937.5 ns | 45549667 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 45856833 ns | 45706771 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 49410209 ns | 49408500 ns | 1.00 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/2 thread(s) | 5729.5 ns | 5875 ns | 0.98 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/4 thread(s) | 6041 ns | 5709 ns | 1.06 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/8 thread(s) | 7542 ns | 6708 ns | 1.12 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/forward/CPU/1 thread(s) | 5583 ns | 5541 ns | 1.01 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/2 thread(s) | 7812.5 ns | 8875 ns | 0.88 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/4 thread(s) | 8333 ns | 8167 ns | 1.02 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/8 thread(s) | 8667 ns | 8542 ns | 1.01 |
| groupnorm(2, act=identity, affine=true)(4 x 32)/zygote/CPU/1 thread(s) | 8750 ns | 8208 ns | 1.07 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 1558521 ns | 1556417 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 1261333 ns | 1270792 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1624791.5 ns | 1624187.5 ns | 1.00 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2151979 ns | 2180520.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 7911312.5 ns | 7888792 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 6595562.5 ns | 6591250 ns | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7113500.5 ns | 7197854 ns | 0.99 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 10486458 ns | 10478229.5 ns | 1.00 |
| batchedmm(128, Bsize=4)/forward/CPU/2 thread(s) | 370375.5 ns | 366500 ns | 1.01 |
| batchedmm(128, Bsize=4)/forward/CPU/4 thread(s) | 370334 ns | 371020.5 ns | 1.00 |
| batchedmm(128, Bsize=4)/forward/CPU/8 thread(s) | 457042 ns | 457708 ns | 1.00 |
| batchedmm(128, Bsize=4)/forward/CPU/1 thread(s) | 24083.5 ns | 33208.5 ns | 0.73 |
| batchedmm(128, Bsize=4)/zygote/CPU/2 thread(s) | 740416 ns | 723916.5 ns | 1.02 |
| batchedmm(128, Bsize=4)/zygote/CPU/4 thread(s) | 810542 ns | 801750 ns | 1.01 |
| batchedmm(128, Bsize=4)/zygote/CPU/8 thread(s) | 1091458.5 ns | 1064875 ns | 1.02 |
| batchedmm(128, Bsize=4)/zygote/CPU/1 thread(s) | 119250 ns | 115334 ns | 1.03 |
| dense(512, bias=true, act=identity)(512 x 128)/forward/CPU/2 thread(s) | 397375 ns | 397291 ns | 1.00 |
| dense(512, bias=true, act=identity)(512 x 128)/forward/CPU/4 thread(s) | 288000 ns | 287834 ns | 1.00 |
| dense(512, bias=true, act=identity)(512 x 128)/forward/CPU/8 thread(s) | 211583 ns | 288166 ns | 0.73 |
| dense(512, bias=true, act=identity)(512 x 128)/forward/CPU/1 thread(s) | 750270.5 ns | 750833 ns | 1.00 |
| dense(512, bias=true, act=identity)(512 x 128)/zygote/CPU/2 thread(s) | 673041 ns | 661875 ns | 1.02 |
| dense(512, bias=true, act=identity)(512 x 128)/zygote/CPU/4 thread(s) | 532334 ns | 532416 ns | 1.00 |
| dense(512, bias=true, act=identity)(512 x 128)/zygote/CPU/8 thread(s) | 474084 ns | 535458 ns | 0.89 |
| dense(512, bias=true, act=identity)(512 x 128)/zygote/CPU/1 thread(s) | 973792 ns | 973250 ns | 1.00 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 662833.5 ns | 670958 ns | 0.99 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 641958 ns | 644229 ns | 1.00 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 544334 ns | 680667 ns | 0.80 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 670813 ns | 648125 ns | 1.04 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 2467229 ns | 2459333 ns | 1.00 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 2462313 ns | 2456084 ns | 1.00 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 2482583.5 ns | 2464542 ns | 1.01 |
| layernorm(4, act=relu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 2448459 ns | 2456083 ns | 1.00 |
| batchedmm(2, Bsize=32)/forward/CPU/2 thread(s) | 3583.5 ns | 3708 ns | 0.97 |
| batchedmm(2, Bsize=32)/forward/CPU/4 thread(s) | 2687.5 ns | 3334 ns | 0.81 |
| batchedmm(2, Bsize=32)/forward/CPU/8 thread(s) | 2959 ns | 4334 ns | 0.68 |
| batchedmm(2, Bsize=32)/forward/CPU/1 thread(s) | 3833 ns | 2667 ns | 1.44 |
| batchedmm(2, Bsize=32)/zygote/CPU/2 thread(s) | 5542 ns | 5500 ns | 1.01 |
| batchedmm(2, Bsize=32)/zygote/CPU/4 thread(s) | 5792 ns | 5458 ns | 1.06 |
| batchedmm(2, Bsize=32)/zygote/CPU/8 thread(s) | 5833 ns | 5625 ns | 1.04 |
| batchedmm(2, Bsize=32)/zygote/CPU/1 thread(s) | 5833.5 ns | 5542 ns | 1.05 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s) | 1460979.5 ns | 1458167 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s) | 1498958 ns | 1500500 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s) | 1492334 ns | 1499333 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s) | 1436709 ns | 1437750 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s) | 5110375 ns | 5130750 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s) | 5286896 ns | 5285584 ns | 1.00 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s) | 4965208 ns | 5315979 ns | 0.93 |
| batchnorm(4, act=gelu, affine=true)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s) | 4987187.5 ns | 4998959 ns | 1.00 |
| dense(32, bias=false, act=identity)(32 x 128)/forward/CPU/2 thread(s) | 3709 ns | 3708 ns | 1.00 |
| dense(32, bias=false, act=identity)(32 x 128)/forward/CPU/4 thread(s) | 3750 ns | 3709 ns | 1.01 |
| dense(32, bias=false, act=identity)(32 x 128)/forward/CPU/8 thread(s) | 3709 ns | 3750 ns | 0.99 |
| dense(32, bias=false, act=identity)(32 x 128)/forward/CPU/1 thread(s) | 3709 ns | 3750 ns | 0.99 |
| dense(32, bias=false, act=identity)(32 x 128)/zygote/CPU/2 thread(s) | 15250 ns | 15375 ns | 0.99 |
| dense(32, bias=false, act=identity)(32 x 128)/zygote/CPU/4 thread(s) | 15375 ns | 15417 ns | 1.00 |
| dense(32, bias=false, act=identity)(32 x 128)/zygote/CPU/8 thread(s) | 15208 ns | 15500 ns | 0.98 |
| dense(32, bias=false, act=identity)(32 x 128)/zygote/CPU/1 thread(s) | 15542 ns | 15167 ns | 1.02 |
dense(512, bias=false, act=relu)(512 x 128)/forward/CPU/2 thread(s)
71167
ns70667
ns1.01
dense(512, bias=false, act=relu)(512 x 128)/forward/CPU/4 thread(s)
71208
ns71208
ns1
dense(512, bias=false, act=relu)(512 x 128)/forward/CPU/8 thread(s)
71125
ns71959
ns0.99
dense(512, bias=false, act=relu)(512 x 128)/forward/CPU/1 thread(s)
70145.5
ns71333
ns0.98
dense(512, bias=false, act=relu)(512 x 128)/zygote/CPU/2 thread(s)
318209
ns318500
ns1.00
dense(512, bias=false, act=relu)(512 x 128)/zygote/CPU/4 thread(s)
321166
ns318000
ns1.01
dense(512, bias=false, act=relu)(512 x 128)/zygote/CPU/8 thread(s)
331000
ns323666
ns1.02
dense(512, bias=false, act=relu)(512 x 128)/zygote/CPU/1 thread(s)
318208
ns317125
ns1.00
batchnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/2 thread(s)
1000
ns1084
ns0.92
batchnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/4 thread(s)
1084
ns1125
ns0.96
batchnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/8 thread(s)
1083
ns1084
ns1.00
batchnorm(2, act=gelu, affine=true)(4 x 32)/forward/CPU/1 thread(s)
1125
ns1000
ns1.13
batchnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/2 thread(s)
8208
ns8458
ns0.97
batchnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/4 thread(s)
8333
ns8334
ns1.00
batchnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/8 thread(s)
8542
ns8292
ns1.03
batchnorm(2, act=gelu, affine=true)(4 x 32)/zygote/CPU/1 thread(s)
8458
ns8375
ns1.01
batchedmm(128, Bsize=32)/forward/CPU/2 thread(s)
513416.5
ns506709
ns1.01
batchedmm(128, Bsize=32)/forward/CPU/4 thread(s)
491000
ns492375
ns1.00
batchedmm(128, Bsize=32)/forward/CPU/8 thread(s)
564167
ns562708
ns1.00
batchedmm(128, Bsize=32)/forward/CPU/1 thread(s)
219125
ns222187.5
ns0.99
batchedmm(128, Bsize=32)/zygote/CPU/2 thread(s)
1389604.5
ns1387250
ns1.00
batchedmm(128, Bsize=32)/zygote/CPU/4 thread(s)
1470916.5
ns1449208
ns1.01
batchedmm(128, Bsize=32)/zygote/CPU/8 thread(s)
1739750
ns1788375
ns0.97
batchedmm(128, Bsize=32)/zygote/CPU/1 thread(s)
867042
ns865812.5
ns1.00
batchnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/2 thread(s)
333
ns375
ns0.89
batchnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/4 thread(s)
375
ns292
ns1.28
batchnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/8 thread(s)
292
ns416
ns0.70
batchnorm(2, act=relu, affine=false)(4 x 32)/forward/CPU/1 thread(s)
417
ns333
ns1.25
batchnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/2 thread(s)
6792
ns6667
ns1.02
batchnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/4 thread(s)
6667
ns6458
ns1.03
batchnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/8 thread(s)
6667
ns6625
ns1.01
batchnorm(2, act=relu, affine=false)(4 x 32)/zygote/CPU/1 thread(s)
6583
ns6458
ns1.02
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
1744875
ns1722042
ns1.01
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
1720437.5
ns1723208.5
ns1.00
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
1725229
ns1721083
ns1.00
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
1774833.5
ns1723750
ns1.03
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
4362875
ns4362042
ns1.00
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
4366833.5
ns4261187.5
ns1.02
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
4017625
ns4415583.5
ns0.91
layernorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
4360042
ns4366958.5
ns1.00
bias_activation(512, act=relu)(512 x 128)/forward/CPU/2 thread(s)
6709
ns6750
ns0.99
bias_activation(512, act=relu)(512 x 128)/forward/CPU/4 thread(s)
6541
ns6959
ns0.94
bias_activation(512, act=relu)(512 x 128)/forward/CPU/8 thread(s)
7125
ns6959
ns1.02
bias_activation(512, act=relu)(512 x 128)/forward/CPU/1 thread(s)
6896
ns6708.5
ns1.03
bias_activation(512, act=relu)(512 x 128)/zygote/CPU/2 thread(s)
32667
ns51417
ns0.64
bias_activation(512, act=relu)(512 x 128)/zygote/CPU/4 thread(s)
51125
ns32917
ns1.55
bias_activation(512, act=relu)(512 x 128)/zygote/CPU/8 thread(s)
33125
ns33333
ns0.99
bias_activation(512, act=relu)(512 x 128)/zygote/CPU/1 thread(s)
52271
ns51208.5
ns1.02
batchedmm(2, Bsize=512)/forward/CPU/2 thread(s)
18166.5
ns17542
ns1.04
batchedmm(2, Bsize=512)/forward/CPU/4 thread(s)
17500
ns17875
ns0.98
batchedmm(2, Bsize=512)/forward/CPU/8 thread(s)
18875
ns18916
ns1.00
batchedmm(2, Bsize=512)/forward/CPU/1 thread(s)
17666.5
ns17750
ns1.00
batchedmm(2, Bsize=512)/zygote/CPU/2 thread(s)
53667
ns53458
ns1.00
batchedmm(2, Bsize=512)/zygote/CPU/4 thread(s)
53584
ns53334
ns1.00
batchedmm(2, Bsize=512)/zygote/CPU/8 thread(s)
53417
ns53250
ns1.00
batchedmm(2, Bsize=512)/zygote/CPU/1 thread(s)
54000
ns53500
ns1.01
dense(512, bias=true, act=relu)(512 x 128)/forward/CPU/2 thread(s)
75334
ns75292
ns1.00
dense(512, bias=true, act=relu)(512 x 128)/forward/CPU/4 thread(s)
75375
ns75375
ns1
dense(512, bias=true, act=relu)(512 x 128)/forward/CPU/8 thread(s)
75209
ns75792
ns0.99
dense(512, bias=true, act=relu)(512 x 128)/forward/CPU/1 thread(s)
74916
ns75208
ns1.00
dense(512, bias=true, act=relu)(512 x 128)/zygote/CPU/2 thread(s)
324959
ns324375
ns1.00
dense(512, bias=true, act=relu)(512 x 128)/zygote/CPU/4 thread(s)
340167
ns327625
ns1.04
dense(512, bias=true, act=relu)(512 x 128)/zygote/CPU/8 thread(s)
336875
ns329583
ns1.02
dense(512, bias=true, act=relu)(512 x 128)/zygote/CPU/1 thread(s)
324833
ns324208
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
1486958
ns1484375
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
1526792
ns1527958
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
1521459
ns1527583
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
1463834
ns1462209
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
5117062
ns5124708
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
5294604
ns5280333
ns1.00
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
4960833
ns5332500
ns0.93
batchnorm(4, act=gelu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
4987709
ns4985875
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/forward/CPU/2 thread(s)
28167
ns28250
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/forward/CPU/4 thread(s)
28167
ns28291
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/forward/CPU/8 thread(s)
28292
ns28333
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/forward/CPU/1 thread(s)
28292
ns28291
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/zygote/CPU/2 thread(s)
66333
ns66459
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/zygote/CPU/4 thread(s)
66833
ns66458
ns1.01
dense(32, bias=true, act=gelu)(32 x 128)/zygote/CPU/8 thread(s)
66500
ns66833
ns1.00
dense(32, bias=true, act=gelu)(32 x 128)/zygote/CPU/1 thread(s)
66459
ns66416
ns1.00
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s)
1395354
ns1501229
ns0.93
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s)
1059146
ns1127563
ns0.94
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s)
814208
ns1119291.5
ns0.73
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s)
2269396
ns2246375
ns1.01
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s)
3090979
ns3082875
ns1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s)
2740854.5
ns2738375
ns1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s)
2544104.5
ns2760354
ns0.92
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s)
3812666
ns3780667
ns1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s)
7882104
ns7895333
ns1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s)
7902666.5
ns7893459
ns1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s)
8008791.5
ns7944812.5
ns1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s)
4806271
ns4834521
ns0.99
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/2 thread(s)
81167
ns80959
ns1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/4 thread(s)
83208.5
ns80333
ns1.04
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/8 thread(s)
81979.5
ns82166
ns1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/forward/CPU/1 thread(s)
80417
ns134375.5
ns0.60
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/2 thread(s)
2017166.5
ns2014625
ns1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/4 thread(s)
2013729
ns2006229
ns1.00
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/8 thread(s)
1774125
ns2047021
ns0.87
groupnorm(4, act=relu, affine=false)(16 x 16 x 32 x 32)/zygote/CPU/1 thread(s)
2014354.5
ns2022958
ns1.00
This comment was automatically generated by workflow using github-action-benchmark.