300_act3_mobilenetv3_large.log
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to 1 by default, to avoid your system being overloaded; please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 2): env://, gpu 2
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 7): env://, gpu 7
| distributed init (rank 6): env://, gpu 6
| distributed init (rank 4): env://, gpu 4
| distributed init (rank 5): env://, gpu 5
| distributed init (rank 3): env://, gpu 3
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=256, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=0.0, cutmix_minmax=None, data_path='/data/benchmarks/ILSVRC2012_LMDB', data_set='IMNET_LMDB', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.2, enable_wandb=False, epochs=300, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=224, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.0, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./checkpoint', pin_mem=True, project='convnext', rank=0, recount=1, remode='pixel', reprob=0.25, resplit=False, resume='', save_ckpt=True, save_ckpt_freq=1, save_ckpt_num=3, seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', update_freq=2, use_amp=True, wandb_ckpt=False, warmup_epochs=20, warmup_steps=-1, weight_decay=0.05, weight_decay_end=None, world_size=8)
Transform =
RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=bicubic)
RandomHorizontalFlip(p=0.5)
RandAugment(n=2, ops=
AugmentOp(name=AutoContrast, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Equalize, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Invert, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Rotate, p=0.5, m=9, mstd=0.5)
AugmentOp(name=PosterizeIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SolarizeIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SolarizeAdd, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ColorIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ContrastIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=BrightnessIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SharpnessIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ShearX, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ShearY, p=0.5, m=9, mstd=0.5)
AugmentOp(name=TranslateXRel, p=0.5, m=9, mstd=0.5)
AugmentOp(name=TranslateYRel, p=0.5, m=9, mstd=0.5))
ToTensor()
Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
RandomErasing(p=0.25, mode=pixel, count=(1, 1))
---------------------------
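Note: the training pipeline printed above (RandomResizedCrop + RandAugment rand-m9-mstd0.5-inc1 + RandomErasing p=0.25) matches what timm's create_transform produces for the arguments in the Namespace line. A minimal sketch of an equivalent pipeline, assuming a recent timm version (illustrative; not necessarily the exact call made by this training script):

from timm.data import create_transform

# Illustrative reconstruction of the printed training transform.
train_transform = create_transform(
    input_size=224,
    is_training=True,
    color_jitter=0.4,
    auto_augment='rand-m9-mstd0.5-inc1',  # args.aa
    interpolation='bicubic',              # args.train_interpolation
    re_prob=0.25,                         # args.reprob (RandomErasing)
    re_mode='pixel',                      # args.remode
    re_count=1,                           # args.recount
)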
reading from datapath /data/benchmarks/ILSVRC2012_LMDB
Number of classes = 1000
Transform =
Resize(size=256, interpolation=bicubic, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
---------------------------
reading from datapath /data/benchmarks/ILSVRC2012_LMDB
Number of classes = 1000
Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7fea802e9c40>
Model = MobileNetV3_Large(
(conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs1): Hardswish()
(bneck): Sequential(
(0): Block(
(conv1): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
(bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): Identity()
(conv3): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
)
(1): Block(
(conv1): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): Identity()
(conv3): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
(skip): Sequential(
(0): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(16, 24, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Block(
(conv1): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(72, 72, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=72, bias=False)
(bn2): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): Identity()
(conv3): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
)
(3): Block(
(conv1): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(72, 72, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=72, bias=False)
(bn2): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(72, 18, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(18, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(18, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(72, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
(skip): Sequential(
(0): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(24, 40, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(4): Block(
(conv1): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
(bn2): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(120, 30, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(30, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
)
(5): Block(
(conv1): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
(bn2): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(120, 30, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(30, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
)
(6): Block(
(conv1): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(240, 240, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=240, bias=False)
(bn2): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): Identity()
(conv3): Conv2d(240, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(40, 40, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=40, bias=False)
(1): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(40, 80, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(7): Block(
(conv1): Conv2d(80, 200, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(200, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=200, bias=False)
(bn2): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): Identity()
(conv3): Conv2d(200, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(8): Block(
(conv1): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(184, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
(bn2): BatchNorm2d(184, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): Identity()
(conv3): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(9): Block(
(conv1): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(184, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
(bn2): BatchNorm2d(184, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): Identity()
(conv3): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(10): Block(
(conv1): Conv2d(80, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(480, 480, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=480, bias=False)
(bn2): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(480, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(120, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(480, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(80, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(11): Block(
(conv1): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
(bn2): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(12): Block(
(conv1): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(672, 672, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=672, bias=False)
(bn2): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(112, 112, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=112, bias=False)
(1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(112, 160, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(13): Block(
(conv1): Conv2d(160, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(672, 672, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=672, bias=False)
(bn2): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(14): Block(
(conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
)
(conv2): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs2): Hardswish()
(gap): AdaptiveAvgPool2d(output_size=1)
(linear3): Linear(in_features=960, out_features=1280, bias=False)
(bn3): BatchNorm1d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs3): Hardswish()
(drop): Dropout(p=0.2, inplace=False)
(linear4): Linear(in_features=1280, out_features=1000, bias=True)
)
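Note: a minimal reconstruction of the SeModule printed throughout the dump above (reduction ratio 4, e.g. 72 -> 18 -> 72). This is an illustrative sketch built only from the printed layers, not the repository's source; the channel-wise multiplication in forward() is assumed, since module printouts do not show forward logic.

import torch
import torch.nn as nn

class SeModule(nn.Module):
    # Squeeze-and-Excitation gate matching the printed layout:
    # global average pool -> 1x1 reduce -> BN -> ReLU -> 1x1 expand -> Hardsigmoid.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
            nn.Hardsigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.se(x)  # assumed: scale input by the learned channel gate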
number of params: 5178732
LR = 0.00400000
Batch size = 4096
Update frequency = 2
Number of training examples = 1281167
Number of training steps per epoch = 312
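Note: these numbers follow directly from the Namespace above: effective batch = batch_size (256 per GPU) x world_size (8) x update_freq (2) = 4096, so 1,281,167 / 4096 ≈ 312 optimizer steps per epoch, while the progress lines below count 625 dataloader iterations per epoch (two iterations per optimizer update). A quick check:

batch_size, world_size, update_freq = 256, 8, 2
num_train_images = 1_281_167

effective_batch = batch_size * world_size * update_freq                 # 4096
optim_steps_per_epoch = num_train_images // effective_batch             # 312
loader_iters_per_epoch = num_train_images // (batch_size * world_size)  # 625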
Param groups = {
"decay": {
"weight_decay": 0.05,
"params": [
"conv1.weight",
"bneck.0.conv1.weight",
"bneck.0.conv2.weight",
"bneck.0.conv3.weight",
"bneck.1.conv1.weight",
"bneck.1.conv2.weight",
"bneck.1.conv3.weight",
"bneck.1.skip.0.weight",
"bneck.1.skip.2.weight",
"bneck.2.conv1.weight",
"bneck.2.conv2.weight",
"bneck.2.conv3.weight",
"bneck.3.conv1.weight",
"bneck.3.conv2.weight",
"bneck.3.se.se.1.weight",
"bneck.3.se.se.4.weight",
"bneck.3.conv3.weight",
"bneck.3.skip.0.weight",
"bneck.3.skip.2.weight",
"bneck.4.conv1.weight",
"bneck.4.conv2.weight",
"bneck.4.se.se.1.weight",
"bneck.4.se.se.4.weight",
"bneck.4.conv3.weight",
"bneck.5.conv1.weight",
"bneck.5.conv2.weight",
"bneck.5.se.se.1.weight",
"bneck.5.se.se.4.weight",
"bneck.5.conv3.weight",
"bneck.6.conv1.weight",
"bneck.6.conv2.weight",
"bneck.6.conv3.weight",
"bneck.6.skip.0.weight",
"bneck.6.skip.2.weight",
"bneck.7.conv1.weight",
"bneck.7.conv2.weight",
"bneck.7.conv3.weight",
"bneck.8.conv1.weight",
"bneck.8.conv2.weight",
"bneck.8.conv3.weight",
"bneck.9.conv1.weight",
"bneck.9.conv2.weight",
"bneck.9.conv3.weight",
"bneck.10.conv1.weight",
"bneck.10.conv2.weight",
"bneck.10.se.se.1.weight",
"bneck.10.se.se.4.weight",
"bneck.10.conv3.weight",
"bneck.10.skip.0.weight",
"bneck.11.conv1.weight",
"bneck.11.conv2.weight",
"bneck.11.se.se.1.weight",
"bneck.11.se.se.4.weight",
"bneck.11.conv3.weight",
"bneck.12.conv1.weight",
"bneck.12.conv2.weight",
"bneck.12.se.se.1.weight",
"bneck.12.se.se.4.weight",
"bneck.12.conv3.weight",
"bneck.12.skip.0.weight",
"bneck.12.skip.2.weight",
"bneck.13.conv1.weight",
"bneck.13.conv2.weight",
"bneck.13.se.se.1.weight",
"bneck.13.se.se.4.weight",
"bneck.13.conv3.weight",
"bneck.14.conv1.weight",
"bneck.14.conv2.weight",
"bneck.14.se.se.1.weight",
"bneck.14.se.se.4.weight",
"bneck.14.conv3.weight",
"conv2.weight",
"linear3.weight",
"linear4.weight"
],
"lr_scale": 1.0
},
"no_decay": {
"weight_decay": 0.0,
"params": [
"bn1.weight",
"bn1.bias",
"bneck.0.bn1.weight",
"bneck.0.bn1.bias",
"bneck.0.bn2.weight",
"bneck.0.bn2.bias",
"bneck.0.bn3.weight",
"bneck.0.bn3.bias",
"bneck.1.bn1.weight",
"bneck.1.bn1.bias",
"bneck.1.bn2.weight",
"bneck.1.bn2.bias",
"bneck.1.bn3.weight",
"bneck.1.bn3.bias",
"bneck.1.skip.1.weight",
"bneck.1.skip.1.bias",
"bneck.1.skip.2.bias",
"bneck.1.skip.3.weight",
"bneck.1.skip.3.bias",
"bneck.2.bn1.weight",
"bneck.2.bn1.bias",
"bneck.2.bn2.weight",
"bneck.2.bn2.bias",
"bneck.2.bn3.weight",
"bneck.2.bn3.bias",
"bneck.3.bn1.weight",
"bneck.3.bn1.bias",
"bneck.3.bn2.weight",
"bneck.3.bn2.bias",
"bneck.3.se.se.2.weight",
"bneck.3.se.se.2.bias",
"bneck.3.bn3.weight",
"bneck.3.bn3.bias",
"bneck.3.skip.1.weight",
"bneck.3.skip.1.bias",
"bneck.3.skip.2.bias",
"bneck.3.skip.3.weight",
"bneck.3.skip.3.bias",
"bneck.4.bn1.weight",
"bneck.4.bn1.bias",
"bneck.4.bn2.weight",
"bneck.4.bn2.bias",
"bneck.4.se.se.2.weight",
"bneck.4.se.se.2.bias",
"bneck.4.bn3.weight",
"bneck.4.bn3.bias",
"bneck.5.bn1.weight",
"bneck.5.bn1.bias",
"bneck.5.bn2.weight",
"bneck.5.bn2.bias",
"bneck.5.se.se.2.weight",
"bneck.5.se.se.2.bias",
"bneck.5.bn3.weight",
"bneck.5.bn3.bias",
"bneck.6.bn1.weight",
"bneck.6.bn1.bias",
"bneck.6.bn2.weight",
"bneck.6.bn2.bias",
"bneck.6.bn3.weight",
"bneck.6.bn3.bias",
"bneck.6.skip.1.weight",
"bneck.6.skip.1.bias",
"bneck.6.skip.2.bias",
"bneck.6.skip.3.weight",
"bneck.6.skip.3.bias",
"bneck.7.bn1.weight",
"bneck.7.bn1.bias",
"bneck.7.bn2.weight",
"bneck.7.bn2.bias",
"bneck.7.bn3.weight",
"bneck.7.bn3.bias",
"bneck.8.bn1.weight",
"bneck.8.bn1.bias",
"bneck.8.bn2.weight",
"bneck.8.bn2.bias",
"bneck.8.bn3.weight",
"bneck.8.bn3.bias",
"bneck.9.bn1.weight",
"bneck.9.bn1.bias",
"bneck.9.bn2.weight",
"bneck.9.bn2.bias",
"bneck.9.bn3.weight",
"bneck.9.bn3.bias",
"bneck.10.bn1.weight",
"bneck.10.bn1.bias",
"bneck.10.bn2.weight",
"bneck.10.bn2.bias",
"bneck.10.se.se.2.weight",
"bneck.10.se.se.2.bias",
"bneck.10.bn3.weight",
"bneck.10.bn3.bias",
"bneck.10.skip.1.weight",
"bneck.10.skip.1.bias",
"bneck.11.bn1.weight",
"bneck.11.bn1.bias",
"bneck.11.bn2.weight",
"bneck.11.bn2.bias",
"bneck.11.se.se.2.weight",
"bneck.11.se.se.2.bias",
"bneck.11.bn3.weight",
"bneck.11.bn3.bias",
"bneck.12.bn1.weight",
"bneck.12.bn1.bias",
"bneck.12.bn2.weight",
"bneck.12.bn2.bias",
"bneck.12.se.se.2.weight",
"bneck.12.se.se.2.bias",
"bneck.12.bn3.weight",
"bneck.12.bn3.bias",
"bneck.12.skip.1.weight",
"bneck.12.skip.1.bias",
"bneck.12.skip.2.bias",
"bneck.12.skip.3.weight",
"bneck.12.skip.3.bias",
"bneck.13.bn1.weight",
"bneck.13.bn1.bias",
"bneck.13.bn2.weight",
"bneck.13.bn2.bias",
"bneck.13.se.se.2.weight",
"bneck.13.se.se.2.bias",
"bneck.13.bn3.weight",
"bneck.13.bn3.bias",
"bneck.14.bn1.weight",
"bneck.14.bn1.bias",
"bneck.14.bn2.weight",
"bneck.14.bn2.bias",
"bneck.14.se.se.2.weight",
"bneck.14.se.se.2.bias",
"bneck.14.bn3.weight",
"bneck.14.bn3.bias",
"bn2.weight",
"bn2.bias",
"bn3.weight",
"bn3.bias",
"linear4.bias"
],
"lr_scale": 1.0
}
}
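Note: the grouping above puts every convolution and linear weight under "decay" (weight_decay 0.05) and every BatchNorm affine parameter and bias under "no_decay" (weight_decay 0.0). A minimal sketch of how such a split is commonly built for the optimizer (illustrative only; names are not taken from this training script):

import torch.nn as nn

def build_param_groups(model: nn.Module, weight_decay: float = 0.05):
    # 1-D parameters (BatchNorm weights/biases) and all biases skip weight decay,
    # mirroring the decay / no_decay lists printed above.
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]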
Use Cosine LR scheduler
Set warmup steps = 6240
Set warmup steps = 0
Max WD = 0.0500000, Min WD = 0.0500000
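Note: the 6240 warmup steps above equal warmup_epochs x steps_per_epoch = 20 x 312; after warmup the learning rate decays from lr = 0.004 to min_lr = 1e-6 on a half-cosine over the remaining steps, while weight decay stays constant at 0.05 (hence Max WD = Min WD). A minimal per-step sketch of such a schedule (illustrative, not the repository's scheduler utility):

import math

def cosine_lr(step: int, total_steps: int, warmup_steps: int = 6240,
              base_lr: float = 0.004, min_lr: float = 1e-6) -> float:
    # Linear warmup to base_lr, then half-cosine decay down to min_lr.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# For this run: cosine_lr(step, total_steps=300 * 312)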
criterion = LabelSmoothingCrossEntropy()
Auto resume checkpoint:
Start training for 300 epochs
Epoch: [0] [ 0/625] eta: 5:56:07 lr: 0.000000 min_lr: 0.000000 loss: 6.9095 (6.9095) class_acc: 0.0000 (0.0000) weight_decay: 0.0500 (0.0500) time: 34.1885 data: 19.7940 max mem: 6925
Epoch: [0] [200/625] eta: 0:15:18 lr: 0.000064 min_lr: 0.000064 loss: 6.8804 (6.8977) class_acc: 0.0000 (0.0015) weight_decay: 0.0500 (0.0500) grad_norm: 0.5480 (0.5217) time: 2.0561 data: 0.0005 max mem: 6925
Epoch: [0] [400/625] eta: 0:07:56 lr: 0.000128 min_lr: 0.000128 loss: 6.7886 (6.8663) class_acc: 0.0039 (0.0020) weight_decay: 0.0500 (0.0500) grad_norm: 0.9387 (0.6578) time: 2.1968 data: 0.0007 max mem: 6925
Epoch: [0] [600/625] eta: 0:00:52 lr: 0.000192 min_lr: 0.000192 loss: 6.6098 (6.8063) class_acc: 0.0078 (0.0031) weight_decay: 0.0500 (0.0500) grad_norm: 1.2534 (0.8297) time: 1.8978 data: 0.0007 max mem: 6925
Epoch: [0] [624/625] eta: 0:00:02 lr: 0.000199 min_lr: 0.000199 loss: 6.5938 (6.7985) class_acc: 0.0078 (0.0033) weight_decay: 0.0500 (0.0500) grad_norm: 1.2840 (0.8490) time: 1.0091 data: 0.0020 max mem: 6925
Epoch: [0] Total time: 0:21:35 (2.0729 s / it)
Averaged stats: lr: 0.000199 min_lr: 0.000199 loss: 6.5938 (6.7976) class_acc: 0.0078 (0.0035) weight_decay: 0.0500 (0.0500) grad_norm: 1.2840 (0.8490)
Test: [ 0/50] eta: 0:11:57 loss: 6.2253 (6.2253) acc1: 0.8000 (0.8000) acc5: 1.6000 (1.6000) time: 14.3473 data: 12.3008 max mem: 6925
Test: [10/50] eta: 0:01:27 loss: 6.2876 (6.2817) acc1: 0.8000 (1.7455) acc5: 4.0000 (4.3636) time: 2.1955 data: 1.9836 max mem: 6925
Test: [20/50] eta: 0:00:53 loss: 6.3267 (6.3014) acc1: 0.8000 (1.7524) acc5: 4.0000 (4.8000) time: 1.1468 data: 1.1177 max mem: 6925
Test: [30/50] eta: 0:00:32 loss: 6.2951 (6.2799) acc1: 0.8000 (1.7548) acc5: 6.4000 (5.4710) time: 1.2733 data: 1.2435 max mem: 6925
Test: [40/50] eta: 0:00:13 loss: 6.2815 (6.2820) acc1: 0.8000 (1.7561) acc5: 5.6000 (5.3854) time: 0.9462 data: 0.9165 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 6.2896 (6.2860) acc1: 0.8000 (1.7120) acc5: 4.0000 (5.3280) time: 0.8818 data: 0.8531 max mem: 6925
Test: Total time: 0:00:57 (1.1422 s / it)
* Acc@1 1.482 Acc@5 5.464 loss 6.276
Accuracy of the model on the 50000 test images: 1.5%
Max accuracy: 1.48%
Epoch: [1] [ 0/625] eta: 3:12:22 lr: 0.000200 min_lr: 0.000200 loss: 6.5556 (6.5556) class_acc: 0.0078 (0.0078) weight_decay: 0.0500 (0.0500) time: 18.4673 data: 15.9101 max mem: 6925
Epoch: [1] [200/625] eta: 0:13:36 lr: 0.000264 min_lr: 0.000264 loss: 6.3951 (6.4803) class_acc: 0.0195 (0.0132) weight_decay: 0.0500 (0.0500) grad_norm: 1.3826 (1.3894) time: 1.7949 data: 0.1612 max mem: 6925
Epoch: [1] [400/625] eta: 0:07:10 lr: 0.000328 min_lr: 0.000328 loss: 6.2116 (6.3916) class_acc: 0.0234 (0.0178) weight_decay: 0.0500 (0.0500) grad_norm: 1.4349 (1.3960) time: 1.9017 data: 0.0011 max mem: 6925
Epoch: [1] [600/625] eta: 0:00:48 lr: 0.000392 min_lr: 0.000392 loss: 6.0291 (6.3028) class_acc: 0.0430 (0.0230) weight_decay: 0.0500 (0.0500) grad_norm: 1.4847 (1.4249) time: 1.7438 data: 0.0008 max mem: 6925
Epoch: [1] [624/625] eta: 0:00:01 lr: 0.000399 min_lr: 0.000399 loss: 6.0516 (6.2934) class_acc: 0.0391 (0.0236) weight_decay: 0.0500 (0.0500) grad_norm: 1.5434 (1.4315) time: 0.5599 data: 0.0019 max mem: 6925
Epoch: [1] Total time: 0:19:46 (1.8979 s / it)
Averaged stats: lr: 0.000399 min_lr: 0.000399 loss: 6.0516 (6.2915) class_acc: 0.0391 (0.0231) weight_decay: 0.0500 (0.0500) grad_norm: 1.5434 (1.4315)
Test: [ 0/50] eta: 0:10:03 loss: 5.2781 (5.2781) acc1: 7.2000 (7.2000) acc5: 16.8000 (16.8000) time: 12.0729 data: 12.0404 max mem: 6925
Test: [10/50] eta: 0:01:30 loss: 5.2781 (5.2251) acc1: 8.0000 (8.6545) acc5: 22.4000 (22.9818) time: 2.2596 data: 2.2302 max mem: 6925
Test: [20/50] eta: 0:00:52 loss: 5.2521 (5.2607) acc1: 6.4000 (7.5048) acc5: 20.8000 (20.9524) time: 1.2508 data: 1.2220 max mem: 6925
Test: [30/50] eta: 0:00:30 loss: 5.2521 (5.2644) acc1: 6.4000 (7.5613) acc5: 17.6000 (20.9032) time: 1.0913 data: 1.0628 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 5.2966 (5.2780) acc1: 8.0000 (7.4927) acc5: 19.2000 (20.3317) time: 0.7001 data: 0.6713 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 5.3171 (5.2957) acc1: 6.4000 (7.2320) acc5: 19.2000 (20.0480) time: 0.6051 data: 0.5763 max mem: 6925
Test: Total time: 0:00:52 (1.0436 s / it)
* Acc@1 7.512 Acc@5 20.642 loss 5.264
Accuracy of the model on the 50000 test images: 7.5%
Max accuracy: 7.51%
Epoch: [2] [ 0/625] eta: 3:50:46 lr: 0.000400 min_lr: 0.000400 loss: 6.0365 (6.0365) class_acc: 0.0352 (0.0352) weight_decay: 0.0500 (0.0500) time: 22.1549 data: 16.8647 max mem: 6925
Epoch: [2] [200/625] eta: 0:13:42 lr: 0.000464 min_lr: 0.000464 loss: 5.8730 (5.9224) class_acc: 0.0547 (0.0480) weight_decay: 0.0500 (0.0500) grad_norm: 1.5818 (1.5650) time: 1.7536 data: 0.0009 max mem: 6925
Epoch: [2] [400/625] eta: 0:07:08 lr: 0.000528 min_lr: 0.000528 loss: 5.6966 (5.8369) class_acc: 0.0703 (0.0567) weight_decay: 0.0500 (0.0500) grad_norm: 1.5028 (1.5539) time: 1.8044 data: 0.0007 max mem: 6925
Epoch: [2] [600/625] eta: 0:00:47 lr: 0.000592 min_lr: 0.000592 loss: 5.5131 (5.7568) class_acc: 0.0859 (0.0653) weight_decay: 0.0500 (0.0500) grad_norm: 1.6248 (1.5674) time: 1.8818 data: 0.0010 max mem: 6925
Epoch: [2] [624/625] eta: 0:00:01 lr: 0.000599 min_lr: 0.000599 loss: 5.5190 (5.7481) class_acc: 0.0898 (0.0662) weight_decay: 0.0500 (0.0500) grad_norm: 1.5775 (1.5696) time: 1.1627 data: 0.0016 max mem: 6925
Epoch: [2] Total time: 0:19:30 (1.8721 s / it)
Averaged stats: lr: 0.000599 min_lr: 0.000599 loss: 5.5190 (5.7496) class_acc: 0.0898 (0.0660) weight_decay: 0.0500 (0.0500) grad_norm: 1.5775 (1.5696)
Test: [ 0/50] eta: 0:10:10 loss: 4.4641 (4.4641) acc1: 16.8000 (16.8000) acc5: 36.8000 (36.8000) time: 12.2086 data: 12.1381 max mem: 6925
Test: [10/50] eta: 0:01:23 loss: 4.4238 (4.4051) acc1: 17.6000 (17.4545) acc5: 36.8000 (37.3818) time: 2.0823 data: 2.0499 max mem: 6925
Test: [20/50] eta: 0:00:48 loss: 4.5066 (4.4858) acc1: 13.6000 (14.9714) acc5: 32.8000 (34.5905) time: 1.0871 data: 1.0582 max mem: 6925
Test: [30/50] eta: 0:00:28 loss: 4.5066 (4.4635) acc1: 13.6000 (15.3290) acc5: 35.2000 (35.5613) time: 1.0285 data: 0.9997 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 4.5031 (4.4885) acc1: 15.2000 (15.2195) acc5: 35.2000 (34.7902) time: 0.6810 data: 0.6509 max mem: 6925
Test: [49/50] eta: 0:00:00 loss: 4.5565 (4.5106) acc1: 13.6000 (15.0240) acc5: 33.6000 (34.2880) time: 0.6760 data: 0.6447 max mem: 6925
Test: Total time: 0:00:48 (0.9725 s / it)
* Acc@1 15.342 Acc@5 34.758 loss 4.479
Accuracy of the model on the 50000 test images: 15.3%
Max accuracy: 15.34%
Epoch: [3] [ 0/625] eta: 3:26:32 lr: 0.000600 min_lr: 0.000600 loss: 5.4816 (5.4816) class_acc: 0.0898 (0.0898) weight_decay: 0.0500 (0.0500) time: 19.8282 data: 19.5995 max mem: 6925
Epoch: [3] [200/625] eta: 0:13:27 lr: 0.000664 min_lr: 0.000664 loss: 5.2927 (5.4067) class_acc: 0.1055 (0.1020) weight_decay: 0.0500 (0.0500) grad_norm: 1.5359 (1.6177) time: 1.7715 data: 0.0016 max mem: 6925
Epoch: [3] [400/625] eta: 0:07:04 lr: 0.000728 min_lr: 0.000728 loss: 5.2244 (5.3416) class_acc: 0.1328 (0.1121) weight_decay: 0.0500 (0.0500) grad_norm: 1.5293 (inf) time: 1.8600 data: 0.0120 max mem: 6925
Epoch: [3] [600/625] eta: 0:00:47 lr: 0.000792 min_lr: 0.000792 loss: 5.1330 (5.2767) class_acc: 0.1406 (0.1202) weight_decay: 0.0500 (0.0500) grad_norm: 1.4537 (inf) time: 2.0165 data: 0.0015 max mem: 6925
Epoch: [3] [624/625] eta: 0:00:01 lr: 0.000799 min_lr: 0.000799 loss: 5.0808 (5.2703) class_acc: 0.1328 (0.1208) weight_decay: 0.0500 (0.0500) grad_norm: 1.4591 (inf) time: 0.6042 data: 0.0032 max mem: 6925
Epoch: [3] Total time: 0:19:34 (1.8785 s / it)
Averaged stats: lr: 0.000799 min_lr: 0.000799 loss: 5.0808 (5.2733) class_acc: 0.1328 (0.1203) weight_decay: 0.0500 (0.0500) grad_norm: 1.4591 (inf)
Test: [ 0/50] eta: 0:10:05 loss: 3.9517 (3.9517) acc1: 17.6000 (17.6000) acc5: 40.8000 (40.8000) time: 12.1128 data: 12.0784 max mem: 6925
Test: [10/50] eta: 0:01:26 loss: 3.8810 (3.7958) acc1: 24.8000 (24.1455) acc5: 48.8000 (48.5818) time: 2.1531 data: 2.1218 max mem: 6925
Test: [20/50] eta: 0:00:51 loss: 3.8732 (3.8766) acc1: 22.4000 (21.7143) acc5: 46.4000 (46.6286) time: 1.1844 data: 1.1546 max mem: 6925
Test: [30/50] eta: 0:00:30 loss: 3.8732 (3.8733) acc1: 20.8000 (22.2710) acc5: 45.6000 (46.4258) time: 1.1598 data: 1.1308 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 3.8971 (3.8955) acc1: 21.6000 (22.0683) acc5: 44.8000 (45.5805) time: 0.7770 data: 0.7478 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 4.0065 (3.9263) acc1: 19.2000 (21.5520) acc5: 40.8000 (44.7360) time: 0.6816 data: 0.6513 max mem: 6925
Test: Total time: 0:00:52 (1.0439 s / it)
* Acc@1 22.670 Acc@5 45.312 loss 3.893
Accuracy of the model on the 50000 test images: 22.7%
Max accuracy: 22.67%
Epoch: [4] [ 0/625] eta: 3:07:33 lr: 0.000800 min_lr: 0.000800 loss: 5.1065 (5.1065) class_acc: 0.1250 (0.1250) weight_decay: 0.0500 (0.0500) time: 18.0048 data: 17.7750 max mem: 6925
Epoch: [4] [200/625] eta: 0:13:52 lr: 0.000864 min_lr: 0.000864 loss: 4.9621 (5.0058) class_acc: 0.1641 (0.1548) weight_decay: 0.0500 (0.0500) grad_norm: 1.6045 (1.6961) time: 2.0155 data: 0.1118 max mem: 6925
Epoch: [4] [400/625] eta: 0:07:14 lr: 0.000928 min_lr: 0.000928 loss: 4.8520 (4.9529) class_acc: 0.1797 (0.1624) weight_decay: 0.0500 (0.0500) grad_norm: 1.4766 (1.6356) time: 2.1786 data: 0.0447 max mem: 6925
Epoch: [4] [600/625] eta: 0:00:47 lr: 0.000992 min_lr: 0.000992 loss: 4.7885 (4.9018) class_acc: 0.1797 (0.1695) weight_decay: 0.0500 (0.0500) grad_norm: 1.5506 (1.6296) time: 1.8418 data: 0.0143 max mem: 6925
Epoch: [4] [624/625] eta: 0:00:01 lr: 0.001000 min_lr: 0.001000 loss: 4.7369 (4.8964) class_acc: 0.2031 (0.1706) weight_decay: 0.0500 (0.0500) grad_norm: 1.3972 (1.6202) time: 0.7306 data: 0.0019 max mem: 6925
Epoch: [4] Total time: 0:19:43 (1.8935 s / it)
Averaged stats: lr: 0.001000 min_lr: 0.001000 loss: 4.7369 (4.8939) class_acc: 0.2031 (0.1713) weight_decay: 0.0500 (0.0500) grad_norm: 1.3972 (1.6202)
Test: [ 0/50] eta: 0:10:02 loss: 3.4860 (3.4860) acc1: 32.8000 (32.8000) acc5: 48.0000 (48.0000) time: 12.0533 data: 12.0156 max mem: 6925
Test: [10/50] eta: 0:01:21 loss: 3.3917 (3.3568) acc1: 32.0000 (31.7818) acc5: 56.8000 (55.9273) time: 2.0370 data: 2.0075 max mem: 6925
Test: [20/50] eta: 0:00:48 loss: 3.5504 (3.4944) acc1: 27.2000 (29.0286) acc5: 50.4000 (52.9524) time: 1.0812 data: 1.0525 max mem: 6925
Test: [30/50] eta: 0:00:28 loss: 3.6201 (3.4817) acc1: 25.6000 (29.3161) acc5: 49.6000 (53.0065) time: 1.1072 data: 1.0779 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 3.5172 (3.5017) acc1: 28.0000 (28.9171) acc5: 51.2000 (52.6244) time: 0.8841 data: 0.8544 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 3.6958 (3.5341) acc1: 25.6000 (28.3840) acc5: 51.2000 (52.1760) time: 0.8263 data: 0.7963 max mem: 6925
Test: Total time: 0:00:53 (1.0631 s / it)
* Acc@1 28.870 Acc@5 53.312 loss 3.489
Accuracy of the model on the 50000 test images: 28.9%
Max accuracy: 28.87%
Epoch: [5] [ 0/625] eta: 3:29:39 lr: 0.001000 min_lr: 0.001000 loss: 4.8149 (4.8149) class_acc: 0.1836 (0.1836) weight_decay: 0.0500 (0.0500) time: 20.1276 data: 16.7215 max mem: 6925
Epoch: [5] [200/625] eta: 0:13:53 lr: 0.001064 min_lr: 0.001064 loss: 4.6243 (4.6814) class_acc: 0.2070 (0.2051) weight_decay: 0.0500 (0.0500) grad_norm: 1.4361 (1.6153) time: 1.5607 data: 0.1304 max mem: 6925
Epoch: [5] [400/625] eta: 0:07:14 lr: 0.001128 min_lr: 0.001128 loss: 4.5867 (4.6457) class_acc: 0.2188 (0.2094) weight_decay: 0.0500 (0.0500) grad_norm: 1.4751 (1.5614) time: 1.9145 data: 0.0009 max mem: 6925
Epoch: [5] [600/625] eta: 0:00:48 lr: 0.001192 min_lr: 0.001192 loss: 4.4963 (4.6113) class_acc: 0.2266 (0.2151) weight_decay: 0.0500 (0.0500) grad_norm: 1.4929 (1.5797) time: 1.9518 data: 0.0007 max mem: 6925
Epoch: [5] [624/625] eta: 0:00:01 lr: 0.001200 min_lr: 0.001200 loss: 4.5370 (4.6068) class_acc: 0.2227 (0.2157) weight_decay: 0.0500 (0.0500) grad_norm: 1.7646 (1.5858) time: 1.2191 data: 0.0015 max mem: 6925
Epoch: [5] Total time: 0:19:54 (1.9105 s / it)
Averaged stats: lr: 0.001200 min_lr: 0.001200 loss: 4.5370 (4.6027) class_acc: 0.2227 (0.2161) weight_decay: 0.0500 (0.0500) grad_norm: 1.7646 (1.5858)
Test: [ 0/50] eta: 0:10:35 loss: 3.1712 (3.1712) acc1: 36.0000 (36.0000) acc5: 56.8000 (56.8000) time: 12.7045 data: 12.6276 max mem: 6925
Test: [10/50] eta: 0:01:17 loss: 3.1367 (3.0492) acc1: 37.6000 (38.9091) acc5: 59.2000 (61.5273) time: 1.9477 data: 1.9129 max mem: 6925
Test: [20/50] eta: 0:00:43 loss: 3.2614 (3.1718) acc1: 35.2000 (34.5143) acc5: 58.4000 (59.1238) time: 0.8792 data: 0.8490 max mem: 6925
Test: [30/50] eta: 0:00:24 loss: 3.3595 (3.1679) acc1: 30.4000 (34.1677) acc5: 56.0000 (58.8903) time: 0.8330 data: 0.8024 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 3.1905 (3.1928) acc1: 32.0000 (33.6781) acc5: 56.8000 (58.3220) time: 0.8307 data: 0.8005 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 3.2527 (3.2158) acc1: 30.4000 (33.2960) acc5: 56.8000 (58.1440) time: 0.6131 data: 0.5827 max mem: 6925
Test: Total time: 0:00:50 (1.0101 s / it)
* Acc@1 33.714 Acc@5 58.854 loss 3.178
Accuracy of the model on the 50000 test images: 33.7%
Max accuracy: 33.71%
Epoch: [6] [ 0/625] eta: 3:15:14 lr: 0.001200 min_lr: 0.001200 loss: 4.5212 (4.5212) class_acc: 0.2383 (0.2383) weight_decay: 0.0500 (0.0500) time: 18.7426 data: 18.5114 max mem: 6925
Epoch: [6] [200/625] eta: 0:13:40 lr: 0.001264 min_lr: 0.001264 loss: 4.4352 (4.4509) class_acc: 0.2383 (0.2428) weight_decay: 0.0500 (0.0500) grad_norm: 1.3908 (1.5878) time: 1.9018 data: 0.2037 max mem: 6925
Epoch: [6] [400/625] eta: 0:07:06 lr: 0.001328 min_lr: 0.001328 loss: 4.3493 (4.4172) class_acc: 0.2695 (0.2481) weight_decay: 0.0500 (0.0500) grad_norm: 1.4002 (1.5686) time: 1.9066 data: 0.4408 max mem: 6925
Epoch: [6] [600/625] eta: 0:00:48 lr: 0.001393 min_lr: 0.001393 loss: 4.2470 (4.3861) class_acc: 0.2539 (0.2532) weight_decay: 0.0500 (0.0500) grad_norm: 1.4963 (1.5648) time: 2.2510 data: 0.0806 max mem: 6925
Epoch: [6] [624/625] eta: 0:00:01 lr: 0.001400 min_lr: 0.001400 loss: 4.3315 (4.3834) class_acc: 0.2578 (0.2537) weight_decay: 0.0500 (0.0500) grad_norm: 1.4046 (1.5620) time: 0.7345 data: 0.0015 max mem: 6925
Epoch: [6] Total time: 0:19:34 (1.8791 s / it)
Averaged stats: lr: 0.001400 min_lr: 0.001400 loss: 4.3315 (4.3758) class_acc: 0.2578 (0.2540) weight_decay: 0.0500 (0.0500) grad_norm: 1.4046 (1.5620)
Test: [ 0/50] eta: 0:10:07 loss: 2.7027 (2.7027) acc1: 40.0000 (40.0000) acc5: 67.2000 (67.2000) time: 12.1527 data: 12.1185 max mem: 6925
Test: [10/50] eta: 0:01:18 loss: 2.7564 (2.7849) acc1: 40.0000 (41.7455) acc5: 66.4000 (65.8182) time: 1.9603 data: 1.9309 max mem: 6925
Test: [20/50] eta: 0:00:42 loss: 2.9449 (2.9259) acc1: 37.6000 (38.3619) acc5: 62.4000 (63.3524) time: 0.8751 data: 0.8456 max mem: 6925
Test: [30/50] eta: 0:00:25 loss: 2.9736 (2.9032) acc1: 36.0000 (38.7355) acc5: 62.4000 (63.8194) time: 0.8991 data: 0.8694 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 3.0083 (2.9283) acc1: 36.0000 (38.2829) acc5: 62.4000 (63.4927) time: 0.9614 data: 0.9322 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 3.0386 (2.9546) acc1: 35.2000 (37.7440) acc5: 60.8000 (63.1360) time: 0.6520 data: 0.6226 max mem: 6925
Test: Total time: 0:00:52 (1.0549 s / it)
* Acc@1 37.696 Acc@5 63.402 loss 2.937
Accuracy of the model on the 50000 test images: 37.7%
Max accuracy: 37.70%
Epoch: [7] [ 0/625] eta: 3:49:25 lr: 0.001400 min_lr: 0.001400 loss: 4.3410 (4.3410) class_acc: 0.2422 (0.2422) weight_decay: 0.0500 (0.0500) time: 22.0254 data: 16.2290 max mem: 6925
Epoch: [7] [200/625] eta: 0:14:22 lr: 0.001464 min_lr: 0.001464 loss: 4.2663 (4.2318) class_acc: 0.2812 (0.2802) weight_decay: 0.0500 (0.0500) grad_norm: 1.3494 (1.5012) time: 2.1662 data: 0.0013 max mem: 6925
Epoch: [7] [400/625] eta: 0:07:25 lr: 0.001528 min_lr: 0.001528 loss: 4.1935 (4.2155) class_acc: 0.2773 (0.2825) weight_decay: 0.0500 (0.0500) grad_norm: 1.4383 (1.5125) time: 1.9888 data: 0.0010 max mem: 6925
Epoch: [7] [600/625] eta: 0:00:49 lr: 0.001593 min_lr: 0.001593 loss: 4.0658 (4.1929) class_acc: 0.2812 (0.2859) weight_decay: 0.0500 (0.0500) grad_norm: 1.5424 (1.5101) time: 1.9448 data: 0.0010 max mem: 6925
Epoch: [7] [624/625] eta: 0:00:01 lr: 0.001600 min_lr: 0.001600 loss: 4.1433 (4.1897) class_acc: 0.3047 (0.2865) weight_decay: 0.0500 (0.0500) grad_norm: 1.4047 (1.5040) time: 0.8216 data: 0.0016 max mem: 6925
Epoch: [7] Total time: 0:19:58 (1.9184 s / it)
Averaged stats: lr: 0.001600 min_lr: 0.001600 loss: 4.1433 (4.1987) class_acc: 0.3047 (0.2852) weight_decay: 0.0500 (0.0500) grad_norm: 1.4047 (1.5040)
Test: [ 0/50] eta: 0:10:28 loss: 2.5643 (2.5643) acc1: 40.0000 (40.0000) acc5: 72.0000 (72.0000) time: 12.5791 data: 12.5481 max mem: 6925
Test: [10/50] eta: 0:01:26 loss: 2.5643 (2.6200) acc1: 43.2000 (44.6545) acc5: 72.0000 (68.4364) time: 2.1506 data: 2.1209 max mem: 6925
Test: [20/50] eta: 0:00:48 loss: 2.8511 (2.7905) acc1: 40.8000 (41.6000) acc5: 64.8000 (66.0952) time: 1.0637 data: 1.0346 max mem: 6925
Test: [30/50] eta: 0:00:27 loss: 2.9363 (2.7697) acc1: 39.2000 (41.6516) acc5: 65.6000 (66.6065) time: 0.9124 data: 0.8824 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 2.7805 (2.7972) acc1: 40.0000 (41.1902) acc5: 65.6000 (66.1268) time: 0.6348 data: 0.6038 max mem: 6925
Test: [49/50] eta: 0:00:00 loss: 2.8003 (2.8071) acc1: 39.2000 (40.8320) acc5: 64.8000 (66.1440) time: 0.5118 data: 0.4824 max mem: 6925
Test: Total time: 0:00:49 (0.9877 s / it)
* Acc@1 41.030 Acc@5 66.562 loss 2.771
Accuracy of the model on the 50000 test images: 41.0%
Max accuracy: 41.03%
Epoch: [8] [ 0/625] eta: 3:31:16 lr: 0.001600 min_lr: 0.001600 loss: 4.1364 (4.1364) class_acc: 0.2734 (0.2734) weight_decay: 0.0500 (0.0500) time: 20.2826 data: 18.7704 max mem: 6925
Epoch: [8] [200/625] eta: 0:14:18 lr: 0.001664 min_lr: 0.001664 loss: 4.0123 (4.0864) class_acc: 0.3125 (0.3047) weight_decay: 0.0500 (0.0500) grad_norm: 1.3290 (1.5126) time: 1.9716 data: 0.0476 max mem: 6925
Epoch: [8] [400/625] eta: 0:07:25 lr: 0.001728 min_lr: 0.001728 loss: 4.0074 (4.0690) class_acc: 0.3125 (0.3070) weight_decay: 0.0500 (0.0500) grad_norm: 1.3430 (1.5013) time: 1.9832 data: 0.0089 max mem: 6925
Epoch: [8] [600/625] eta: 0:00:49 lr: 0.001793 min_lr: 0.001793 loss: 4.0096 (4.0575) class_acc: 0.3125 (0.3095) weight_decay: 0.0500 (0.0500) grad_norm: 1.5426 (1.5261) time: 1.9662 data: 0.0159 max mem: 6925
Epoch: [8] [624/625] eta: 0:00:01 lr: 0.001800 min_lr: 0.001800 loss: 3.9937 (4.0541) class_acc: 0.3320 (0.3104) weight_decay: 0.0500 (0.0500) grad_norm: 1.2533 (1.5163) time: 0.8105 data: 0.0281 max mem: 6925
Epoch: [8] Total time: 0:20:06 (1.9301 s / it)
Averaged stats: lr: 0.001800 min_lr: 0.001800 loss: 3.9937 (4.0527) class_acc: 0.3320 (0.3116) weight_decay: 0.0500 (0.0500) grad_norm: 1.2533 (1.5163)
Test: [ 0/50] eta: 0:10:12 loss: 2.4263 (2.4263) acc1: 46.4000 (46.4000) acc5: 76.0000 (76.0000) time: 12.2491 data: 12.2181 max mem: 6925
Test: [10/50] eta: 0:01:18 loss: 2.4263 (2.4293) acc1: 46.4000 (46.6909) acc5: 72.8000 (72.2182) time: 1.9683 data: 1.9380 max mem: 6925
Test: [20/50] eta: 0:00:46 loss: 2.5984 (2.5959) acc1: 40.8000 (43.2762) acc5: 67.2000 (69.5619) time: 1.0158 data: 0.9863 max mem: 6925
Test: [30/50] eta: 0:00:27 loss: 2.7424 (2.5964) acc1: 40.8000 (43.7936) acc5: 67.2000 (69.3936) time: 1.0757 data: 1.0469 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.7092 (2.6361) acc1: 42.4000 (43.4146) acc5: 67.2000 (68.6439) time: 0.8801 data: 0.8512 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.6926 (2.6537) acc1: 42.4000 (43.1360) acc5: 68.0000 (68.4000) time: 0.7090 data: 0.6800 max mem: 6925
Test: Total time: 0:00:52 (1.0429 s / it)
* Acc@1 43.824 Acc@5 68.946 loss 2.614
Accuracy of the model on the 50000 test images: 43.8%
Max accuracy: 43.82%
Epoch: [9] [ 0/625] eta: 3:20:24 lr: 0.001800 min_lr: 0.001800 loss: 4.0534 (4.0534) class_acc: 0.2891 (0.2891) weight_decay: 0.0500 (0.0500) time: 19.2384 data: 18.2822 max mem: 6925
Epoch: [9] [200/625] eta: 0:13:39 lr: 0.001864 min_lr: 0.001864 loss: 3.9425 (3.9574) class_acc: 0.3320 (0.3284) weight_decay: 0.0500 (0.0500) grad_norm: 1.3525 (1.4977) time: 1.8314 data: 0.0011 max mem: 6925
Epoch: [9] [400/625] eta: 0:07:06 lr: 0.001929 min_lr: 0.001929 loss: 3.9690 (3.9474) class_acc: 0.3203 (0.3301) weight_decay: 0.0500 (0.0500) grad_norm: 1.6176 (1.4755) time: 2.0909 data: 0.0009 max mem: 6925
Epoch: [9] [600/625] eta: 0:00:47 lr: 0.001993 min_lr: 0.001993 loss: 3.9572 (3.9319) class_acc: 0.3320 (0.3334) weight_decay: 0.0500 (0.0500) grad_norm: 1.2477 (1.4769) time: 1.8500 data: 0.0145 max mem: 6925
Epoch: [9] [624/625] eta: 0:00:01 lr: 0.002000 min_lr: 0.002000 loss: 3.9135 (3.9309) class_acc: 0.3359 (0.3338) weight_decay: 0.0500 (0.0500) grad_norm: 1.3179 (1.4755) time: 0.9920 data: 0.0021 max mem: 6925
Epoch: [9] Total time: 0:19:29 (1.8710 s / it)
Averaged stats: lr: 0.002000 min_lr: 0.002000 loss: 3.9135 (3.9263) class_acc: 0.3359 (0.3354) weight_decay: 0.0500 (0.0500) grad_norm: 1.3179 (1.4755)
Test: [ 0/50] eta: 0:10:30 loss: 2.3571 (2.3571) acc1: 45.6000 (45.6000) acc5: 75.2000 (75.2000) time: 12.6090 data: 12.5540 max mem: 6925
Test: [10/50] eta: 0:01:22 loss: 2.3571 (2.3315) acc1: 48.8000 (48.5818) acc5: 74.4000 (74.1818) time: 2.0735 data: 2.0409 max mem: 6925
Test: [20/50] eta: 0:00:47 loss: 2.4959 (2.4849) acc1: 45.6000 (45.2190) acc5: 72.0000 (71.7714) time: 1.0378 data: 1.0068 max mem: 6925
Test: [30/50] eta: 0:00:27 loss: 2.6075 (2.4851) acc1: 43.2000 (45.6516) acc5: 69.6000 (71.4839) time: 1.0034 data: 0.9726 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 2.6013 (2.5173) acc1: 44.0000 (45.0927) acc5: 69.6000 (71.0244) time: 0.7082 data: 0.6790 max mem: 6925
Test: [49/50] eta: 0:00:00 loss: 2.5379 (2.5305) acc1: 43.2000 (44.8160) acc5: 70.4000 (70.8320) time: 0.5766 data: 0.5469 max mem: 6925
Test: Total time: 0:00:48 (0.9756 s / it)
* Acc@1 45.736 Acc@5 71.226 loss 2.497
Accuracy of the model on the 50000 test images: 45.7%
Max accuracy: 45.74%
Epoch: [10] [ 0/625] eta: 3:35:06 lr: 0.002000 min_lr: 0.002000 loss: 3.7979 (3.7979) class_acc: 0.3633 (0.3633) weight_decay: 0.0500 (0.0500) time: 20.6510 data: 18.7331 max mem: 6925
Epoch: [10] [200/625] eta: 0:13:40 lr: 0.002064 min_lr: 0.002064 loss: 3.8849 (3.8497) class_acc: 0.3398 (0.3498) weight_decay: 0.0500 (0.0500) grad_norm: 1.3145 (1.4611) time: 1.9134 data: 0.0010 max mem: 6925
Epoch: [10] [400/625] eta: 0:07:08 lr: 0.002129 min_lr: 0.002129 loss: 3.8394 (3.8339) class_acc: 0.3516 (0.3543) weight_decay: 0.0500 (0.0500) grad_norm: 1.3298 (inf) time: 2.0682 data: 0.0660 max mem: 6925
Epoch: [10] [600/625] eta: 0:00:47 lr: 0.002193 min_lr: 0.002193 loss: 3.7669 (3.8240) class_acc: 0.3633 (0.3557) weight_decay: 0.0500 (0.0500) grad_norm: 1.5628 (inf) time: 2.0368 data: 0.0069 max mem: 6925
Epoch: [10] [624/625] eta: 0:00:01 lr: 0.002200 min_lr: 0.002200 loss: 3.7446 (3.8211) class_acc: 0.3633 (0.3559) weight_decay: 0.0500 (0.0500) grad_norm: 1.5540 (inf) time: 0.7914 data: 0.0152 max mem: 6925
Epoch: [10] Total time: 0:19:25 (1.8641 s / it)
Averaged stats: lr: 0.002200 min_lr: 0.002200 loss: 3.7446 (3.8183) class_acc: 0.3633 (0.3560) weight_decay: 0.0500 (0.0500) grad_norm: 1.5540 (inf)
Test: [ 0/50] eta: 0:09:35 loss: 2.3656 (2.3656) acc1: 51.2000 (51.2000) acc5: 74.4000 (74.4000) time: 11.5067 data: 11.4707 max mem: 6925
Test: [10/50] eta: 0:01:17 loss: 2.3600 (2.2959) acc1: 52.0000 (51.2727) acc5: 74.4000 (75.6364) time: 1.9374 data: 1.9069 max mem: 6925
Test: [20/50] eta: 0:00:46 loss: 2.4139 (2.4155) acc1: 47.2000 (47.8476) acc5: 72.8000 (74.2095) time: 1.0475 data: 1.0167 max mem: 6925
Test: [30/50] eta: 0:00:27 loss: 2.5498 (2.4178) acc1: 44.0000 (47.7677) acc5: 72.0000 (73.4194) time: 1.0458 data: 1.0157 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 2.5523 (2.4372) acc1: 45.6000 (47.4732) acc5: 69.6000 (72.8195) time: 0.6576 data: 0.6289 max mem: 6925
Test: [49/50] eta: 0:00:00 loss: 2.4461 (2.4430) acc1: 46.4000 (47.6160) acc5: 70.4000 (72.8000) time: 0.5457 data: 0.5163 max mem: 6925
Test: Total time: 0:00:46 (0.9336 s / it)
* Acc@1 48.118 Acc@5 73.062 loss 2.402
Accuracy of the model on the 50000 test images: 48.1%
Max accuracy: 48.12%
Epoch: [11] [ 0/625] eta: 3:31:44 lr: 0.002200 min_lr: 0.002200 loss: 3.6978 (3.6978) class_acc: 0.3867 (0.3867) weight_decay: 0.0500 (0.0500) time: 20.3274 data: 18.0206 max mem: 6925
Epoch: [11] [200/625] eta: 0:13:35 lr: 0.002264 min_lr: 0.002264 loss: 3.7348 (3.7438) class_acc: 0.3633 (0.3695) weight_decay: 0.0500 (0.0500) grad_norm: 1.5929 (1.5562) time: 1.8127 data: 0.0009 max mem: 6925
Epoch: [11] [400/625] eta: 0:07:07 lr: 0.002329 min_lr: 0.002329 loss: 3.6450 (3.7349) class_acc: 0.3672 (0.3719) weight_decay: 0.0500 (0.0500) grad_norm: 1.5047 (1.5681) time: 1.8559 data: 0.0292 max mem: 6925
Epoch: [11] [600/625] eta: 0:00:47 lr: 0.002393 min_lr: 0.002393 loss: 3.6886 (3.7292) class_acc: 0.3789 (0.3733) weight_decay: 0.0500 (0.0500) grad_norm: 1.4215 (1.5687) time: 1.9081 data: 0.0014 max mem: 6925
Epoch: [11] [624/625] eta: 0:00:01 lr: 0.002400 min_lr: 0.002400 loss: 3.6711 (3.7270) class_acc: 0.3906 (0.3738) weight_decay: 0.0500 (0.0500) grad_norm: 1.4339 (1.5639) time: 0.8405 data: 0.0151 max mem: 6925
Epoch: [11] Total time: 0:19:29 (1.8712 s / it)
Averaged stats: lr: 0.002400 min_lr: 0.002400 loss: 3.6711 (3.7264) class_acc: 0.3906 (0.3737) weight_decay: 0.0500 (0.0500) grad_norm: 1.4339 (1.5639)
Test: [ 0/50] eta: 0:10:34 loss: 2.2988 (2.2988) acc1: 48.0000 (48.0000) acc5: 73.6000 (73.6000) time: 12.6837 data: 12.6409 max mem: 6925
Test: [10/50] eta: 0:01:26 loss: 2.2122 (2.1660) acc1: 52.0000 (53.3091) acc5: 75.2000 (76.4364) time: 2.1524 data: 2.1216 max mem: 6925
Test: [20/50] eta: 0:00:50 loss: 2.2729 (2.2869) acc1: 50.4000 (49.7905) acc5: 75.2000 (75.4667) time: 1.1458 data: 1.1166 max mem: 6925
Test: [30/50] eta: 0:00:31 loss: 2.3732 (2.2890) acc1: 47.2000 (49.2645) acc5: 74.4000 (75.2774) time: 1.2337 data: 1.2048 max mem: 6925
Test: [40/50] eta: 0:00:13 loss: 2.4234 (2.3161) acc1: 47.2000 (48.9951) acc5: 72.0000 (74.7122) time: 1.0500 data: 1.0209 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.3700 (2.3096) acc1: 49.6000 (49.2480) acc5: 72.8000 (74.6880) time: 0.8589 data: 0.8297 max mem: 6925
Test: Total time: 0:00:57 (1.1552 s / it)
* Acc@1 50.052 Acc@5 75.208 loss 2.262
Accuracy of the model on the 50000 test images: 50.1%
Max accuracy: 50.05%
Epoch: [12] [ 0/625] eta: 3:40:25 lr: 0.002400 min_lr: 0.002400 loss: 3.9719 (3.9719) class_acc: 0.3242 (0.3242) weight_decay: 0.0500 (0.0500) time: 21.1601 data: 20.8835 max mem: 6925
Epoch: [12] [200/625] eta: 0:14:21 lr: 0.002464 min_lr: 0.002464 loss: 3.6442 (3.6511) class_acc: 0.3789 (0.3897) weight_decay: 0.0500 (0.0500) grad_norm: 1.7213 (1.6294) time: 1.8270 data: 1.4725 max mem: 6925
Epoch: [12] [400/625] eta: 0:07:18 lr: 0.002529 min_lr: 0.002529 loss: 3.6375 (3.6514) class_acc: 0.3828 (0.3896) weight_decay: 0.0500 (0.0500) grad_norm: 1.4712 (1.5392) time: 1.9212 data: 1.6315 max mem: 6925
Epoch: [12] [600/625] eta: 0:00:49 lr: 0.002593 min_lr: 0.002593 loss: 3.5887 (3.6482) class_acc: 0.3984 (0.3896) weight_decay: 0.0500 (0.0500) grad_norm: 1.1837 (1.5223) time: 2.1414 data: 1.8369 max mem: 6925
Epoch: [12] [624/625] eta: 0:00:01 lr: 0.002600 min_lr: 0.002600 loss: 3.6115 (3.6468) class_acc: 0.3906 (0.3897) weight_decay: 0.0500 (0.0500) grad_norm: 1.1411 (1.5169) time: 0.7565 data: 0.4857 max mem: 6925
Epoch: [12] Total time: 0:20:14 (1.9430 s / it)
Averaged stats: lr: 0.002600 min_lr: 0.002600 loss: 3.6115 (3.6467) class_acc: 0.3906 (0.3895) weight_decay: 0.0500 (0.0500) grad_norm: 1.1411 (1.5169)
Test: [ 0/50] eta: 0:11:11 loss: 1.9926 (1.9926) acc1: 59.2000 (59.2000) acc5: 80.8000 (80.8000) time: 13.4302 data: 13.3990 max mem: 6925
Test: [10/50] eta: 0:01:29 loss: 1.9990 (2.0311) acc1: 56.8000 (56.5091) acc5: 80.0000 (78.9091) time: 2.2330 data: 2.2019 max mem: 6925
Test: [20/50] eta: 0:00:52 loss: 2.2662 (2.1680) acc1: 52.0000 (52.1143) acc5: 77.6000 (77.4476) time: 1.1790 data: 1.1491 max mem: 6925
Test: [30/50] eta: 0:00:29 loss: 2.2676 (2.1699) acc1: 48.8000 (52.2581) acc5: 76.0000 (77.1613) time: 1.0674 data: 1.0388 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.1295 (2.2074) acc1: 50.4000 (51.4341) acc5: 76.0000 (76.5463) time: 0.6198 data: 0.5911 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.2974 (2.2271) acc1: 47.2000 (51.1040) acc5: 73.6000 (76.1920) time: 0.5375 data: 0.5090 max mem: 6925
Test: Total time: 0:00:50 (1.0134 s / it)
* Acc@1 52.120 Acc@5 76.486 loss 2.192
Accuracy of the model on the 50000 test images: 52.1%
Max accuracy: 52.12%
Epoch: [13] [ 0/625] eta: 3:51:03 lr: 0.002600 min_lr: 0.002600 loss: 3.6038 (3.6038) class_acc: 0.4023 (0.4023) weight_decay: 0.0500 (0.0500) time: 22.1810 data: 20.5355 max mem: 6925
Epoch: [13] [200/625] eta: 0:16:01 lr: 0.002665 min_lr: 0.002665 loss: 3.5499 (3.5709) class_acc: 0.4141 (0.4057) weight_decay: 0.0500 (0.0500) grad_norm: 1.1951 (1.4398) time: 1.9438 data: 1.3023 max mem: 6925
Epoch: [13] [400/625] eta: 0:08:02 lr: 0.002729 min_lr: 0.002729 loss: 3.6085 (3.5725) class_acc: 0.4023 (0.4056) weight_decay: 0.0500 (0.0500) grad_norm: 1.4695 (1.5375) time: 2.1444 data: 0.0007 max mem: 6925
Epoch: [13] [600/625] eta: 0:00:55 lr: 0.002793 min_lr: 0.002793 loss: 3.5015 (3.5724) class_acc: 0.4141 (0.4055) weight_decay: 0.0500 (0.0500) grad_norm: 1.2768 (1.4835) time: 2.5757 data: 0.4782 max mem: 6925
Epoch: [13] [624/625] eta: 0:00:02 lr: 0.002800 min_lr: 0.002800 loss: 3.5638 (3.5717) class_acc: 0.4102 (0.4055) weight_decay: 0.0500 (0.0500) grad_norm: 1.1903 (1.4767) time: 1.1197 data: 0.2401 max mem: 6925
Epoch: [13] Total time: 0:23:10 (2.2246 s / it)
Averaged stats: lr: 0.002800 min_lr: 0.002800 loss: 3.5638 (3.5824) class_acc: 0.4102 (0.4023) weight_decay: 0.0500 (0.0500) grad_norm: 1.1903 (1.4767)
Test: [ 0/50] eta: 0:12:59 loss: 1.9250 (1.9250) acc1: 58.4000 (58.4000) acc5: 81.6000 (81.6000) time: 15.5897 data: 15.5557 max mem: 6925
Test: [10/50] eta: 0:01:46 loss: 2.0562 (2.0035) acc1: 56.0000 (56.8727) acc5: 79.2000 (79.9273) time: 2.6703 data: 2.6403 max mem: 6925
Test: [20/50] eta: 0:01:08 loss: 2.1759 (2.1147) acc1: 55.2000 (53.8667) acc5: 78.4000 (78.1333) time: 1.6251 data: 1.5961 max mem: 6925
Test: [30/50] eta: 0:00:39 loss: 2.2028 (2.1041) acc1: 51.2000 (53.8581) acc5: 76.0000 (77.8839) time: 1.5841 data: 1.5552 max mem: 6925
Test: [40/50] eta: 0:00:16 loss: 2.0854 (2.1390) acc1: 51.2000 (52.8390) acc5: 76.0000 (77.2878) time: 1.0469 data: 1.0180 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1602 (2.1576) acc1: 49.6000 (52.5600) acc5: 76.0000 (76.9760) time: 0.9021 data: 0.8735 max mem: 6925
Test: Total time: 0:01:10 (1.4139 s / it)
* Acc@1 53.068 Acc@5 77.256 loss 2.128
Accuracy of the model on the 50000 test images: 53.1%
Max accuracy: 53.07%
Epoch: [14] [ 0/625] eta: 3:47:45 lr: 0.002800 min_lr: 0.002800 loss: 3.3681 (3.3681) class_acc: 0.4570 (0.4570) weight_decay: 0.0500 (0.0500) time: 21.8653 data: 21.6336 max mem: 6925
Epoch: [14] [200/625] eta: 0:14:40 lr: 0.002865 min_lr: 0.002865 loss: 3.4788 (3.5440) class_acc: 0.4062 (0.4102) weight_decay: 0.0500 (0.0500) grad_norm: 1.1076 (1.5876) time: 1.9578 data: 0.0554 max mem: 6925
Epoch: [14] [400/625] eta: 0:07:47 lr: 0.002929 min_lr: 0.002929 loss: 3.5163 (3.5376) class_acc: 0.4062 (0.4108) weight_decay: 0.0500 (0.0500) grad_norm: 1.3727 (1.6105) time: 1.9791 data: 0.0576 max mem: 6925
Epoch: [14] [600/625] eta: 0:00:51 lr: 0.002993 min_lr: 0.002993 loss: 3.6315 (3.5367) class_acc: 0.4023 (0.4113) weight_decay: 0.0500 (0.0500) grad_norm: 1.3977 (1.5698) time: 2.3133 data: 0.0103 max mem: 6925
Epoch: [14] [624/625] eta: 0:00:02 lr: 0.003000 min_lr: 0.003000 loss: 3.5076 (3.5361) class_acc: 0.4258 (0.4117) weight_decay: 0.0500 (0.0500) grad_norm: 1.5024 (1.5632) time: 0.7930 data: 0.0253 max mem: 6925
Epoch: [14] Total time: 0:21:12 (2.0363 s / it)
Averaged stats: lr: 0.003000 min_lr: 0.003000 loss: 3.5076 (3.5297) class_acc: 0.4258 (0.4124) weight_decay: 0.0500 (0.0500) grad_norm: 1.5024 (1.5632)
Test: [ 0/50] eta: 0:10:54 loss: 1.8116 (1.8116) acc1: 58.4000 (58.4000) acc5: 81.6000 (81.6000) time: 13.0956 data: 13.0580 max mem: 6925
Test: [10/50] eta: 0:01:23 loss: 2.0015 (1.9533) acc1: 57.6000 (57.6727) acc5: 80.8000 (79.7818) time: 2.0780 data: 2.0478 max mem: 6925
Test: [20/50] eta: 0:00:48 loss: 2.0676 (2.0900) acc1: 54.4000 (54.2095) acc5: 79.2000 (78.5524) time: 1.0446 data: 1.0158 max mem: 6925
Test: [30/50] eta: 0:00:28 loss: 2.2031 (2.0872) acc1: 52.0000 (54.1419) acc5: 78.4000 (78.3742) time: 1.0641 data: 1.0356 max mem: 6925
Test: [40/50] eta: 0:00:13 loss: 2.0475 (2.1102) acc1: 53.6000 (53.9317) acc5: 76.8000 (77.6585) time: 1.0009 data: 0.9718 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1816 (2.1282) acc1: 52.8000 (53.5520) acc5: 75.2000 (77.4080) time: 0.8804 data: 0.8503 max mem: 6925
Test: Total time: 0:00:59 (1.1811 s / it)
* Acc@1 53.402 Acc@5 77.840 loss 2.109
Accuracy of the model on the 50000 test images: 53.4%
Max accuracy: 53.40%
Epoch: [15] [ 0/625] eta: 3:14:10 lr: 0.003000 min_lr: 0.003000 loss: 3.4767 (3.4767) class_acc: 0.3906 (0.3906) weight_decay: 0.0500 (0.0500) time: 18.6402 data: 16.2598 max mem: 6925
Epoch: [15] [200/625] eta: 0:14:41 lr: 0.003065 min_lr: 0.003065 loss: 3.5513 (3.4757) class_acc: 0.3984 (0.4224) weight_decay: 0.0500 (0.0500) grad_norm: 1.3221 (1.4427) time: 2.0642 data: 0.0009 max mem: 6925
Epoch: [15] [400/625] eta: 0:07:41 lr: 0.003129 min_lr: 0.003129 loss: 3.5201 (3.4865) class_acc: 0.4141 (0.4212) weight_decay: 0.0500 (0.0500) grad_norm: 1.6657 (1.5378) time: 2.1972 data: 0.0009 max mem: 6925
Epoch: [15] [600/625] eta: 0:00:50 lr: 0.003193 min_lr: 0.003193 loss: 3.4235 (3.4870) class_acc: 0.4258 (0.4212) weight_decay: 0.0500 (0.0500) grad_norm: 1.4227 (1.5087) time: 2.1383 data: 1.0790 max mem: 6925
Epoch: [15] [624/625] eta: 0:00:01 lr: 0.003200 min_lr: 0.003200 loss: 3.5013 (3.4865) class_acc: 0.4258 (0.4212) weight_decay: 0.0500 (0.0500) grad_norm: 1.2106 (1.5000) time: 0.8800 data: 0.2174 max mem: 6925
Epoch: [15] Total time: 0:20:46 (1.9951 s / it)
Averaged stats: lr: 0.003200 min_lr: 0.003200 loss: 3.5013 (3.4805) class_acc: 0.4258 (0.4226) weight_decay: 0.0500 (0.0500) grad_norm: 1.2106 (1.5000)
Test: [ 0/50] eta: 0:10:38 loss: 2.2376 (2.2376) acc1: 47.2000 (47.2000) acc5: 79.2000 (79.2000) time: 12.7720 data: 12.7301 max mem: 6925
Test: [10/50] eta: 0:01:17 loss: 1.9107 (1.9385) acc1: 57.6000 (57.7455) acc5: 81.6000 (80.9455) time: 1.9306 data: 1.9005 max mem: 6925
Test: [20/50] eta: 0:00:41 loss: 2.1752 (2.1111) acc1: 54.4000 (53.6762) acc5: 76.8000 (78.2857) time: 0.8144 data: 0.7853 max mem: 6925
Test: [30/50] eta: 0:00:26 loss: 2.2672 (2.1060) acc1: 50.4000 (53.5742) acc5: 75.2000 (78.0129) time: 0.9745 data: 0.9448 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 2.1741 (2.1416) acc1: 52.8000 (53.0146) acc5: 76.8000 (77.6585) time: 0.9834 data: 0.9534 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1741 (2.1489) acc1: 53.6000 (53.0080) acc5: 78.4000 (77.5360) time: 0.5609 data: 0.5316 max mem: 6925
Test: Total time: 0:00:52 (1.0414 s / it)
* Acc@1 53.838 Acc@5 77.888 loss 2.097
Accuracy of the model on the 50000 test images: 53.8%
Max accuracy: 53.84%
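
[annotation] Throughout this log every metric prints as "recent value (running average)". Metric loggers in this style of training script typically keep a small window of recent values plus a global sum/count; whether the first number is the latest value or a windowed median is an assumption here, but the bookkeeping looks roughly like the sketch below (simplified, not the repo's actual logger).

from collections import deque

class SmoothedValue:
    """Tracks a metric the way this log prints it: 'recent (global average)'."""
    def __init__(self, window_size=20):
        self.window = deque(maxlen=window_size)  # recent values for the first number
        self.total = 0.0                         # running sum for the parenthesised average
        self.count = 0

    def update(self, value, n=1):
        self.window.append(value)
        self.total += value * n
        self.count += n

    @property
    def recent(self):
        # real implementations often print a median over the window instead
        return self.window[-1]

    @property
    def global_avg(self):
        return self.total / max(self.count, 1)

    def __str__(self):
        return f"{self.recent:.4f} ({self.global_avg:.4f})"

# Hypothetical usage mirroring a loss column
loss = SmoothedValue()
for v in (3.66, 3.65, 3.64):
    loss.update(v)
print(f"loss: {loss}")  # -> "loss: 3.6400 (3.6500)"
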
Epoch: [16] [ 0/625] eta: 3:57:39 lr: 0.003201 min_lr: 0.003201 loss: 3.5308 (3.5308) class_acc: 0.4141 (0.4141) weight_decay: 0.0500 (0.0500) time: 22.8149 data: 22.5833 max mem: 6925
Epoch: [16] [200/625] eta: 0:14:33 lr: 0.003265 min_lr: 0.003265 loss: 3.4309 (3.4520) class_acc: 0.4297 (0.4285) weight_decay: 0.0500 (0.0500) grad_norm: 1.2463 (1.5067) time: 1.9697 data: 0.4155 max mem: 6925
Epoch: [16] [400/625] eta: 0:07:36 lr: 0.003329 min_lr: 0.003329 loss: 3.4118 (3.4434) class_acc: 0.4414 (0.4300) weight_decay: 0.0500 (0.0500) grad_norm: 1.1861 (1.5175) time: 2.1271 data: 0.5679 max mem: 6925
Epoch: [16] [600/625] eta: 0:00:51 lr: 0.003393 min_lr: 0.003393 loss: 3.4307 (3.4385) class_acc: 0.4219 (0.4307) weight_decay: 0.0500 (0.0500) grad_norm: 1.6104 (1.5296) time: 2.1493 data: 0.0236 max mem: 6925
Epoch: [16] [624/625] eta: 0:00:02 lr: 0.003400 min_lr: 0.003400 loss: 3.3894 (3.4380) class_acc: 0.4297 (0.4306) weight_decay: 0.0500 (0.0500) grad_norm: 1.4334 (1.5319) time: 0.7540 data: 0.1656 max mem: 6925
Epoch: [16] Total time: 0:20:58 (2.0140 s / it)
Averaged stats: lr: 0.003400 min_lr: 0.003400 loss: 3.3894 (3.4382) class_acc: 0.4297 (0.4314) weight_decay: 0.0500 (0.0500) grad_norm: 1.4334 (1.5319)
Test: [ 0/50] eta: 0:10:56 loss: 1.8150 (1.8150) acc1: 59.2000 (59.2000) acc5: 78.4000 (78.4000) time: 13.1257 data: 13.0902 max mem: 6925
Test: [10/50] eta: 0:01:29 loss: 1.9224 (1.8670) acc1: 59.2000 (59.2727) acc5: 80.0000 (81.4545) time: 2.2420 data: 2.2119 max mem: 6925
Test: [20/50] eta: 0:00:53 loss: 1.9845 (1.9906) acc1: 56.0000 (56.5333) acc5: 79.2000 (80.0000) time: 1.2224 data: 1.1931 max mem: 6925
Test: [30/50] eta: 0:00:32 loss: 2.1207 (2.0110) acc1: 51.2000 (55.1226) acc5: 79.2000 (79.3548) time: 1.2673 data: 1.2386 max mem: 6925
Test: [40/50] eta: 0:00:14 loss: 2.1001 (2.0459) acc1: 51.2000 (54.6341) acc5: 76.8000 (78.7122) time: 1.0646 data: 1.0362 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1001 (2.0460) acc1: 52.0000 (54.5600) acc5: 76.8000 (78.6720) time: 1.0002 data: 0.9714 max mem: 6925
Test: Total time: 0:00:59 (1.1861 s / it)
* Acc@1 55.056 Acc@5 79.126 loss 2.011
Accuracy of the model on the 50000 test images: 55.1%
Max accuracy: 55.06%
Epoch: [17] [ 0/625] eta: 3:10:19 lr: 0.003401 min_lr: 0.003401 loss: 3.2761 (3.2761) class_acc: 0.4414 (0.4414) weight_decay: 0.0500 (0.0500) time: 18.2719 data: 17.3292 max mem: 6925
Epoch: [17] [200/625] eta: 0:14:52 lr: 0.003465 min_lr: 0.003465 loss: 3.3909 (3.3878) class_acc: 0.4414 (0.4426) weight_decay: 0.0500 (0.0500) grad_norm: 1.3092 (1.4510) time: 2.1927 data: 0.5247 max mem: 6925
Epoch: [17] [400/625] eta: 0:07:47 lr: 0.003529 min_lr: 0.003529 loss: 3.3460 (3.4010) class_acc: 0.4414 (0.4403) weight_decay: 0.0500 (0.0500) grad_norm: 1.2241 (1.4742) time: 1.9340 data: 0.4094 max mem: 6925
Epoch: [17] [600/625] eta: 0:00:51 lr: 0.003593 min_lr: 0.003593 loss: 3.4586 (3.4046) class_acc: 0.4336 (0.4393) weight_decay: 0.0500 (0.0500) grad_norm: 1.3688 (1.4467) time: 2.0656 data: 0.0114 max mem: 6925
Epoch: [17] [624/625] eta: 0:00:02 lr: 0.003600 min_lr: 0.003600 loss: 3.3960 (3.4054) class_acc: 0.4375 (0.4394) weight_decay: 0.0500 (0.0500) grad_norm: 1.2734 (1.4532) time: 0.5433 data: 0.0205 max mem: 6925
Epoch: [17] Total time: 0:21:18 (2.0461 s / it)
Averaged stats: lr: 0.003600 min_lr: 0.003600 loss: 3.3960 (3.4033) class_acc: 0.4375 (0.4385) weight_decay: 0.0500 (0.0500) grad_norm: 1.2734 (1.4532)
Test: [ 0/50] eta: 0:10:16 loss: 1.9363 (1.9363) acc1: 56.8000 (56.8000) acc5: 80.0000 (80.0000) time: 12.3355 data: 12.2976 max mem: 6925
Test: [10/50] eta: 0:01:18 loss: 1.9575 (1.9601) acc1: 56.8000 (56.5091) acc5: 80.0000 (79.4909) time: 1.9600 data: 1.9299 max mem: 6925
Test: [20/50] eta: 0:00:45 loss: 2.1019 (2.0714) acc1: 53.6000 (54.3619) acc5: 79.2000 (78.3619) time: 0.9590 data: 0.9302 max mem: 6925
Test: [30/50] eta: 0:00:26 loss: 2.2022 (2.0754) acc1: 51.2000 (54.0645) acc5: 76.8000 (77.7806) time: 0.9812 data: 0.9522 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.1551 (2.0982) acc1: 50.4000 (53.0732) acc5: 75.2000 (77.2878) time: 1.0598 data: 1.0301 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1114 (2.0945) acc1: 50.4000 (53.0400) acc5: 76.0000 (77.2480) time: 0.8446 data: 0.8154 max mem: 6925
Test: Total time: 0:00:58 (1.1641 s / it)
* Acc@1 53.734 Acc@5 77.882 loss 2.064
Accuracy of the model on the 50000 test images: 53.7%
Max accuracy: 55.06%
Epoch: [18] [ 0/625] eta: 3:39:41 lr: 0.003601 min_lr: 0.003601 loss: 3.4790 (3.4790) class_acc: 0.4375 (0.4375) weight_decay: 0.0500 (0.0500) time: 21.0904 data: 20.8600 max mem: 6925
Epoch: [18] [200/625] eta: 0:14:13 lr: 0.003665 min_lr: 0.003665 loss: 3.2935 (3.3646) class_acc: 0.4570 (0.4477) weight_decay: 0.0500 (0.0500) grad_norm: 1.3893 (1.5764) time: 1.9814 data: 0.0838 max mem: 6925
Epoch: [18] [400/625] eta: 0:07:34 lr: 0.003729 min_lr: 0.003729 loss: 3.3851 (3.3706) class_acc: 0.4570 (0.4459) weight_decay: 0.0500 (0.0500) grad_norm: 1.2128 (1.5045) time: 2.1540 data: 0.0468 max mem: 6925
Epoch: [18] [600/625] eta: 0:00:51 lr: 0.003793 min_lr: 0.003793 loss: 3.3546 (3.3771) class_acc: 0.4414 (0.4439) weight_decay: 0.0500 (0.0500) grad_norm: 1.4672 (1.4971) time: 2.2075 data: 0.0008 max mem: 6925
Epoch: [18] [624/625] eta: 0:00:02 lr: 0.003800 min_lr: 0.003800 loss: 3.3689 (3.3785) class_acc: 0.4336 (0.4437) weight_decay: 0.0500 (0.0500) grad_norm: 1.4598 (1.5079) time: 1.1720 data: 0.0202 max mem: 6925
Epoch: [18] Total time: 0:20:54 (2.0068 s / it)
Averaged stats: lr: 0.003800 min_lr: 0.003800 loss: 3.3689 (3.3757) class_acc: 0.4336 (0.4444) weight_decay: 0.0500 (0.0500) grad_norm: 1.4598 (1.5079)
Test: [ 0/50] eta: 0:10:37 loss: 1.8819 (1.8819) acc1: 56.0000 (56.0000) acc5: 80.0000 (80.0000) time: 12.7470 data: 12.7039 max mem: 6925
Test: [10/50] eta: 0:01:24 loss: 1.8819 (1.8590) acc1: 56.8000 (58.2545) acc5: 80.0000 (81.6000) time: 2.1094 data: 2.0790 max mem: 6925
Test: [20/50] eta: 0:00:50 loss: 2.0360 (2.0109) acc1: 54.4000 (55.4286) acc5: 78.4000 (79.5048) time: 1.1217 data: 1.0928 max mem: 6925
Test: [30/50] eta: 0:00:30 loss: 2.1274 (2.0095) acc1: 52.8000 (54.9419) acc5: 78.4000 (79.5355) time: 1.1937 data: 1.1644 max mem: 6925
Test: [40/50] eta: 0:00:14 loss: 2.0620 (2.0297) acc1: 51.2000 (54.8098) acc5: 78.4000 (78.7317) time: 1.1275 data: 1.0978 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1176 (2.0345) acc1: 52.0000 (54.5760) acc5: 76.8000 (78.7680) time: 1.1179 data: 1.0888 max mem: 6925
Test: Total time: 0:00:59 (1.1876 s / it)
* Acc@1 55.540 Acc@5 79.248 loss 1.995
Accuracy of the model on the 50000 test images: 55.5%
Max accuracy: 55.54%
Epoch: [19] [ 0/625] eta: 3:54:41 lr: 0.003801 min_lr: 0.003801 loss: 3.3794 (3.3794) class_acc: 0.4609 (0.4609) weight_decay: 0.0500 (0.0500) time: 22.5298 data: 22.0592 max mem: 6925
Epoch: [19] [200/625] eta: 0:14:26 lr: 0.003865 min_lr: 0.003865 loss: 3.3368 (3.3375) class_acc: 0.4531 (0.4518) weight_decay: 0.0500 (0.0500) grad_norm: 1.2369 (1.3929) time: 2.0328 data: 0.0007 max mem: 6925
Epoch: [19] [400/625] eta: 0:07:42 lr: 0.003929 min_lr: 0.003929 loss: 3.2767 (3.3476) class_acc: 0.4531 (0.4494) weight_decay: 0.0500 (0.0500) grad_norm: 1.0596 (1.3889) time: 2.1770 data: 0.0007 max mem: 6925
Epoch: [19] [600/625] eta: 0:00:51 lr: 0.003993 min_lr: 0.003993 loss: 3.2959 (3.3459) class_acc: 0.4531 (0.4502) weight_decay: 0.0500 (0.0500) grad_norm: 1.3254 (1.4013) time: 2.2302 data: 0.0009 max mem: 6925
Epoch: [19] [624/625] eta: 0:00:02 lr: 0.004000 min_lr: 0.004000 loss: 3.3472 (3.3482) class_acc: 0.4414 (0.4499) weight_decay: 0.0500 (0.0500) grad_norm: 1.0930 (1.3911) time: 1.0664 data: 0.0018 max mem: 6925
Epoch: [19] Total time: 0:20:58 (2.0137 s / it)
Averaged stats: lr: 0.004000 min_lr: 0.004000 loss: 3.3472 (3.3489) class_acc: 0.4414 (0.4496) weight_decay: 0.0500 (0.0500) grad_norm: 1.0930 (1.3911)
Test: [ 0/50] eta: 0:09:59 loss: 1.8480 (1.8480) acc1: 59.2000 (59.2000) acc5: 83.2000 (83.2000) time: 11.9827 data: 11.9450 max mem: 6925
Test: [10/50] eta: 0:01:10 loss: 1.8480 (1.7980) acc1: 59.2000 (60.5818) acc5: 83.2000 (82.6909) time: 1.7686 data: 1.7380 max mem: 6925
Test: [20/50] eta: 0:00:39 loss: 1.9844 (1.9459) acc1: 56.0000 (56.8000) acc5: 80.8000 (80.6476) time: 0.7819 data: 0.7528 max mem: 6925
Test: [30/50] eta: 0:00:23 loss: 2.0674 (1.9816) acc1: 51.2000 (55.6387) acc5: 78.4000 (79.6129) time: 0.8224 data: 0.7941 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 1.9983 (1.9848) acc1: 52.0000 (55.6683) acc5: 77.6000 (79.3756) time: 0.9966 data: 0.9679 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.0363 (1.9971) acc1: 54.4000 (55.4240) acc5: 77.6000 (79.1520) time: 0.8041 data: 0.7754 max mem: 6925
Test: Total time: 0:00:52 (1.0425 s / it)
* Acc@1 55.980 Acc@5 80.052 loss 1.959
Accuracy of the model on the 50000 test images: 56.0%
Max accuracy: 55.98%
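
[annotation] The lr column climbs linearly (0.0024 at epoch 12, 0.0026 at 13, ... 0.0040 by epoch 20) and then creeps back down (0.003999, 0.003998, ...) in the epochs that follow, which is consistent with a per-iteration linear warmup to a 4e-3 peak followed by cosine decay. The warmup length (~20 epochs), start/final lr, and total epoch count (300, going by the file name) in the sketch below are inferred from the log rather than read from the training config, so treat it only as an approximation of the apparent schedule.

import math

def lr_at(step, *, peak_lr=4e-3, start_lr=0.0, final_lr=1e-6,
          warmup_steps=20 * 625, total_steps=300 * 625):
    """Per-iteration learning rate consistent with the lr column above.

    All parameters are assumptions inferred from the log: 625 iterations per
    epoch, ~20 warmup epochs up to a 4e-3 peak, then cosine decay to final_lr.
    """
    if step < warmup_steps:
        # linear warmup
        return start_lr + (peak_lr - start_lr) * step / warmup_steps
    # cosine decay from peak_lr down to final_lr
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))

print(round(lr_at(12 * 625), 6))  # 0.0024, matching "Epoch: [12] [  0/625] ... lr: 0.002400"
print(round(lr_at(25 * 625), 6))  # ~0.003997, in the same range as the Epoch [24] lr column
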
Epoch: [20] [ 0/625] eta: 3:58:38 lr: 0.004000 min_lr: 0.004000 loss: 3.3307 (3.3307) class_acc: 0.4375 (0.4375) weight_decay: 0.0500 (0.0500) time: 22.9096 data: 15.4595 max mem: 6925
Epoch: [20] [200/625] eta: 0:14:50 lr: 0.004000 min_lr: 0.004000 loss: 3.3444 (3.3180) class_acc: 0.4453 (0.4563) weight_decay: 0.0500 (0.0500) grad_norm: 1.6040 (1.4375) time: 1.9376 data: 0.0009 max mem: 6925
Epoch: [20] [400/625] eta: 0:07:40 lr: 0.004000 min_lr: 0.004000 loss: 3.3124 (3.3227) class_acc: 0.4609 (0.4550) weight_decay: 0.0500 (0.0500) grad_norm: 1.3311 (1.4387) time: 1.8448 data: 0.1129 max mem: 6925
Epoch: [20] [600/625] eta: 0:00:51 lr: 0.004000 min_lr: 0.004000 loss: 3.3709 (3.3262) class_acc: 0.4375 (0.4544) weight_decay: 0.0500 (0.0500) grad_norm: 1.2990 (1.4154) time: 2.0352 data: 0.0007 max mem: 6925
Epoch: [20] [624/625] eta: 0:00:02 lr: 0.004000 min_lr: 0.004000 loss: 3.4000 (3.3276) class_acc: 0.4336 (0.4540) weight_decay: 0.0500 (0.0500) grad_norm: 1.2902 (1.4217) time: 0.9161 data: 0.0014 max mem: 6925
Epoch: [20] Total time: 0:20:55 (2.0085 s / it)
Averaged stats: lr: 0.004000 min_lr: 0.004000 loss: 3.4000 (3.3211) class_acc: 0.4336 (0.4556) weight_decay: 0.0500 (0.0500) grad_norm: 1.2902 (1.4217)
Test: [ 0/50] eta: 0:08:17 loss: 1.8120 (1.8120) acc1: 58.4000 (58.4000) acc5: 80.8000 (80.8000) time: 9.9593 data: 9.9265 max mem: 6925
Test: [10/50] eta: 0:01:13 loss: 1.9912 (1.8961) acc1: 58.4000 (57.8909) acc5: 80.8000 (81.0182) time: 1.8483 data: 1.8188 max mem: 6925
Test: [20/50] eta: 0:00:45 loss: 1.9951 (2.0200) acc1: 54.4000 (55.3143) acc5: 79.2000 (79.2000) time: 1.0773 data: 1.0478 max mem: 6925
Test: [30/50] eta: 0:00:27 loss: 2.1099 (2.0241) acc1: 52.0000 (54.7097) acc5: 78.4000 (79.3290) time: 1.0992 data: 1.0686 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.0292 (2.0326) acc1: 51.2000 (54.1854) acc5: 79.2000 (79.2000) time: 0.9739 data: 0.9440 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.0373 (2.0390) acc1: 52.0000 (54.1760) acc5: 80.0000 (78.9760) time: 0.9869 data: 0.9567 max mem: 6925
Test: Total time: 0:00:57 (1.1445 s / it)
* Acc@1 55.662 Acc@5 79.310 loss 1.985
Accuracy of the model on the 50000 test images: 55.7%
Max accuracy: 55.98%
Epoch: [21] [ 0/625] eta: 3:46:41 lr: 0.004000 min_lr: 0.004000 loss: 3.1756 (3.1756) class_acc: 0.5000 (0.5000) weight_decay: 0.0500 (0.0500) time: 21.7626 data: 18.8851 max mem: 6925
Epoch: [21] [200/625] eta: 0:14:44 lr: 0.004000 min_lr: 0.004000 loss: 3.2297 (3.2715) class_acc: 0.4727 (0.4635) weight_decay: 0.0500 (0.0500) grad_norm: 1.0984 (1.3901) time: 2.0127 data: 0.0343 max mem: 6925
Epoch: [21] [400/625] eta: 0:07:39 lr: 0.004000 min_lr: 0.004000 loss: 3.2596 (3.2742) class_acc: 0.4609 (0.4627) weight_decay: 0.0500 (0.0500) grad_norm: 1.2856 (1.3879) time: 2.0071 data: 0.0008 max mem: 6925
Epoch: [21] [600/625] eta: 0:00:50 lr: 0.004000 min_lr: 0.004000 loss: 3.3081 (3.2786) class_acc: 0.4609 (0.4629) weight_decay: 0.0500 (0.0500) grad_norm: 1.1094 (1.4098) time: 1.9889 data: 0.0006 max mem: 6925
Epoch: [21] [624/625] eta: 0:00:01 lr: 0.003999 min_lr: 0.003999 loss: 3.3010 (3.2800) class_acc: 0.4570 (0.4628) weight_decay: 0.0500 (0.0500) grad_norm: 1.1884 (1.4073) time: 1.0288 data: 0.0013 max mem: 6925
Epoch: [21] Total time: 0:20:46 (1.9948 s / it)
Averaged stats: lr: 0.003999 min_lr: 0.003999 loss: 3.3010 (3.2879) class_acc: 0.4570 (0.4623) weight_decay: 0.0500 (0.0500) grad_norm: 1.1884 (1.4073)
Test: [ 0/50] eta: 0:09:17 loss: 1.9069 (1.9069) acc1: 56.8000 (56.8000) acc5: 83.2000 (83.2000) time: 11.1503 data: 11.1189 max mem: 6925
Test: [10/50] eta: 0:01:16 loss: 1.7605 (1.7950) acc1: 59.2000 (60.1455) acc5: 81.6000 (82.1818) time: 1.9124 data: 1.8827 max mem: 6925
Test: [20/50] eta: 0:00:44 loss: 1.9519 (1.9149) acc1: 57.6000 (57.2571) acc5: 80.8000 (80.7619) time: 1.0093 data: 0.9802 max mem: 6925
Test: [30/50] eta: 0:00:26 loss: 2.0325 (1.9291) acc1: 55.2000 (57.0323) acc5: 78.4000 (80.2581) time: 1.0197 data: 0.9910 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 2.0043 (1.9758) acc1: 55.2000 (55.8049) acc5: 78.4000 (79.3756) time: 0.8673 data: 0.8384 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.0043 (1.9855) acc1: 55.2000 (55.7760) acc5: 80.0000 (79.3280) time: 0.8046 data: 0.7753 max mem: 6925
Test: Total time: 0:00:53 (1.0607 s / it)
* Acc@1 56.702 Acc@5 80.214 loss 1.936
Accuracy of the model on the 50000 test images: 56.7%
Max accuracy: 56.70%
Epoch: [22] [ 0/625] eta: 3:49:06 lr: 0.003999 min_lr: 0.003999 loss: 3.2059 (3.2059) class_acc: 0.4961 (0.4961) weight_decay: 0.0500 (0.0500) time: 21.9939 data: 21.2263 max mem: 6925
Epoch: [22] [200/625] eta: 0:14:21 lr: 0.003999 min_lr: 0.003999 loss: 3.2852 (3.2531) class_acc: 0.4531 (0.4717) weight_decay: 0.0500 (0.0500) grad_norm: 1.2180 (1.3728) time: 2.0918 data: 0.0159 max mem: 6925
Epoch: [22] [400/625] eta: 0:07:31 lr: 0.003999 min_lr: 0.003999 loss: 3.2529 (3.2568) class_acc: 0.4492 (0.4689) weight_decay: 0.0500 (0.0500) grad_norm: 1.3509 (inf) time: 2.0125 data: 0.0489 max mem: 6925
Epoch: [22] [600/625] eta: 0:00:50 lr: 0.003999 min_lr: 0.003999 loss: 3.2755 (3.2551) class_acc: 0.4648 (0.4697) weight_decay: 0.0500 (0.0500) grad_norm: 1.0178 (inf) time: 1.8202 data: 0.0008 max mem: 6925
Epoch: [22] [624/625] eta: 0:00:01 lr: 0.003999 min_lr: 0.003999 loss: 3.2056 (3.2551) class_acc: 0.4727 (0.4698) weight_decay: 0.0500 (0.0500) grad_norm: 1.2569 (inf) time: 0.8582 data: 0.0016 max mem: 6925
Epoch: [22] Total time: 0:20:39 (1.9827 s / it)
Averaged stats: lr: 0.003999 min_lr: 0.003999 loss: 3.2056 (3.2486) class_acc: 0.4727 (0.4708) weight_decay: 0.0500 (0.0500) grad_norm: 1.2569 (inf)
Test: [ 0/50] eta: 0:12:16 loss: 2.0468 (2.0468) acc1: 53.6000 (53.6000) acc5: 82.4000 (82.4000) time: 14.7328 data: 14.6967 max mem: 6925
Test: [10/50] eta: 0:01:29 loss: 1.9067 (1.8810) acc1: 57.6000 (57.8182) acc5: 82.4000 (82.4727) time: 2.2356 data: 2.2056 max mem: 6925
Test: [20/50] eta: 0:00:50 loss: 2.0120 (2.0484) acc1: 53.6000 (54.8190) acc5: 77.6000 (80.0000) time: 1.0170 data: 0.9882 max mem: 6925
Test: [30/50] eta: 0:00:28 loss: 2.1720 (2.0652) acc1: 51.2000 (53.9871) acc5: 76.0000 (79.2516) time: 1.0110 data: 0.9825 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.1720 (2.0886) acc1: 52.0000 (53.6585) acc5: 76.0000 (78.7707) time: 0.7643 data: 0.7354 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.1234 (2.0790) acc1: 52.0000 (54.0000) acc5: 78.4000 (78.9280) time: 0.8242 data: 0.7951 max mem: 6925
Test: Total time: 0:00:54 (1.0948 s / it)
* Acc@1 54.728 Acc@5 79.098 loss 2.033
Accuracy of the model on the 50000 test images: 54.7%
Max accuracy: 56.70%
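
[annotation] The grad_norm average flips to (inf) midway through Epoch [22] and stays there for the rest of that epoch, then is finite again in Epoch [23]. That pattern is typical of mixed-precision training: a single iteration overflows, its gradient norm is inf, the loss scaler skips that update, and the per-epoch running average of the column is poisoned even though training proceeds normally. A rough sketch of how such a norm is usually computed and handled follows (assuming torch.cuda.amp-style scaling; not necessarily the exact code behind this log).

import torch

def clip_and_report_grad_norm(parameters, max_norm=None):
    """Total L2 norm of all gradients, as printed in the grad_norm column.

    An AMP overflow makes this inf/nan for that iteration; GradScaler.step()
    then skips the optimizer update, but a running average of this value
    stays inf for the rest of the epoch (compare Epoch [22] above).
    """
    params = [p for p in parameters if p.grad is not None]
    total = torch.norm(torch.stack([p.grad.detach().norm(2) for p in params]), 2)
    if max_norm is not None and torch.isfinite(total):
        # only clip when the norm is finite; an overflowed step is skipped anyway
        torch.nn.utils.clip_grad_norm_(params, max_norm)
    return total

# Tiny CPU-only illustration (the real run logs this value once per printed iteration)
model = torch.nn.Linear(4, 2)
model(torch.randn(8, 4)).pow(2).mean().backward()
print(float(clip_and_report_grad_norm(model.parameters(), max_norm=5.0)))
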
Epoch: [23] [ 0/625] eta: 3:30:03 lr: 0.003999 min_lr: 0.003999 loss: 3.2045 (3.2045) class_acc: 0.4961 (0.4961) weight_decay: 0.0500 (0.0500) time: 20.1658 data: 18.6996 max mem: 6925
Epoch: [23] [200/625] eta: 0:14:24 lr: 0.003999 min_lr: 0.003999 loss: 3.2049 (3.2234) class_acc: 0.4766 (0.4746) weight_decay: 0.0500 (0.0500) grad_norm: 1.5337 (1.4239) time: 1.9802 data: 0.5958 max mem: 6925
Epoch: [23] [400/625] eta: 0:07:23 lr: 0.003998 min_lr: 0.003998 loss: 3.2479 (3.2242) class_acc: 0.4688 (0.4761) weight_decay: 0.0500 (0.0500) grad_norm: 0.9697 (1.3550) time: 1.9830 data: 0.1801 max mem: 6925
Epoch: [23] [600/625] eta: 0:00:49 lr: 0.003998 min_lr: 0.003998 loss: 3.2030 (3.2257) class_acc: 0.4688 (0.4752) weight_decay: 0.0500 (0.0500) grad_norm: 1.0974 (1.3447) time: 1.9075 data: 1.4613 max mem: 6925
Epoch: [23] [624/625] eta: 0:00:01 lr: 0.003998 min_lr: 0.003998 loss: 3.2291 (3.2256) class_acc: 0.4648 (0.4752) weight_decay: 0.0500 (0.0500) grad_norm: 1.2201 (1.3421) time: 0.8960 data: 0.5650 max mem: 6925
Epoch: [23] Total time: 0:20:22 (1.9552 s / it)
Averaged stats: lr: 0.003998 min_lr: 0.003998 loss: 3.2291 (3.2218) class_acc: 0.4648 (0.4766) weight_decay: 0.0500 (0.0500) grad_norm: 1.2201 (1.3421)
Test: [ 0/50] eta: 0:09:36 loss: 1.7914 (1.7914) acc1: 58.4000 (58.4000) acc5: 83.2000 (83.2000) time: 11.5359 data: 11.5002 max mem: 6925
Test: [10/50] eta: 0:01:27 loss: 1.8038 (1.8272) acc1: 57.6000 (58.4000) acc5: 82.4000 (81.7455) time: 2.1750 data: 2.1458 max mem: 6925
Test: [20/50] eta: 0:00:53 loss: 1.8913 (1.9583) acc1: 55.2000 (55.9238) acc5: 80.0000 (80.3048) time: 1.2886 data: 1.2600 max mem: 6925
Test: [30/50] eta: 0:00:30 loss: 2.0884 (1.9825) acc1: 53.6000 (55.1226) acc5: 78.4000 (79.5871) time: 1.1919 data: 1.1630 max mem: 6925
Test: [40/50] eta: 0:00:12 loss: 2.0884 (2.0074) acc1: 53.6000 (54.6146) acc5: 76.8000 (79.1610) time: 0.7206 data: 0.6906 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 2.0101 (2.0100) acc1: 52.0000 (54.6240) acc5: 80.0000 (79.2640) time: 0.6200 data: 0.5897 max mem: 6925
Test: Total time: 0:00:53 (1.0756 s / it)
* Acc@1 56.242 Acc@5 80.122 loss 1.954
Accuracy of the model on the 50000 test images: 56.2%
Max accuracy: 56.70%
Epoch: [24] [ 0/625] eta: 3:50:36 lr: 0.003998 min_lr: 0.003998 loss: 3.0273 (3.0273) class_acc: 0.5234 (0.5234) weight_decay: 0.0500 (0.0500) time: 22.1390 data: 14.7003 max mem: 6925
Epoch: [24] [200/625] eta: 0:14:00 lr: 0.003998 min_lr: 0.003998 loss: 3.2235 (3.1959) class_acc: 0.4727 (0.4837) weight_decay: 0.0500 (0.0500) grad_norm: 1.4420 (1.4856) time: 1.7901 data: 0.0006 max mem: 6925
Epoch: [24] [400/625] eta: 0:07:20 lr: 0.003997 min_lr: 0.003997 loss: 3.2135 (3.1972) class_acc: 0.4766 (0.4818) weight_decay: 0.0500 (0.0500) grad_norm: 1.3066 (1.3565) time: 2.0085 data: 0.0007 max mem: 6925
Epoch: [24] [600/625] eta: 0:00:49 lr: 0.003997 min_lr: 0.003997 loss: 3.1898 (3.1984) class_acc: 0.4844 (0.4812) weight_decay: 0.0500 (0.0500) grad_norm: 1.0375 (1.3528) time: 2.2258 data: 0.0007 max mem: 6925
Epoch: [24] [624/625] eta: 0:00:01 lr: 0.003997 min_lr: 0.003997 loss: 3.1570 (3.1984) class_acc: 0.4961 (0.4814) weight_decay: 0.0500 (0.0500) grad_norm: 1.3195 (1.3615) time: 1.1225 data: 0.0014 max mem: 6925
Epoch: [24] Total time: 0:20:07 (1.9321 s / it)
Averaged stats: lr: 0.003997 min_lr: 0.003997 loss: 3.1570 (3.1975) class_acc: 0.4961 (0.4819) weight_decay: 0.0500 (0.0500) grad_norm: 1.3195 (1.3615)
Test: [ 0/50] eta: 0:09:34 loss: 1.7549 (1.7549) acc1: 59.2000 (59.2000) acc5: 86.4000 (86.4000) time: 11.4959 data: 11.4576 max mem: 6925
Test: [10/50] eta: 0:01:12 loss: 1.7945 (1.8123) acc1: 58.4000 (59.4182) acc5: 82.4000 (82.1091) time: 1.8246 data: 1.7944 max mem: 6925
Test: [20/50] eta: 0:00:41 loss: 1.9018 (1.9572) acc1: 56.0000 (56.6476) acc5: 80.8000 (80.9524) time: 0.8900 data: 0.8611 max mem: 6925
Test: [30/50] eta: 0:00:26 loss: 2.1054 (1.9758) acc1: 53.6000 (56.1548) acc5: 79.2000 (80.2839) time: 1.0503 data: 1.0217 max mem: 6925
Test: [40/50] eta: 0:00:11 loss: 1.9926 (1.9906) acc1: 54.4000 (55.8439) acc5: 78.4000 (79.6878) time: 0.9374 data: 0.9076 max mem: 6925
Test: [49/50] eta: 0:00:01 loss: 1.9616 (1.9823) acc1: 55.2000 (56.0000) acc5: 78.4000 (79.7920) time: 0.5062 data: 0.4757 max mem: 6925