This is a quick evaluation of how different lr_policies affect SGD solver performance on ImageNet-2012.
The architecture is similar to CaffeNet, but with the following differences:
- Images are resized to small side = 128 for speed reasons.
- fc6 and fc7 layers have 2048 neurons instead of 4096.
- Networks are initialized with LSUV-init.
- No LRN layers.
Non-linearity: ReLU.
Augmentation: random crop 128x128 from a 144xN image, 50% random horizontal flip.
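A minimal sketch of this augmentation, assuming numpy `HxWxC` arrays; `augment` is a hypothetical helper for illustration, not code from this repo:

```python
import numpy as np

def augment(img, crop=128):
    """Random crop x crop patch plus a 50% horizontal flip.

    img: HxWxC array, e.g. a 144xN image as described above.
    """
    h, w = img.shape[:2]
    y = np.random.randint(0, h - crop + 1)
    x = np.random.randint(0, w - crop + 1)
    patch = img[y:y + crop, x:x + crop]
    if np.random.rand() < 0.5:  # 50% random horizontal flip
        patch = patch[:, ::-1]
    return patch
```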
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
Step 100K | 0.471 | 2.36 | Default caffenet solver, max_iter=320K |
Poly lr, p=0.5, sqrt | 0.483 | 2.29 | bvlc_googlenet_quick_solver; behind "step" for most of the run, but ahead at the finish |
Poly lr, p=2.0, sqr | 0.483 | 2.299 | |
Poly lr, p=1.0, linear | 0.493 | 2.24 | |
Poly lr, p=1.0, linear | 0.466 | 2.39 | max_iter=160K |
Exp, 0.035 | 0.441 | 2.53 | max_iter=160K, stepsize=2K, gamma=0.915, same as in base_dereyly |

Name | Accuracy | LogLoss | Comments |
---|---|---|---|
Step 100K | 0.527 | 2.09 | Default caffenet solver, max_iter=320K |
Poly lr, p=1.0, linear | 0.496 | 2.24 | max_iter=105K |
Poly lr, p=1.0, start_lr=0.02 | 0.505 | 2.21 | max_iter=105K |
Exp, 0.035 | 0.506 | 2.19 | max_iter=160K, stepsize=2K, gamma=0.915, same as in base_dereyly |
Solver settings:
- `Step100K`: reference caffenet solver. `base_lr: 0.01`, `lr_policy: "step"`, `gamma: 0.1`, `momentum: 0.9`, `weight_decay: 0.0005`
- `Poly_sqrt`: bvlc_googlenet_quick_solver. `lr_policy: "poly"`, `power: 0.5`
- `Poly_sqr`: `lr_policy: "poly"`, `power: 2.0`
- `Linear`: `lr_policy: "poly"`, `power: 1.0`
- `Linear_2x_faster`: `max_iter: 160000`
- `0035_exp_160K`: `max_iter: 160000`, `base_lr: 0.035`, `lr_policy: "step"`, `stepsize: 2000`, `gamma: 0.915`
- `Linear_3x_faster`: `max_iter: 105000`
See the learning_rate graph.
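The schedules above use Caffe's built-in lr formulas: `"step"` gives `base_lr * gamma^floor(iter/stepsize)` and `"poly"` gives `base_lr * (1 - iter/max_iter)^power`. A minimal sketch to reproduce the learning_rate graph; the plotting setup is an assumption, only the formulas come from Caffe:

```python
import numpy as np
import matplotlib.pyplot as plt

def step_lr(it, base_lr, gamma, stepsize):
    # Caffe lr_policy "step": base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** np.floor(it / stepsize)

def poly_lr(it, base_lr, power, max_iter):
    # Caffe lr_policy "poly": base_lr * (1 - iter / max_iter) ^ power
    return base_lr * (1.0 - it / max_iter) ** power

it = np.arange(320000)
plt.plot(it, step_lr(it, 0.01, 0.1, 100000), label='Step 100K')
plt.plot(it, poly_lr(it, 0.01, 0.5, 320000), label='poly, p=0.5 (sqrt)')
plt.plot(it, poly_lr(it, 0.01, 1.0, 320000), label='poly, p=1.0 (linear)')
plt.plot(it, poly_lr(it, 0.01, 2.0, 320000), label='poly, p=2.0 (sqr)')
it2 = np.arange(160000)
plt.plot(it2, step_lr(it2, 0.035, 0.915, 2000), label='"Exp": step, gamma=0.915')
plt.xlabel('iteration')
plt.ylabel('learning rate')
plt.legend()
plt.show()
```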