Loss Increases after some epochs #7603

Closed
ktiwary2 opened this issue Aug 11, 2017 · 13 comments

Comments

@ktiwary2

I have tried different convolutional neural network implementations and I keep running into the same issue: the network starts out training well and the loss decreases, but after some time the loss just starts to increase. An example is shown below:
Epoch 15/800
1562/1562 [==============================] - 49s - loss: 0.9050 - acc: 0.6827 - val_loss: 0.7667 - val_acc: 0.7323
Epoch 16/800
1562/1562 [==============================] - 49s - loss: 0.8906 - acc: 0.6864 - val_loss: 0.7404 - val_acc: 0.7434
Epoch 380/800
1562/1562 [==============================] - 49s - loss: 1.5519 - acc: 0.4880 - val_loss: 1.4250 - val_acc: 0.5233
Epoch 381/800
1562/1562 [==============================] - 48s - loss: 1.5416 - acc: 0.4897 - val_loss: 1.5032 - val_acc: 0.4868
Epoch 800/800
1562/1562 [==============================] - 49s - loss: 1.8483 - acc: 0.3402 - val_loss: 1.9454 - val_acc: 0.2398

I have tried this on different CIFAR-10 architectures I found on GitHub. I am training on a Titan X Pascal GPU. This only happens when I train the network in batches and with data augmentation. I have changed the optimizer, the initial learning rate, etc. I just want a CIFAR-10 model with good enough accuracy for my tests, so any help will be appreciated. The code is based on this example:
https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

@mahnerak commented Aug 11, 2017

I know you have already tried different optimizers, but please try plain (raw) SGD with a smaller initial learning rate.
Most likely the optimizer builds up a large momentum and, past some point, keeps moving in the wrong direction.

@ktiwary2 (Author)

So something like this?
from keras.optimizers import SGD

epochs = 800          # total epochs for this run (see the log above)
lrate = 0.001
decay = lrate / epochs
sgd = SGD(lr=lrate, momentum=0.90, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

@mahnerak

No, without any momentum or decay, just raw SGD.

model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
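
If you also want to make the "smaller initial learning rate" part explicit, you can instantiate the optimizer directly. This is only a minimal sketch, assuming the Keras 2.x optimizer API and that model is the CNN from the linked example; the value 0.001 is an illustration, not a tuned setting:

from keras.optimizers import SGD

# Plain SGD: no momentum, no Nesterov, no decay, small learning rate.
sgd = SGD(lr=0.001, momentum=0.0, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])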

@ktiwary2 (Author)

Thanks, that works. Do you happen to know why that is?

@mahnerak commented Aug 12, 2017

Look, when using raw SGD you take the gradient of the loss function with respect to the parameters (the direction in which the loss increases) and move a little bit in the opposite direction (in order to minimize the loss).
Different optimizers are built on top of SGD and add ideas (momentum, learning rate decay, etc.) to make convergence faster.
If you look at how momentum works, you'll see where the problem is. In the beginning, the optimizer may move in the same (correct) direction for a long time, which builds up a very large momentum. Later, the direction opposite to the gradient may no longer agree with the accumulated momentum, so the optimizer "climbs hills" (reaches higher loss values) for a while, though it may eventually correct itself.
(I encourage you to read up on how momentum works.)
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
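
To make the "climbing hills" effect concrete, here is a tiny self-contained sketch of the classic momentum update on a 1-D quadratic loss. It is not taken from the code in this issue; the function and variable names (sgd_momentum_step, w, v) are purely illustrative:

def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    # Heavy-ball update: v <- momentum * v - lr * grad;  w <- w + v
    v = momentum * v - lr * grad
    w = w + v
    return w, v

# Toy loss f(w) = 0.5 * w**2, whose gradient is simply w.
w, v = 5.0, 0.0
for step in range(15):
    w, v = sgd_momentum_step(w, v, grad=w)
    print(step, 0.5 * w ** 2)

# The printed loss first drops, then rises for a few steps while the
# accumulated velocity carries w past the minimum, and only after the
# velocity is "unwound" does the loss come back down.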

@ktiwary2 (Author)

Ok, I will definitely keep this in mind in the future. Thanks for the help.

@fatemaaa

Hello,
I'm using a CNN for regression with MAE as the evaluation metric, but I've noticed that the loss, val_loss, mean absolute error, and val_mean_absolute_error stop changing after some epochs.

@vjbharani commented Jan 4, 2019

My loss was at 0.05, but after some epochs it went up to 15, even with raw SGD. Training for many epochs didn't have this effect with Adam, only with the SGD optimizer.
Please help.

@kouohhashi

@mahnerak
Hi, thank you for your explanation. I experienced a similar problem.

BTW, I have a question about "it may eventually correct itself".
Does that mean the loss can start going down again after many more epochs, even with momentum, at least in theory?

Thanks in advance.

@mahnerak

Hi @kouohhashi,
I suggest reading this Distill publication: https://distill.pub/2017/momentum/

The authors mention: "It is possible, however, to construct very specific counterexamples where momentum does not converge, even on convex functions."
Please also take a look at https://arxiv.org/abs/1408.3595 for more details.

@programehr

Regarding @mahnerak's explanation above ("the optimizer may move in the same direction for a long time, which builds up a very large momentum ... it may eventually correct itself"):

Are you suggesting that momentum be removed altogether, or only for troubleshooting? If you mean the latter, how should one use momentum again after debugging?
Thanks.

@erolgerceker

Increase the batch size, and keep an eye on GPU memory.
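
With the linked CIFAR-10 example that would just mean passing a larger batch_size to model.fit. A minimal sketch, assuming the usual x_train/y_train and x_test/y_test arrays from that script; 128 is only an illustrative value and whether it actually helps depends on your setup:

# Larger batches average the gradient over more samples (less noisy steps),
# but each step needs proportionally more GPU memory.
model.fit(x_train, y_train,
          batch_size=128,   # e.g. larger than the example's batch_size of 32
          epochs=100,
          validation_data=(x_test, y_test),
          shuffle=True)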

@mayank010698

@erolgerceker how does increasing the batch size help with Adam?
