Loss Increases after some epochs #7603

Closed
ktiwary2 opened this issue Aug 11, 2017 · 13 comments

Comments

@ktiwary2

I have tried different convolutional neural network implementations and I keep running into the same issue: the network starts out training well and the loss decreases, but after some time the loss just starts to increase. An example is shown below:
Epoch 15/800
1562/1562 [==============================] - 49s - loss: 0.9050 - acc: 0.6827 - val_loss: 0.7667 - val_acc: 0.7323
Epoch 16/800
1562/1562 [==============================] - 49s - loss: 0.8906 - acc: 0.6864 - val_loss: 0.7404 - val_acc: 0.7434
Epoch 380/800
1562/1562 [==============================] - 49s - loss: 1.5519 - acc: 0.4880 - val_loss: 1.4250 - val_acc: 0.5233
Epoch 381/800
1562/1562 [==============================] - 48s - loss: 1.5416 - acc: 0.4897 - val_loss: 1.5032 - val_acc: 0.4868
Epoch 800/800
1562/1562 [==============================] - 49s - loss: 1.8483 - acc: 0.3402 - val_loss: 1.9454 - val_acc: 0.2398

I have tried this on different CIFAR-10 architectures I found on GitHub. I am training on a Titan X Pascal GPU. This only happens when I train the network in batches and with data augmentation. I have changed the optimizer, the initial learning rate, etc. I just want a CIFAR-10 model with good enough accuracy for my tests, so any help will be appreciated. The code is based on this example:
https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py

@mahnerak commented Aug 11, 2017

I know you have already tried different optimizers, but please try plain (raw) SGD with a smaller initial learning rate.
Most likely the optimizer builds up a large momentum and, past some point, keeps moving in the wrong direction.

@ktiwary2 (Author)

So something like this?
from keras.optimizers import SGD

epochs = 800          # total epochs for this run (see the log above)
lrate = 0.001
decay = lrate / epochs
sgd = SGD(lr=lrate, momentum=0.90, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

@mahnerak

No, without any momentum or decay, just raw SGD.

model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
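
If you also want to make the "smaller initial learning rate" part explicit, you can instantiate the optimizer directly. This is only a minimal sketch, assuming the Keras 2.x optimizer API and that model is the CNN from the linked example; the value 0.001 is an illustration, not a tuned setting:

from keras.optimizers import SGD

# Plain SGD: no momentum, no Nesterov, no decay, small learning rate.
sgd = SGD(lr=0.001, momentum=0.0, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])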

@ktiwary2 (Author)

Thanks, that works. Do you happen to know why that is?

@mahnerak commented Aug 12, 2017

Look, when using raw SGD you take the gradient of the loss function with respect to the parameters (the direction in which the loss increases) and move a little bit in the opposite direction (in order to minimize the loss).
Different optimizers are built on top of SGD and add ideas (momentum, learning rate decay, etc.) to make convergence faster.
If you look at how momentum works, you'll see where the problem is. In the beginning, the optimizer may move in the same (correct) direction for a long time, which builds up a very large momentum. Later, the direction opposite to the gradient may no longer agree with the accumulated momentum, so the optimizer "climbs hills" (reaches higher loss values) for a while, though it may eventually correct itself.
(I encourage you to read up on how momentum works.)
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum
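
To make the "climbing hills" effect concrete, here is a tiny self-contained sketch of the classic momentum update on a 1-D quadratic loss. It is not taken from the code in this issue; the function and variable names (sgd_momentum_step, w, v) are purely illustrative:

def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    # Heavy-ball update: v <- momentum * v - lr * grad;  w <- w + v
    v = momentum * v - lr * grad
    w = w + v
    return w, v

# Toy loss f(w) = 0.5 * w**2, whose gradient is simply w.
w, v = 5.0, 0.0
for step in range(15):
    w, v = sgd_momentum_step(w, v, grad=w)
    print(step, 0.5 * w ** 2)

# The printed loss first drops, then rises for a few steps while the
# accumulated velocity carries w past the minimum, and only after the
# velocity is "unwound" does the loss come back down.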

@ktiwary2 (Author)

Ok, I will definitely keep this in mind in the future. Thanks for the help.

@fatemaaa

Hello,
I'm using a CNN for regression with MAE as the evaluation metric, but I've noticed that the loss, val_loss, mean absolute error, and val_mean_absolute_error stop changing after some epochs.

@vjbharani commented Jan 4, 2019

My loss was at 0.05, but after some epochs it went up to 15, even with raw SGD. Training for many epochs didn't have this effect with Adam, only with the SGD optimizer.
Please help.

@kouohhashi

@mahnerak
Hi, thank you for your explanation. I experienced a similar problem.

BTW, I have a question about "it may eventually correct itself".
Does that mean the loss can start going down again after many more epochs, even with momentum, at least in theory?

Thanks in advance.

@mahnerak

Hi @kouohhashi,
I suggest reading this Distill publication: https://distill.pub/2017/momentum/

The authors mention: "It is possible, however, to construct very specific counterexamples where momentum does not converge, even on convex functions."
Please also take a look at https://arxiv.org/abs/1408.3595 for more details.

@programehr

Regarding @mahnerak's explanation above ("the optimizer may move in the same direction for a long time, which builds up a very large momentum ... it may eventually correct itself"):

Are you suggesting that momentum be removed altogether, or only for troubleshooting? If you mean the latter, how should one use momentum again after debugging?
Thanks.

@erolgerceker

Increase the batch size, and keep an eye on GPU memory.
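
With the linked CIFAR-10 example that would just mean passing a larger batch_size to model.fit. A minimal sketch, assuming the usual x_train/y_train and x_test/y_test arrays from that script; 128 is only an illustrative value and whether it actually helps depends on your setup:

# Larger batches average the gradient over more samples (less noisy steps),
# but each step needs proportionally more GPU memory.
model.fit(x_train, y_train,
          batch_size=128,   # e.g. larger than the example's batch_size of 32
          epochs=100,
          validation_data=(x_test, y_test),
          shuffle=True)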

@mayank010698

@erolgerceker how does increasing the batch size help with Adam?
