Loss Increases after some epochs #7603
I believe that you have tried different optimizers, but please try raw SGD with a smaller initial learning rate.
So something like this?
No, without any momentum and decay, just raw SGD: `model.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])`
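For reference, a minimal sketch of what "raw SGD with a smaller initial learning rate" can look like when the optimizer is passed as an object instead of the `'SGD'` string (the learning-rate value is an assumption, not something stated in this thread):

```python
from keras.optimizers import SGD

# Plain SGD: no momentum, no decay, no Nesterov.
# lr=0.01 is only an assumed starting point; try smaller values
# (e.g. 0.001) if the loss still blows up.
sgd = SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
```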
Thanks, that works. I was wondering if you know why that is?
Look, when using raw SGD, you take the gradient of the loss function w.r.t. the parameters (the direction in which the function value increases) and step a little bit in the opposite direction (in order to minimize the loss function).
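As a toy illustration of that update rule (the function, values, and learning rate below are made up, not taken from this issue):

```python
import numpy as np

def sgd_step(w, grad, learning_rate=0.1):
    """One raw-SGD update: a small step against the gradient."""
    return w - learning_rate * grad

# Minimize L(w) = w**2; its gradient is 2*w.
w = np.float64(5.0)
for _ in range(20):
    w = sgd_step(w, grad=2.0 * w)
print(w)  # moves toward 0, the minimizer of w**2
```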
Ok, I will definitely keep this in mind in the future. Thanks for the help.
Hello,
My loss was at 0.05, but after some epochs it went up to 15, even with raw SGD. Running for many epochs didn't have this effect with Adam, only with the SGD optimizer.
@mahnerak BTW, I have a question about "but it may eventually fix himself". Thanks in advance.
Hi @kouohhashi, the authors mention: "It is possible, however, to construct very specific counterexamples where momentum does not converge, even on convex functions."
Are you suggesting that momentum be removed altogether, or only for troubleshooting? If you mean the latter, how should one use momentum after debugging?
Increase the batch size, and be aware of the memory usage.
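In the Keras code from this thread, the batch size is just the `batch_size` argument; a hedged sketch (128 is an illustrative value, and `x_train`/`y_train`/`x_test`/`y_test` are placeholder names):

```python
# Larger batches give less noisy gradient estimates, at the cost of
# more GPU memory per step. 128 is only an illustrative value.
model.fit(x_train, y_train,
          batch_size=128,
          epochs=50,
          validation_data=(x_test, y_test))
```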
@erolgerceker how does increasing the batch size help with Adam?
I have tried different convolutional neural network codes and I am running into a similar issue. The network starts out training well and decreases the loss, but after some time the loss just starts to increase. I have shown an example below:
Epoch 15/800
1562/1562 [==============================] - 49s - loss: 0.9050 - acc: 0.6827 - val_loss: 0.7667 - val_acc: 0.7323
Epoch 16/800
1562/1562 [==============================] - 49s - loss: 0.8906 - acc: 0.6864 - val_loss: 0.7404 - val_acc: 0.7434
Epoch 380/800
1562/1562 [==============================] - 49s - loss: 1.5519 - acc: 0.4880 - val_loss: 1.4250 - val_acc: 0.5233
Epoch 381/800
1562/1562 [==============================] - 48s - loss: 1.5416 - acc: 0.4897 - val_loss: 1.5032 - val_acc: 0.4868
Epoch 800/800
1562/1562 [==============================] - 49s - loss: 1.8483 - acc: 0.3402 - val_loss: 1.9454 - val_acc: 0.2398
I have tried this on different cifar10 architectures I have found on GitHub. I am training this on a GPU Titan-X Pascal. This only happens when I train the network in batches and with data augmentation. I have changed the optimizer, the initial learning rate, etc. I have also attached a link to the code. I just want a cifar10 model with good enough accuracy for my tests, so any help will be appreciated. The code is from this:
https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py
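Not part of that example, but a sketch of how the two suggestions from this thread (raw SGD with a small learning rate, and a larger batch size with the same style of augmentation) could be tried on the cifar10_cnn.py model; all values are illustrative assumptions:

```python
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator

# Raw SGD with a small, assumed learning rate instead of the example's optimizer.
opt = SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

# Same style of augmentation as the example, but with a larger batch size.
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
batch_size = 128  # illustrative; watch GPU memory
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=100,
                    validation_data=(x_test, y_test))
```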