New Accuracy Layer on GPU interferes with training #5981
Comments
I can reproduce this behavior for the GPU implementation. CPU does not seem to be affected. I'll be looking into this today; in the meantime, could you take a look too, @shaibagon? EDIT: My suspicion after half an hour of tinkering with this: could it be that this memory actually is used for something? That is, we use it as a temporary memory but Caffe actually does propagate back from here?
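For context, here is a minimal C++ sketch of the pattern @Noiredd is describing, assuming Caffe's layer API. This is not the verbatim `accuracy_layer.cu` source, and the zeroing call at the end is one plausible shape of a fix, not necessarily the exact PR #5987 diff:

```cpp
#include <vector>

#include "caffe/layers/accuracy_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

// Simplified sketch of the suspected pattern in Forward_gpu: the layer
// borrows the bottom blob's diff as scratch space to avoid allocating
// extra GPU memory for intermediate results.
template <typename Dtype>
void AccuracyLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                                       const vector<Blob<Dtype>*>& top) {
  // Harmless in a TEST net, where nothing ever reads this diff...
  Dtype* acc_data = bottom[0]->mutable_gpu_diff();
  // ... GPU kernels accumulate per-sample correctness flags into acc_data ...

  // ...but in a TRAIN net the solver propagates gradients through
  // bottom[0]->gpu_diff(), so leftover scratch values leak into the
  // backward pass. Plausible fix (hypothetical sketch): clear the
  // borrowed memory before returning.
  caffe_gpu_set(bottom[0]->count(), Dtype(0), bottom[0]->mutable_gpu_diff());
}

}  // namespace caffe
```

This would also explain why only the GPU path misbehaves: the CPU implementation computes the accuracy directly and has no need for the scratch buffer.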
Removing the … However, forcing: …
@Noiredd if you set …
@shaibagon Of course, this was just to prove that the problem is indeed there. I can come up with a fix in a while, unless you want to take it from here, since you fathered this PR ;)
@Noiredd if it is okay with you, I'd appreciate it if you could take it from here. I am not as available as I used to be for Caffe :(
@vlomonaco Check PR #5987 - does it solve the issue for you?
Hi @Noiredd, thank you for the fix in less than 24 hours! It works!
Hi, I have the exact same problem; somehow @Noiredd's fix didn't work for me. Besides, I have my Accuracy layer in the Test phase only. I don't know why I am having this problem. My batch size is not small and I have enough memory, which rules out the other causes I have come across.
Issue summary
Using the "Accuracy" layer in the training net on GPU breaks training: the layer somehow interferes with the gradients, the loss quickly explodes, and the train/test accuracies stall at 1.
Steps to reproduce
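A minimal net fragment that should reproduce this, assuming a standard classification net trained with `solver_mode: GPU` (the `fc8`/`label` blob names are illustrative):

```
# Hypothetical fragment of a train_val.prototxt; blob names are illustrative.
# The key ingredient is an "Accuracy" layer with no include{} restriction,
# so it also runs in the TRAIN phase.
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"     # classifier scores
  bottom: "label"   # ground-truth labels
  top: "accuracy"
}
```

With this in place, training on GPU should show the diverging loss and stalled accuracies described above.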
My system configuration
Operating system: Ubuntu 14.04.5 LTS
CUDA version (if applicable): 7.0
cuDNN version (if applicable): 4.7
BLAS: libblas.so.3
Python version: 3.5
How to fix it
Any of the following three solutions fixes it:
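One workaround consistent with the rest of the thread is keeping the Accuracy layer out of the train net by restricting it to the TEST phase (same illustrative blob names as above):

```
# Workaround sketch: restrict the Accuracy layer to the TEST phase so the
# TRAIN net never runs its GPU forward pass. Names are illustrative.
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
```

Running in CPU mode (`solver_mode: CPU`) also sidesteps the bug, per the first comment; the proper upstream fix is PR #5987.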