loss nan #39
Comments
Hi, @liyangliu, could you share your training command and training log? I did not encounter this problem.

python trainval_net.py

Hi, I met this problem several times before, but it went away when I ran training again without changing any settings.

@liyangliu I think we might have slightly different initializations. If you encounter this again, one workaround is to clamp the gradient for res101 as well by commenting out this line. @gyxoned, have you successfully trained the model and gotten performance similar to what is reported in our tables?
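For reference, a minimal sketch of what clamping (clipping) gradients looks like in PyTorch. The toy model, optimizer, data, and the clip value of 10.0 are placeholders for illustration, not the repository's actual training code, which uses its own gradient-clipping helper:

```python
import torch
import torch.nn as nn

# Toy model and optimizer standing in for the Faster R-CNN network and its SGD setup.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 16)
targets = torch.randn(8, 4)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Clamp the global gradient norm before the parameter update so one bad batch
# cannot produce exploding gradients and, eventually, a NaN loss.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()
```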
@jwyang Yes, I have trained resnet101 on COCO successfully, and the performance is similar to what was reported.

@gyxoned sounds great!
Modify these 4 lines! Delete the "- 1"!
According to http://caffecn.cn/?/question/1055 and https://stackoverflow.com/questions/38513739/warning-during-py-faster-rcnn-training-on-custom-datasets
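To make the suggestion concrete, below is a minimal sketch of the kind of annotation-loading change being discussed, assuming VOC-style XML annotations. The function name load_boxes and the zero_based flag are illustrative, not the repository's actual API; the point is that VOC coordinates are 1-based, so the loader subtracts 1, and applying that "- 1" to annotations that are already 0-based can produce negative box coordinates, which later leads to NaN losses.

```python
import xml.etree.ElementTree as ET

def load_boxes(xml_path, zero_based=True):
    """Parse VOC-style bounding boxes from an annotation file.

    Hypothetical helper for illustration only. VOC annotations are 1-based,
    so the standard loader subtracts 1 from every coordinate. If your custom
    annotations are already 0-based, keeping the "- 1" can yield x1 = -1,
    which later trips assertions or produces NaN losses.
    """
    offset = 0.0 if zero_based else 1.0  # the "- 1" the comment suggests deleting
    boxes = []
    for obj in ET.parse(xml_path).findall('object'):
        bbox = obj.find('bndbox')
        x1 = float(bbox.find('xmin').text) - offset
        y1 = float(bbox.find('ymin').text) - offset
        x2 = float(bbox.find('xmax').text) - offset
        y2 = float(bbox.find('ymax').text) - offset
        boxes.append((x1, y1, x2, y2))
    return boxes
```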
In my case, clamping the gradient for res101 was the correct fix for the NaN loss.

Should we also delete the "- 1" for the experiments on VOC?
Hello @jwyang, I downloaded your pytorch faster rcnn yesterday, only changed the COCO 2014 dataset path to my local one, and trained with exactly the same settings as you (large image scale, lr = 0.01, 2 images per GPU and 8 GPUs, res101, using the Caffe pretrained models you provided), but got a NaN loss after a few iterations. Have you come across this problem? The loss does not become NaN if I set class_agnostic=True. Can you please help me a little? Thanks.