Loss NAN #15
It seems the first conv layer is already outputting NaN values:
What could be the reason? |
Check whether the object labels are all correct, and whether the initial weight file matches the proto file (for example, the proto may have merged the BN layers into the convs). |
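As a rough illustration of the label check suggested above, here is a minimal sketch that scans one VOC-style annotation for boxes that are empty, inverted, or outside the image. The function name `find_bad_boxes` and the sample XML are made up for this example; it assumes the standard VOC layout (`size/width`, `size/height`, `object/bndbox`) with 1-based pixel coordinates, and it is not taken from this repo's data loader.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (class_name, reason) pairs for suspicious boxes in one annotation."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        box = obj.find("bndbox")
        # Coordinates are sometimes stored as floats, so parse via float first.
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(key))) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmin >= xmax or ymin >= ymax:
            problems.append((name, "empty or inverted box"))
        elif xmin < 1 or ymin < 1 or xmax > width or ymax > height:
            # VOC pixel indices are 1-based, so 0 is already out of range.
            problems.append((name, "box outside image"))
    return problems


# Hypothetical annotation used only to exercise the check.
SAMPLE = """<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>cat</name>
    <bndbox><xmin>0</xmin><ymin>10</ymin><xmax>120</xmax><ymax>90</ymax></bndbox>
  </object>
  <object><name>car</name>
    <bndbox><xmin>300</xmin><ymin>200</ymin><xmax>300</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

print(find_bad_boxes(SAMPLE))
```

Running something like this over every annotation file before training is cheap, and any flagged box is worth inspecting by hand before blaming the network definition.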
@makefile Thanks for your reply. I trained on the same data with the ZF backbone, and it was fine for at least 120000 iterations. As far as I can tell, the proto still has the BatchNorm layers, and the pretrained model comes directly from Kaiming He's ResNet-50 repo. I also found this, which could cause NaN output from the BatchNorm layer. I tried the fix, but no luck: I still get NaN after several iterations. The model starts like this:
|
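One quick way to double-check the BatchNorm observation above is to count the BatchNorm layers declared in the prototxt text. This is only a sketch: `count_batchnorm` and the sample snippet are hypothetical, and it assumes the new-style `layer { ... type: "BatchNorm" ... }` syntax rather than anything specific to this repo's proto files.

```python
import re


def count_batchnorm(prototxt_text):
    """Count layer declarations of type "BatchNorm" in prototxt text."""
    return len(re.findall(r'type:\s*"BatchNorm"', prototxt_text))


# Hypothetical fragment used only to exercise the function.
SAMPLE_PROTO = """
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" }
layer { name: "bn_conv1" type: "BatchNorm" bottom: "conv1" top: "conv1" }
layer { name: "scale_conv1" type: "Scale" bottom: "conv1" top: "conv1" }
"""

print(count_batchnorm(SAMPLE_PROTO))
```

A count of zero for a proto that is meant to load unmerged ResNet weights would point to the kind of weight/proto mismatch mentioned earlier in the thread.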
It's hard to pin down the exact problem from the information you've given. Please check the parameter settings, the proto definition, and anything else you changed once more. |
I am training only on VOC. What I mean is: is rfcn-res50, or any other proto with a ResNet backbone, OK to train? Have you tested those configurations? |
It is ok to train with the |
I just got NaN loss on Faster R-CNN with ResNet-50 on VOC data, and I'm not sure what the reason is. |
Can you paste more snippets here (or on pastebin) for analysis, such as the config, data labels, and proto? |
Sure, starting with train.sh:
Solver.proto:
The model is ResNet-50 with Faster R-CNN, and I haven't changed anything. voc_config.json:
I haven't changed much there either. |
There doesn't seem to be any problem; I cannot figure it out either. You can also try D-X-Y's repo, since my repo is based on his with some code changes. |
I tried to train ResNet-50 Faster R-CNN on VOC and got NaN loss at the beginning of the training process: