Floating Point Exception #740
@wubaorong Hi, I am getting the same error. Did you have any chance to solve it?
@karaspd I'm sorry, I still haven't found a solution.
@wubaorong Hi, I finally found the issue; in my case it was the train prototxt! (Strangely, I do not see any difference inside the prototxt after I updated it.) I had tested with pascal-voc first and it was working, but I had issues with the new dataset. I checked all the annotation files and even modified the dataset-loading code, and in the end it was not about the data.
@karaspd Even when I train faster_rcnn with the VOC2007 dataset, I get the same error. Do you know the reason?
@karaspd When I use train_net.py to train the model, the following is the result:
@wubaorong May I see your train.prototxt file?
@karaspd I want to train faster rcnn, so I didn't alter any file; I downloaded the train.prototxt file from github directly. The following is my train.prototxt:
#========= conv1-conv5 ============
layer {
#========= RPN ============
layer {
#layer {
name: "rpn_conv/3x3"
type: "Convolution"
bottom: "conv5"
top: "rpn_conv/3x3"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
  num_output: 192
  kernel_size: 3 pad: 1 stride: 1
  weight_filler { type: "gaussian" std: 0.01 }
  bias_filler { type: "constant" value: 0 }
}
#}
name: "rpn_conv/5x5"
type: "Convolution"
bottom: "conv5"
top: "rpn_conv/5x5"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
  num_output: 64
  kernel_size: 5 pad: 2 stride: 1
  weight_filler { type: "gaussian" std: 0.0036 }
  bias_filler { type: "constant" value: 0 }
}
#}
name: "rpn/output"
type: "Concat"
bottom: "rpn_conv/3x3"
bottom: "rpn_conv/5x5"
top: "rpn/output"
#}
name: "rpn_relu/output"
type: "ReLU"
bottom: "rpn/output"
top: "rpn/output"
#}
layer {
#========= RoI Proposal ============
layer {
top: 'rpn_scores'
python_param {
name: 'debug-data'
type: 'Python'
bottom: 'data'
bottom: 'rpn_rois'
bottom: 'rpn_scores'
python_param {
  module: 'rpn.debug_layer'
  layer: 'RPNDebugLayer'
}
#}
#========= RCNN ============
layer {
@wubaorong Are you trying to retrain the ZF model from pretrained parameters with pascal-voc? What command do you use to train?
@wubaorong Check this issue as well: #65. See if any of the solutions works in your case.
@karaspd Thank you very much, I have solved this problem with your help. Now I can train the net, but when I use the ZF net to train faster rcnn end2end there is no log file in the log folder, whereas if I train faster rcnn alt_opt the log file is saved automatically. Do you know the reason?
@karaspd I have solved the log problem. I have a new question: if I want to continue training from the model that has already run 10000 iterations, what should I alter? Is it the faster_rcnn_end2end.sh file?
@wubaorong Yes, you need to change faster_rcnn_end2end.sh if you are training with faster rcnn. If you want to continue training from a snapshot you saved before, you can use
@karaspd I changed faster_rcnn_end2end.sh as you said, but I got an error:
Check my comment here.
@wubaorong @karaspd @meetshah1995 @Dectinc
I0312 16:25:25.883342 2983 sgd_solver.cpp:106] Iteration 0, lr = 0.0005
I tried to change lr from 0.001 to 0.0005, but it didn't work. I also changed RNG_SEED, and that didn't work either.
@zqdeepbluesky You can try making lr smaller, like 0.0001. I solved my problem by reducing lr.
Has anyone solved this problem? I get the same error at iteration 5800 with a learning rate of 0.001 and at iteration 18800 with 0.0001. If someone has solved it, please help me solve it.
I have met a similar problem; try reducing "__C.TRAIN.RPN_BATCHSIZE" in lib/fast_rcnn/config.py, maybe it will work. A sketch of how that setting can be overridden is below.
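For anyone hunting for that setting: here is a minimal sketch of two common ways to override it, assuming the stock py-faster-rcnn layout where lib/fast_rcnn/config.py exposes the cfg object and tools/train_net.py merges a YAML file passed with --cfg through cfg_from_file. The value 128 is only an illustrative smaller batch size, not a recommended number.

    # Sketch: shrinking TRAIN.RPN_BATCHSIZE without hand-editing config.py.
    # Assumes the standard py-faster-rcnn layout; 128 is an arbitrary example value.
    from fast_rcnn.config import cfg, cfg_from_file

    # Option 1: set it programmatically before training starts
    # (e.g. near the top of tools/train_net.py, after the config import).
    cfg.TRAIN.RPN_BATCHSIZE = 128

    # Option 2: keep the override in a YAML file and merge it, the same way
    # experiments/cfgs/faster_rcnn_end2end.yml is applied through the --cfg flag.
    # The file name below is hypothetical; its contents would be:
    #   TRAIN:
    #     RPN_BATCHSIZE: 128
    # cfg_from_file('experiments/cfgs/rpn_batchsize_override.yml')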
When I train the network, it breaks down after printing the following log:
/home/wu/faster_rcnn/py-faster-rcnn/tools/../lib/fast_rcnn/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/wu/faster_rcnn/py-faster-rcnn/tools/../lib/fast_rcnn/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/wu/faster_rcnn/py-faster-rcnn/tools/../lib/fast_rcnn/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/wu/faster_rcnn/py-faster-rcnn/tools/../lib/fast_rcnn/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/wu/faster_rcnn/py-faster-rcnn/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Floating Point Exception
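These warnings come from decoding RPN box deltas that have already diverged: once np.exp(dw) overflows, the proposal layer starts seeing inf/NaN values and training crashes shortly after. Besides lowering the learning rate or the RPN batch size as suggested above, one mitigation (borrowed from later detection codebases, not something py-faster-rcnn ships with) is to clip dw and dh before exponentiating in lib/fast_rcnn/bbox_transform.py. A minimal sketch, with the clip threshold chosen as an assumption:

    # Sketch of bbox_transform_inv with clipped width/height deltas, to avoid
    # the exp overflow seen in the log above. The clip value is an assumed
    # bound, not a py-faster-rcnn default; it only hides the symptom if the
    # underlying training divergence is not also fixed.
    import numpy as np

    BBOX_XFORM_CLIP = np.log(1000.0 / 16.0)  # assumed upper bound on dw, dh

    def bbox_transform_inv_clipped(boxes, deltas):
        """Decode box regression deltas, clamping dw/dh before exponentiating."""
        widths = boxes[:, 2] - boxes[:, 0] + 1.0
        heights = boxes[:, 3] - boxes[:, 1] + 1.0
        ctr_x = boxes[:, 0] + 0.5 * widths
        ctr_y = boxes[:, 1] + 0.5 * heights

        dx = deltas[:, 0::4]
        dy = deltas[:, 1::4]
        dw = np.minimum(deltas[:, 2::4], BBOX_XFORM_CLIP)  # prevent exp overflow
        dh = np.minimum(deltas[:, 3::4], BBOX_XFORM_CLIP)

        pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
        pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
        pred_w = np.exp(dw) * widths[:, np.newaxis]
        pred_h = np.exp(dh) * heights[:, np.newaxis]

        pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
        pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w  # x1
        pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h  # y1
        pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w  # x2
        pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h  # y2
        return pred_boxes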