At training the loss bbox_loss is always zero #266

cyberdecker opened this issue Jul 20, 2016 · 14 comments

@cyberdecker

I'm trying to train a ZF network on a custom dataset (following the instructions from here), using this command:

./tools/train_faster_rcnn_alt_opt.py --gpu 0 \
    --net_name custom --weights data/imagenet_models/ZF.v2.caffemodel \
    --imdb custom_train --cfg config.yml

In the RPN stage the loss looks fine, but when training reaches stage 1 of the Fast R-CNN model, the loss is NaN:

I0720 10:42:34.174381  3892 solver.cpp:228] Iteration 0, loss = nan
I0720 10:42:34.174423  3892 solver.cpp:244]     Train net output #0: bbox_loss = nan (* 1 = nan loss)
I0720 10:42:34.174432  3892 solver.cpp:244]     Train net output #1: cls_loss = 1.04477 (* 1 = 1.04477 loss)
I0720 10:42:34.174438  3892 sgd_solver.cpp:106] Iteration 0, lr = 0.0001
I0720 10:42:42.240824  3892 solver.cpp:228] Iteration 20, loss = nan
I0720 10:42:42.240862  3892 solver.cpp:244]     Train net output #0: bbox_loss = nan (* 1 = nan loss)
I0720 10:42:42.240870  3892 solver.cpp:244]     Train net output #1: cls_loss = 0.130728 (* 1 = 0.130728 loss)

What does this mean? Is the network learning anything, or is it not working at all?

@cyberdecker
Author

After looking into the NaN, I adjusted the learning rate and also switched to a smaller dataset to check whether the network is learning anything. At least I no longer get NaN values in the loss, but bbox_loss is still always zero:

I0721 15:41:54.542963  2262 sgd_solver.cpp:106] Iteration 2420, lr = 0.001
I0721 15:41:56.630101  2262 solver.cpp:228] Iteration 2440, loss = 0.139153
I0721 15:41:56.630146  2262 solver.cpp:244]     Train net output #0: bbox_loss = 0 (* 1 = 0 loss)
I0721 15:41:56.630153  2262 solver.cpp:244]     Train net output #1: cls_loss = 0.139153 (* 1 = 0.139153 loss)
I0721 15:41:56.630161  2262 sgd_solver.cpp:106] Iteration 2440, lr = 0.001
I0721 15:41:58.699415  2262 solver.cpp:228] Iteration 2460, loss = 0.13915
I0721 15:41:58.699458  2262 solver.cpp:244]     Train net output #0: bbox_loss = 0 (* 1 = 0 loss)
I0721 15:41:58.699466  2262 solver.cpp:244]     Train net output #1: cls_loss = 0.13915 (* 1 = 0.13915 loss)
I0721 15:41:58.699473  2262 sgd_solver.cpp:106] Iteration 2460, lr = 0.001
I0721 15:42:00.765265  2262 solver.cpp:228] Iteration 2480, loss = 0.139147
I0721 15:42:00.765313  2262 solver.cpp:244]     Train net output #0: bbox_loss = 0 (* 1 = 0 loss)
I0721 15:42:00.765321  2262 solver.cpp:244]     Train net output #1: cls_loss = 0.139147 (* 1 = 0.139147 loss)

The bbox loss should be different from zero, right?

@cyberdecker cyberdecker changed the title NAN value in loss at training fast rcnn stage At training the loss bbox_loss is always zero Jul 21, 2016
@maxteleg

Hi, I've run into the same issue. Did you figure out how to solve it? Could it have anything to do with the image dataset or the annotations? Thank you!

@cyberdecker
Author

Hi,
I padded the images, because for some reason only cropped images hit this issue; I think it is probably the box coordinates.
So you need to pad the images before training, as in the sketch below.
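
Roughly what I mean by padding, as a quick sketch (my own illustration, not code from the repo; the function name and the fixed border size are assumptions):

import numpy as np
from PIL import Image

def pad_image_and_boxes(img_path, boxes, pad=16):
    """Add a `pad`-pixel border on every side and shift the (x1, y1, x2, y2)
    annotations by the same offset so they still point at the objects."""
    img = Image.open(img_path)
    w, h = img.size
    padded = Image.new(img.mode, (w + 2 * pad, h + 2 * pad))
    padded.paste(img, (pad, pad))
    shifted = np.asarray(boxes, dtype=np.float64) + pad  # x and y both move by pad
    return padded, shifted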

@maxteleg

Thank you! I solved it. There were some negative bbox values in my dataset; I just removed those samples and everything works fine now. Something like the sketch below is enough to catch them.
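
In case it is useful to anyone else, this is roughly the kind of check I ran over my annotations before training (my own sketch; the function name and the exact validity rules are assumptions, not part of py-faster-rcnn):

import numpy as np

def keep_valid_boxes(boxes, width, height):
    """Drop (x1, y1, x2, y2) boxes that go negative, fall outside the image,
    or have non-positive width/height."""
    boxes = np.asarray(boxes, dtype=np.float64)
    ok = ((boxes[:, 0] >= 0) & (boxes[:, 1] >= 0) &
          (boxes[:, 2] < width) & (boxes[:, 3] < height) &
          (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1]))
    return boxes[ok]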

@Mato98

Mato98 commented Nov 29, 2016

Hi, what is meant by "pad the images"? During my training the values for bbox_loss and rpn_loss_bbox are always 0. I tried different datasets, but nothing changed. Do these values have to be different from 0?

[image attachment]

Results with this model are not satisfactory: very big boxes and low scores for the classes!

Any advice?

@jinyu121

jinyu121 commented Jan 8, 2017

Hi~ I have the same problem, or even worse.

Because I cannot use a GPU, I set up the environment following this blog and this blog, and ran ./experiments/scripts/faster_rcnn_alt_opt.sh 0 VGG16 pascal_voc (you can ignore the GPU_ID = 0).

This is the training log:

Solving...
I0108 19:52:36.975473  3821 solver.cpp:229] Iteration 0, loss = 1.35416
I0108 19:52:36.975520  3821 solver.cpp:245]     Train net output #0: rpn_cls_loss = 0.725054 (* 1 = 0.725054 loss)
I0108 19:52:36.975533  3821 solver.cpp:245]     Train net output #1: rpn_loss_bbox = 0.629103 (* 1 = 0.629103 loss)
I0108 19:52:36.975540  3821 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0108 20:02:41.254108  3821 solver.cpp:229] Iteration 20, loss = -nan
I0108 20:02:41.254150  3821 solver.cpp:245]     Train net output #0: rpn_cls_loss = -nan (* 1 = -nan loss)
I0108 20:02:41.254163  3821 solver.cpp:245]     Train net output #1: rpn_loss_bbox = -nan (* 1 = -nan loss)
I0108 20:02:41.254170  3821 sgd_solver.cpp:106] Iteration 20, lr = 0.001
I0108 20:12:35.014883  3821 solver.cpp:229] Iteration 40, loss = -nan
I0108 20:12:35.014927  3821 solver.cpp:245]     Train net output #0: rpn_cls_loss = -nan (* 1 = -nan loss)
I0108 20:12:35.014940  3821 solver.cpp:245]     Train net output #1: rpn_loss_bbox = -nan (* 1 = -nan loss)

So, how can I fix it? Could it be related to the files modified for CPU mode?

UPDATE: I changed the base_lr to 0.00001 (1e-5), and it worked.
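
In case anyone wants to script that change, here is a rough sketch using Caffe's Python protobuf bindings (the solver path below is only my assumption about which stage solver your run loads; editing the file by hand works just as well):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Assumed path: point this at whichever stage solver your training script actually uses.
solver_path = 'models/pascal_voc/VGG16/faster_rcnn_alt_opt/stage1_rpn_solver60k80k.pt'

solver = caffe_pb2.SolverParameter()
with open(solver_path) as f:
    text_format.Merge(f.read(), solver)

solver.base_lr = 1e-5  # lowered from the default 0.001
with open(solver_path, 'w') as f:
    f.write(text_format.MessageToString(solver))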

@Soda-Wong

Soda-Wong commented Mar 23, 2017

Hi~
@jinyu121 @Max-intel @cyberdecker @Mato98
I am hitting the same problem too: my results and AP are zero. I use the VOC2007 image data, so I don't think "pad the images" is the issue. And why does changing base_lr to 0.00001 (1e-5) help?
Any advice? Thanks!

@jinyu121

If you are using CPU mode, please use this pull request.

@Soda-Wong

@jinyu121 thank you~
But I am using GPU mode, and after changing the base_lr to 1e-5 the results are no longer 0, but something like 0.0004.

@nmahesh01

@Mato98 @Soda-Wong Hi, I am facing the exact same issue with bbox_loss = 0. Even using an LR of 0.00005 doesn't change it! Any ideas or solutions?
I am using VOC2007 + VOC2012 training data.

@starxhong

Has anybody solved this problem? I hit the issue when I run R-FCN training with OHEM: loss_bbox = 0 (* 1 = 0 loss) from the very first iteration. However, when I run the code without OHEM everything is fine and I get a mAP of 0.78, so I'm sure it has nothing to do with my data annotations. What else can cause this?

@mukeshmithrakumar

Having the same issue; I've narrowed it down to the anchor boxes. The IoU between the anchor boxes and the ground truth is always zero, so every anchor gets labelled as negative and the bbox loss ends up zero (a quick check is sketched below). Still looking for a solution; I'd appreciate it if you guys found one. @VersionHX
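
This is the kind of quick diagnostic I used (my own helper, not the repo's bbox_overlaps; boxes are (x1, y1, x2, y2) and the example values are made up):

import numpy as np

def iou_matrix(anchors, gt_boxes):
    # IoU between every anchor and every ground-truth box.
    ious = np.zeros((len(anchors), len(gt_boxes)))
    for i, a in enumerate(anchors):
        for j, g in enumerate(gt_boxes):
            ix1, iy1 = max(a[0], g[0]), max(a[1], g[1])
            ix2, iy2 = min(a[2], g[2]), min(a[3], g[3])
            inter = max(ix2 - ix1 + 1, 0.0) * max(iy2 - iy1 + 1, 0.0)
            area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
            area_g = (g[2] - g[0] + 1) * (g[3] - g[1] + 1)
            ious[i, j] = inter / (area_a + area_g - inter)
    return ious

# If this maximum never reaches the foreground IoU threshold, every sampled RoI
# is background and the bbox regression loss stays exactly 0.
anchors = np.array([[0, 0, 15, 15], [8, 8, 23, 23]], dtype=np.float64)
gt_boxes = np.array([[10, 10, 30, 30]], dtype=np.float64)
print(iou_matrix(anchors, gt_boxes).max())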

@starxhong

@mukeshmithrakumar Yeah, I solved the problem with the solution here. It seems to be a version conflict with Numpy.
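
For anyone who can't follow that link, one quick sanity check (just my rough guess at where the behaviour changed, not the actual patch) is to confirm which Numpy the training process really imports:

import numpy as np
from distutils.version import LooseVersion

print(np.__version__)
if LooseVersion(np.__version__) >= LooseVersion('1.12'):
    print('Recent numpy; consider pinning an older release if bbox_loss stays at 0.')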

@mukeshmithrakumar

Thanks @VersionHX
