Floating point exception #159

Closed
morusu opened this issue Apr 26, 2016 · 20 comments

Comments

@morusu

morusu commented Apr 26, 2016

After thousands of iterations, Faster R-CNN throws the error "Floating point exception" in ./experiments/scripts/faster_rcnn_end2end.sh. Searching for the error suggests it is caused by an integer division or modulo by zero (i/0 or i%0). Has anyone encountered this?

@wait1988

I encountered a similar problem.

Solving...
I0428 15:05:27.513572 6443 solver.cpp:242] Iteration 0, loss = 4.65389
I0428 15:05:27.513619 6443 solver.cpp:258] Train net output #0: loss_bbox = 0.190101 (* 1 = 0.190101 loss)
I0428 15:05:27.513628 6443 solver.cpp:258] Train net output #1: loss_cls = 3.44897 (* 1 = 3.44897 loss)
I0428 15:05:27.513635 6443 solver.cpp:258] Train net output #2: rpn_cls_loss = 0.900724 (* 1 = 0.900724 loss)
I0428 15:05:27.513643 6443 solver.cpp:258] Train net output #3: rpn_loss_bbox = 0.119607 (* 1 = 0.119607 loss)
I0428 15:05:27.513656 6443 solver.cpp:571] Iteration 0, lr = 0.001
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 6443 Floating point exception(core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/${NET}.v2.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml ${EXTRA_ARGS}

@weichengkuo

weichengkuo commented May 4, 2016

I got the same error, and it turned out that I was feeding in an empty boxes array. Filtering the roidb properly fixed my problem.

@wait1988

wait1988 commented May 4, 2016

What does "filtering out roidb properly" mean? Would you please give us more details?

@smasoudn

smasoudn commented May 9, 2016

I've got the same error. When I change the default RNG_SEED value, the error occurs at a different iteration. Have you guys found a solution yet? @weichengkuo, I would be thankful if you could elaborate a little more. Where should I filter the empty boxes? Thanks!

@smichalowski

smichalowski commented May 9, 2016

Take a look at #65.

@weichengkuo

It's possible that some layer of your Faster R-CNN receives no boxes at some iteration. I ran into this error multiple times, and it is often due to empty boxes. Filtering the roidb means removing the roidb elements that could cause this problem.
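A minimal sketch of what such filtering could look like, assuming each roidb entry is a dict whose 'boxes' value is an (N, 4) NumPy array. The helper name drop_empty_entries is hypothetical and is not part of py-faster-rcnn:

```python
import numpy as np


def drop_empty_entries(roidb):
    """Return only the roidb entries that contain at least one box.

    Hypothetical helper: assumes each entry is a dict with a 'boxes'
    array of shape (N, 4).
    """
    kept = [entry for entry in roidb if len(entry['boxes']) > 0]
    print('Dropped {} of {} entries with no boxes'.format(
        len(roidb) - len(kept), len(roidb)))
    return kept


roidb = [
    {'boxes': np.array([[10, 20, 110, 220]], dtype=np.float32)},
    {'boxes': np.zeros((0, 4), dtype=np.float32)},  # image without annotations
]
roidb = drop_empty_entries(roidb)
```

This only removes entries with no boxes at all; entries whose boxes exist but have zero width or height would need a separate check.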

@daf11865

How can this be solved, please?

@morusu
Author

morusu commented Jun 2, 2016

Zero-padding the original image to a reasonable aspect ratio (600*1000) will solve this problem.

@morusu morusu closed this as completed Jun 2, 2016
@LiberiFatali

@morusu So where do we need to modify the code to 'pad 0s to the original image'?

@buaaliyi

buaaliyi commented Jun 20, 2016

How do we change the code to 'pad 0 the original image', or do we still need to preprocess the images first?
Can you give us an example? Thanks

@morusu
Author

morusu commented Jun 22, 2016

@buaaliyi @LiberiFatali Preprocess the images first: padding zeros onto the right side or bottom of each image until it has a reasonable aspect ratio will be fine.
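A minimal sketch of this preprocessing step, assuming images are H x W x C NumPy arrays and that "reasonable" means a long-to-short side ratio of at most 1000/600 (taken from the 600*1000 figure mentioned earlier in this thread). The function name pad_to_max_aspect and the max_ratio parameter are hypothetical:

```python
import numpy as np


def pad_to_max_aspect(image, max_ratio=1000.0 / 600.0):
    """Zero-pad an H x W x C image on the right or bottom edge.

    max_ratio is an assumed upper limit on long side / short side.
    """
    height, width = image.shape[:2]
    pad_bottom = pad_right = 0
    if width > max_ratio * height:      # too wide: grow the height
        pad_bottom = int(np.ceil(width / max_ratio)) - height
    elif height > max_ratio * width:    # too tall: grow the width
        pad_right = int(np.ceil(height / max_ratio)) - width
    # Pad only after the existing pixels, never before them.
    return np.pad(image, ((0, pad_bottom), (0, pad_right), (0, 0)),
                  mode='constant', constant_values=0)


wide = np.ones((100, 1000, 3), dtype=np.uint8)
padded = pad_to_max_aspect(wide)
print(padded.shape)
```

Because the padding goes only on the right or bottom, box coordinates measured from the top-left corner do not need to be shifted.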

@LiberiFatali

I got this error while using old code. I solved it by applying

def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

from https://github.com/rbgirshick/py-faster-rcnn/blob/d66cc2bff142ca07f521db06ca3e9e10dbc8df20/lib/fast_rcnn/train.py

@vra

vra commented Nov 18, 2016

@LiberiFatali Thanks, your solution solved my problem!

@fernandorovai

fernandorovai commented Dec 14, 2016

@vra Where did you apply the filter_roidb function? It is already called in the train_net() function (fast_rcnn/train.py). I am facing the same problem that @morusu described. Suddenly my loss goes to NaN ("overflow encountered in exp"). I am using the PASCAL VOC dataset and have no clue what causes the problem. Has anyone solved this issue? Thank you!

@vra

vra commented Dec 15, 2016

Hi @fernandorovai ,
Sorry, I should have been clearer. I am using RstarCNN, which includes rbg's fast_rcnn repo. In fast-rcnn there is no filter_roidb function. When I added this function, my problem was solved.
Did you try lowering your learning rate? As far as I know, the NaN problem is always related to a large learning rate.

@June-Jo

June-Jo commented Jan 11, 2017

@vra Hello, did it work once you added filter_roidb to train.py? In my case the filter_roidb function is already there, but I still get the 'floating point exception'. I tried changing the learning rate and RNG_SEED, but that did not help.

@hanjf12

hanjf12 commented Apr 11, 2017

@hyunjun-jo Hello, I have the same problem too. I tried changing the learning rate and RNG_SEED, but that did not help either. Have you solved the problem? Thanks

@zqdeepbluesky

@morusu @wait1988 @weichengkuo @smasoudn @smichalowski
Hi, when I train FPN on my own dataset, I get this error:

I0312 16:25:25.883342 2983 sgd_solver.cpp:106] Iteration 0, lr = 0.0005
/home/zq/FPN/tools/../lib/rpn/proposal_layer.py:175: RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Floating point exception (core dumped)

I tried changing lr from 0.001 to 0.0001, but it didn't work. I also changed RNG_SEED, and that didn't work either.
I don't know how to solve it. Please help me, thanks so much!
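NumPy typically emits "invalid value encountered in greater_equal" when one side of the comparison contains NaN, so the warning above suggests that the proposal widths or heights were already NaN when they reached that line. A minimal diagnostic sketch, assuming the boxes are an (N, 4) array in (x1, y1, x2, y2) order and that sizes use the inclusive "+ 1" pixel convention. The name filter_boxes_finite is hypothetical; this is not the function shipped in proposal_layer.py:

```python
import numpy as np


def filter_boxes_finite(boxes, min_size):
    """Indices of boxes that are finite and at least min_size on each side."""
    finite = np.isfinite(boxes).all(axis=1)
    if not finite.all():
        print('{} of {} boxes contain NaN or inf'.format(
            int((~finite).sum()), len(boxes)))
    # Zero out non-finite rows so the comparisons below do not warn.
    safe = np.where(finite[:, None], boxes, 0.0)
    ws = safe[:, 2] - safe[:, 0] + 1  # assumed inclusive pixel convention
    hs = safe[:, 3] - safe[:, 1] + 1
    return np.where(finite & (ws >= min_size) & (hs >= min_size))[0]


boxes = np.array([[0.0, 0.0, 31.0, 31.0],
                  [np.nan, 0.0, 50.0, 50.0],
                  [5.0, 5.0, 6.0, 6.0]])
keep = filter_boxes_finite(boxes, min_size=16)
print(keep)
```

A check like this only locates the NaN values. Their cause lies upstream, for example in diverging weights or in invalid box annotations.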

@amlandas78

Has anyone solved the problem? I get the same error at iteration 5800 with a learning rate of 0.001 and at iteration 18800 with 0.0001. If someone has solved the problem, please help me solve it.

@st20080675

st20080675 commented Oct 17, 2019

I solved my 'Floating point exception (core dumped)' problem by modifying the nested function 'is_valid' inside the function 'filter_roidb' in the file da-faster-rcnn-master/lib/fast_rcnn/train.py:

def filter_roidb(roidb):
    """Remove roidb entries that have no usable RoIs."""

    def is_valid(entry):
        # Valid images have:
        #   (1) At least one foreground RoI OR
        #   (2) At least one background RoI
        overlaps = entry['max_overlaps']
        # added to handle empty boxes, see https://github.com/rbgirshick/py-faster-rcnn/issues/159
        # a box counts as non-empty only if both its width and height exceed 1 pixel
        not_empty = np.zeros(len(entry['max_overlaps']), dtype=bool)
        cur_boxes = entry['boxes']
        for i in range(len(not_empty)):
            if (cur_boxes[i][2] - cur_boxes[i][0] > 1 and cur_boxes[i][3] - cur_boxes[i][1] > 1):
                not_empty[i] = True

        # find boxes with sufficient overlap
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Select non-empty background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO) & not_empty)[0]

        # image is only valid if such boxes exist
        valid = len(fg_inds) > 0 or len(bg_inds) > 0

        return valid

    # (the rest of filter_roidb was not included in this comment)
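To see which annotations such a check would reject before training starts, the same width and height test can be run over the whole roidb. A minimal sketch, assuming each entry has a 'boxes' array in (x1, y1, x2, y2) order; find_degenerate_boxes and min_side are hypothetical names:

```python
import numpy as np


def find_degenerate_boxes(roidb, min_side=1):
    """List (entry_index, box_index) pairs whose width or height <= min_side."""
    bad = []
    for entry_idx, entry in enumerate(roidb):
        # Cast to float so the subtraction cannot wrap around if the
        # boxes are stored in an unsigned integer dtype.
        boxes = np.asarray(entry['boxes'], dtype=np.float64).reshape(-1, 4)
        widths = boxes[:, 2] - boxes[:, 0]
        heights = boxes[:, 3] - boxes[:, 1]
        degenerate = (widths <= min_side) | (heights <= min_side)
        for box_idx in np.where(degenerate)[0]:
            bad.append((entry_idx, int(box_idx)))
    return bad


roidb = [
    {'boxes': np.array([[10, 10, 50, 60], [30, 30, 30, 80]])},  # 2nd box has zero width
    {'boxes': np.zeros((0, 4))},
]
print(find_degenerate_boxes(roidb))
```

The reported index pairs can then be traced back to the annotation files to fix or drop the offending boxes.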
