Training custom dataset #75

JavHaro · 2018-02-19T09:22:16Z

Hi @jwyang,
As I mentioned in a previous post, I would like to train a Faster R-CNN model (vgg16) on my own dataset. I have followed this post, which is based on Ross Girshick's code, while adapting it to your implementation. Now I'm trying to adapt the network model to my dataset, but I don't know what I should modify. Do you have any ideas that could guide me?
Thanks!!

PS: I know there is a closed issue about this, but I posted here in case you don't follow closed issues.

JavHaro · 2018-02-21T11:23:41Z

Hi @jwyang
To train the network with my dataset, I created the annotations following the pascal_voc template, modified pascal_voc.py and voc_eval.py to adapt them to my dataset classes (only background and cherry), and modified factory.py. Based on the post I mentioned, I should now modify the models, but I think that in your implementation the model is isolated from the dataset, isn't it?
I have tried to train the model, but although no error is shown, the output is strange and the model doesn't detect any cherries.
output:

[session 1][epoch 1][iter 0] loss: 1.4996, lr: 1.00e-03
fg/bg=(26/998), time cost: 2.685305
rpn_cls: 0.6953, rpn_box: 0.0309, rcnn_cls: 0.7547, rcnn_box 0.0187
[session 1][epoch 1][iter 100] loss: nan, lr: 1.00e-03
fg/bg=(783/241), time cost: 175.604011
rpn_cls: nan, rpn_box: nan, rcnn_cls: nan, rcnn_box nan

Any idea?
Thanks!!

jwyang · 2018-02-21T17:31:06Z

Hi @JavHaro, it seems that the training collapsed. According to your output, it is weird that fg/bg = (783/241), since the ratio of fg to bg should not be that high if you did not change the hyperparameters. I would suggest going back to check whether the training data coming from your customized data loader is good.
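
One way to act on the suggestion above is to scan the loaded ground-truth boxes for values that cannot be valid before any training starts. The following is a minimal sketch, not code from the repository: the function name `find_invalid_boxes`, the 0-based pixel convention, and the sample boxes are all assumptions made for illustration.

```python
def find_invalid_boxes(boxes, img_width, img_height):
    """Return (index, reason) pairs for boxes that look broken.

    boxes: iterable of (x1, y1, x2, y2) in 0-based pixel coordinates
    (an assumed convention for this sketch).
    """
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 <= x1 or y2 <= y1:
            problems.append((i, "non-positive width or height"))
        elif x1 < 0 or y1 < 0 or x2 >= img_width or y2 >= img_height:
            problems.append((i, "outside image bounds"))
    return problems


# Hypothetical boxes for a 600x600 image: one valid, one with an
# implausibly large x1, one extending past the image border.
boxes = [(10, 20, 40, 50), (65534, 5, 30, 25), (580, 580, 610, 610)]
print(find_invalid_boxes(boxes, 600, 600))
```

Running a check like this over every image in the dataset, right after the annotations are loaded, narrows down whether the problem lies in the data or in the hyperparameters.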

JavHaro · 2018-02-22T07:15:41Z

Thanks @jwyang !
In this case I changed ANCHOR_SCALES & ANCHOR_RATIO because the background and the green cherries are quite similar and there are very few pixels per fruit (approx. 20x20), so I wanted the bounding boxes to fit as tightly as possible. In any case, I think the output is weird, so I will check the data loader.
I'll be back when I have news.
Thanks once again.

Edit: I forgot to mention that I changed batch_size to 4.

JavHaro · 2018-02-22T15:26:20Z

Hi @jwyang ,
It may be a silly question, but I'm a newcomer to deep learning (and also to Python and Torch). Am I supposed to do something special with the dataset? I have a labeled dataset in Pascal format, and all the images have the same aspect ratio (I know this is not necessary, but it helps me analyse larger images by dividing them into a grid). I have not preprocessed the images by normalizing them or anything similar (maybe I could get better results if I subtracted my dataset's mean, but it should work anyway, right?). I'm sorry to bother you once more, but I checked my code and I can't find anything weird.

Thanks!!

jwyang · 2018-02-22T18:21:24Z

@JavHaro, typically you need to subtract the mean to make the range suitable for the pretrained VGG or ResNet.
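
As an illustration of the mean subtraction mentioned above, here is a small pure-Python sketch that computes per-channel means over a dataset and subtracts them from one image. The helper names `channel_means` and `subtract_mean`, the nested-list image representation, and the pixel values are hypothetical; a real pipeline would normally do this on arrays, and which mean to use (the dataset's own or the one the pretrained weights were trained with) is not settled in this thread.

```python
def channel_means(images):
    """Per-channel mean over all pixels of all images.

    images: list of images, each a list of (c0, c1, c2) pixel tuples.
    """
    sums = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                sums[c] += pixel[c]
            count += 1
    return [s / count for s in sums]


def subtract_mean(image, means):
    """Return a copy of the image with the channel means subtracted."""
    return [tuple(p[c] - means[c] for c in range(3)) for p in image]


# Two tiny hypothetical "images" of two pixels each.
images = [[(10, 20, 30), (30, 40, 50)], [(20, 30, 40), (20, 30, 40)]]
means = channel_means(images)
centered = subtract_mean(images[0], means)
print(means, centered)
```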

JavHaro · 2018-03-01T12:45:51Z

Thanks @jwyang! I subtract the mean of the Pascal dataset, but maybe I should subtract my dataset's mean. Another issue I might have is that the number of pixels per object is quite low (approx. 30 pixels wide). Do you think this could cause the network to collapse?

CodeJjang · 2018-03-02T18:15:19Z

A related question: if I don't want to use a pretrained network (e.g. resnet), can I switch "pretrained=true" to "pretrained=false" in the appropriate place? Will it work?

jwyang · 2018-03-04T02:31:07Z

Hi, @CodeJjang , yes, it will train from scratch if you set pretrained=False.

iabhi7 · 2018-03-05T12:12:13Z

@JavHaro Hi, were you able to successfully train your custom dataset model?

JavHaro · 2018-03-06T11:22:39Z

Hi @vibrantabhi19,
I haven't got it yet.

JavHaro · 2018-03-06T11:28:18Z

Hi @jwyang
I have noticed that when the annotations have xmin=0 (or ymin=0), the gt_boxes in the RPN take the maximum value of the axis (in my case 600), which causes the error when calculating IoU (because xmax-xmin becomes negative). I'm trying to trace the error back to its origin. Do you know where it might be?
Thanks once again!
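
To make the failure mode described above concrete, the sketch below shows how a corrupted x1 produces a negative box width, and why a log-ratio width target (the standard Faster R-CNN parameterization, log of ground-truth width over anchor width) is then undefined. The function names, the +1 pixel-inclusive width convention, and the numbers are assumptions for illustration; tensor libraries typically yield nan in this situation rather than raising an error.

```python
import math


def box_width(x1, x2):
    # Pixel-inclusive width; the +1 convention is assumed for this sketch.
    return x2 - x1 + 1


def width_target(gt_width, anchor_width):
    """Log-ratio width target; nan when the ratio is not positive."""
    ratio = gt_width / anchor_width
    return math.log(ratio) if ratio > 0 else float("nan")


good = box_width(0, 29)     # intended box: xmin=0, xmax=29
bad = box_width(600, 29)    # same box after xmin was replaced by 600

print(good, width_target(good, 30))
print(bad, width_target(bad, 30))
```

A single nan target is enough to turn the whole loss into nan once it is averaged with the others.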

jwyang · 2018-03-07T18:11:28Z

@JavHaro Hi, I think I already fixed this bug, did you update your roibatchloader.py?

JavHaro · 2018-03-07T18:20:10Z

Hi @jwyang, not since a month ago or so. I will check it.
Thanks!!

zeehasham · 2018-03-07T20:31:58Z

@JavHaro How did you create the ImageSets folder for your custom dataset? It has two folders, Layout and Main. Did you create files for both? Also, in the Main folder, how do you specify the -1 class when you only have one class (I also have only one class to detect)? Kindly let me know. Thanks

Suxin5987THU · 2018-03-08T08:08:08Z

Hi @jwyang. I ran into a situation similar to JavHaro's. I found that the training collapses and all the losses become nan when fg_rois_per_this_image is 0 in proposal_target_layer. Have you seen this case?

zeehasham · 2018-03-08T14:57:21Z

@Suxin5987THU I am still preparing my own dataset in VOC format. Can anyone help me with how to structure the dataset, especially the ImageSets folder? Thanks

JavHaro · 2018-03-08T15:43:54Z

Hi @zeehasham, I followed the instructions in this post. To answer your question, I created files for both of them: in Main I specified the class name, and in Layout I did not. I really don't know whether it's necessary or whether it works, because I haven't had time to check the training results. As soon as I have time and have checked that everything is OK, I will post a new message with the main modifications I made, in case it serves as a guide for someone.

JavHaro · 2018-03-19T08:01:33Z

Hi @jwyang, @zeehasham and @Suxin5987THU
I finally found the cause of the NaN. The problem is in the annotations when the min value is close to 0. I don't know why the value turns into the maximum value during loading (in fact the maximum value minus two --> 65534). To fix it, I added an "if" that checks the minimum values, so that any value bigger than 60000 is set to 0. I know it isn't the most elegant solution, but it works.
I hope that this can help you.
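
Values such as 65534 are what one would expect if a small negative number were written into an unsigned 16-bit slot, which is consistent with the later replies in this thread that point to negative coordinates being stored in the box array. The sketch below emulates that wraparound in plain Python and applies a threshold workaround in the spirit of the post above. Both helper names are hypothetical, and the unsigned 16-bit storage is an assumption rather than something confirmed at this point in the thread.

```python
def store_as_uint16(value):
    """Emulate writing an integer into an unsigned 16-bit slot."""
    return int(value) & 0xFFFF


def repair_wrapped(coord, threshold=60000):
    """Workaround: treat implausibly large coordinates as 0."""
    return 0 if coord >= threshold else coord


xmin_annotation = 0                             # box touching the left edge
stored = store_as_uint16(xmin_annotation - 1)   # shift to 0-based, then store

print(stored)                   # wrapped value
print(store_as_uint16(-2))      # a value two below zero
print(repair_wrapped(stored))   # back to 0
print(repair_wrapped(120))      # ordinary coordinates are untouched
```

The threshold check repairs the symptom after the fact; preventing the negative value from being stored in the first place avoids the wraparound entirely.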

CodeJjang · 2018-03-23T16:35:37Z

@JavHaro Can you show where the fix is? In which file, and what exactly is the fix?

Karthik-Suresh93 · 2018-05-29T23:55:21Z

Hi @JavHaro,

I am facing the same issue too. Could you please show exactly where to incorporate the fix? @CodeJjang or anybody else, if you know the fix, please let me know. Thanks

Ram-Godavarthi · 2018-06-12T13:12:26Z

Hello @jwyang @JavHaro
Could you please provide me with a solution for this?
While training the network, I am getting these runtime errors.

I0612 13:01:18.071843 3126 sgd_solver.cpp:106] Iteration 1360, lr = 0.001
I0612 13:01:23.300282 3126 solver.cpp:229] Iteration 1380, loss = nan
I0612 13:01:23.300330 3126 solver.cpp:245] Train net output #0: bbox_loss = 0 (* 1 = 0 loss)
I0612 13:01:23.300346 3126 solver.cpp:245] Train net output #1: cls_loss = 0.0526416 (* 1 = 0.0526416 loss)
I0612 13:01:23.300359 3126 solver.cpp:245] Train net output #2: rpn_cls_loss = 3.32216 (* 1 = 3.32216 loss)
I0612 13:01:23.300371 3126 solver.cpp:245] Train net output #3: rpn_loss_bbox = nan (* 1 = nan loss)
I0612 13:01:23.300381 3126 sgd_solver.cpp:106] Iteration 1380, lr = 0.001

and this

bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]

What should be done to get rid of these errors?
I am training on my custom dataset.
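
The overflow warning above comes from calling exp on a very large predicted delta. A mitigation used in some detection codebases is to clamp the delta before exponentiating; the sketch below shows the idea in plain Python. The function name `decode_width` and the clip value are illustrative assumptions, not settings taken from this thread, and clamping only hides the symptom: a nan in rpn_loss_bbox suggests checking the annotations first.

```python
import math

# Illustrative clip value; an assumption, not a setting from this thread.
MAX_DELTA = math.log(1000.0 / 16.0)


def decode_width(dw, anchor_width, max_delta=MAX_DELTA):
    """Decode a predicted log-width delta, clamping it before exp()."""
    return math.exp(min(dw, max_delta)) * anchor_width


print(decode_width(0.0, 16.0))      # unchanged anchor width
print(decode_width(1000.0, 16.0))   # huge delta is capped instead of overflowing
```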

JingXiaolun · 2018-07-23T07:07:18Z

@JavHaro Can you show where the fix is? In which file, and what exactly is the fix?

adamklec · 2018-07-25T14:09:40Z

@jwyang @JavHaro I just ran into this issue as well. Can you please post the fix?

JavHaro · 2018-07-26T08:06:11Z

Sorry @adamklec, @Karthik-Suresh93 & @1csu, I can't remember the exact file or the exact fix. It was at the point where the annotations are loaded. The problem was that if an annotation coordinate was quite close to 0 (on the x or y axis), it was transformed into the maximum value (I can't remember when or why). I just did something like this:

if x.min >= 60000: x.min = 0
if y.min >= 60000: y.min = 0
I can't remember the exact name of the variable.

If you perform a check like this before using the annotations, the problem should be fixed.
Sorry for not being able to give you more clues about the exact fix. I should have posted it when I fixed it. Now I'm working on other projects and I have forgotten almost everything about this one.
BR

Hackerlil · 2019-02-26T11:57:39Z

Maybe in pascal_voc.py; there is some code there that gets the bbox coordinates.

benjmcarr · 2019-04-02T16:32:31Z

There's an overflow when setting boxes[ix, :] = [x1, y1, x2, y2] to a negative value in _load_pascal_annotation in pascal_voc.py. You can avoid this by clipping the values at

faster-rcnn.pytorch/lib/datasets/pascal_voc.py

Lines 234 to 237 in 7d106c9

    
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1

like so,

x1 = max(float(bbox.find('xmin').text) - 1, 0)
y1 = max(float(bbox.find('ymin').text) - 1, 0)
x2 = max(float(bbox.find('xmax').text) - 1, 0)
y2 = max(float(bbox.find('ymax').text) - 1, 0)
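
For readers who want to try the clamping outside the repository, here is a self-contained sketch that parses a VOC-style annotation with the standard library and applies the same max(..., 0) idea. The sample XML, the function name `load_boxes`, and the tuple output format are assumptions for illustration; this is not the repository's _load_pascal_annotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with a box that touches the left image edge.
SAMPLE_XML = """
<annotation>
  <size><width>600</width><height>600</height></size>
  <object>
    <name>cherry</name>
    <bndbox><xmin>0</xmin><ymin>12</ymin><xmax>30</xmax><ymax>41</ymax></bndbox>
  </object>
</annotation>
"""


def load_boxes(xml_text):
    """Parse VOC-style boxes, shifting by -1 and clamping at 0."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        coords = []
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            value = float(bbox.find(tag).text) - 1   # 1-based to 0-based
            coords.append(max(value, 0.0))           # never store a negative
        boxes.append(tuple(coords))
    return boxes


print(load_boxes(SAMPLE_XML))
```

Without the clamp, the xmin of 0 in this sample would become -1 after the shift, which is the negative value the post above identifies as the trigger.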

amirmgh1375 · 2019-04-15T09:23:51Z

@benjmcarr
Great 👍
It worked for me; the big problem is solved.
It only concerns the dataset annotations.
But I changed your code as follows:

x1 = max(float(bbox.find('xmin').text), 0)
y1 = max(float(bbox.find('ymin').text), 0)
x2 = max(float(bbox.find('xmax').text), 0)
y2 = max(float(bbox.find('ymax').text), 0)

Thanks a lot :)

chensonglu · 2019-05-06T06:11:32Z

@benjmcarr thanks, it works for me. We also need to delete the cached gt so that a new gt is created.
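
Deleting the cached ground truth can be scripted. The sketch below removes files matching a pattern from a cache directory so that they are regenerated on the next run. The function name `remove_cached_roidb`, the `*_gt_roidb.pkl` pattern, and the `data/cache` location in the commented usage line are assumptions about a typical setup and should be checked against the local checkout before use.

```python
from pathlib import Path


def remove_cached_roidb(cache_dir, pattern="*_gt_roidb.pkl"):
    """Delete cached ground-truth files and return the removed names."""
    removed = []
    for path in sorted(Path(cache_dir).glob(pattern)):
        path.unlink()
        removed.append(path.name)
    return removed


# Example call (assumed location; adjust to your checkout):
# print(remove_cached_roidb("data/cache"))
```

If the directory does not exist or holds no matching files, the function returns an empty list and deletes nothing.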

JavHaro mentioned this issue Mar 1, 2018

Training Loss : Nan #79

Open

jwyang closed this as completed Mar 26, 2018
