Training custom dataset #75

Closed
JavHaro opened this issue Feb 19, 2018 · 28 comments

Comments

@JavHaro

JavHaro commented Feb 19, 2018

Hi @jwyang,
As I mentioned in a previous post, I would like to train a Faster R-CNN model (vgg16) with my own dataset. I have followed this post, which is based on Ross Girshick's code, and adapted it to your implementation. Now I'm trying to adapt the network model to my dataset, but I don't know what I should modify. Do you have any ideas that could guide me?
Thanks!!

PS: I know that there is a closed issue about this, but I posted here in case you don't follow closed issues.

@JavHaro
Author

JavHaro commented Feb 21, 2018

Hi @jwyang
To train the network with my dataset, I have created the annotations following the pascal_voc template, modified pascal_voc.py and voc_eval.py to adapt them to my dataset classes (only background and cherry), and modified factory.py. According to the post I mentioned, I should now modify the models, but I think that in your implementation the model is isolated from the dataset, isn't it?
I have tried to train the model, but although no error is shown, the output is strange and the model doesn't detect any cherries.
Output:

[session 1][epoch 1][iter 0] loss: 1.4996, lr: 1.00e-03
fg/bg=(26/998), time cost: 2.685305
rpn_cls: 0.6953, rpn_box: 0.0309, rcnn_cls: 0.7547, rcnn_box 0.0187
[session 1][epoch 1][iter 100] loss: nan, lr: 1.00e-03
fg/bg=(783/241), time cost: 175.604011
rpn_cls: nan, rpn_box: nan, rcnn_cls: nan, rcnn_box nan

Any idea?
Thanks!!

@jwyang
Owner

jwyang commented Feb 21, 2018

Hi @JavHaro, it seems that the training collapsed. In your output, fg/bg = (783/241) looks weird: the ratio of fg to bg samples should not be that high if you did not change the hyperparameters. I would suggest going back to check whether the training data coming from your customized data loader is good.
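For anyone who wants to scan a training log for this symptom, here is a minimal sketch. It only parses the `fg/bg=(...)` field shown in the output above; the helper names and the 0.25 cap are assumptions for illustration (check the FG_FRACTION value in your own config), not something taken from this repository.

```python
import re

# Matches the "fg/bg=(26/998)" fragment printed in the training log above.
FG_BG_PATTERN = re.compile(r"fg/bg=\((\d+)/(\d+)\)")


def fg_fraction(log_line):
    """Return the foreground share of sampled RoIs, or None if the field is absent."""
    match = FG_BG_PATTERN.search(log_line)
    if match is None:
        return None
    fg, bg = int(match.group(1)), int(match.group(2))
    total = fg + bg
    return fg / total if total else None


def is_suspicious(log_line, max_fg_fraction=0.25):
    """Flag lines whose foreground share exceeds the assumed sampling cap."""
    fraction = fg_fraction(log_line)
    return fraction is not None and fraction > max_fg_fraction


print(is_suspicious("fg/bg=(26/998), time cost: 2.685305"))     # False
print(is_suspicious("fg/bg=(783/241), time cost: 175.604011"))  # True
```

A foreground share far above the configured cap suggests that the sampled boxes themselves are wrong, which is why checking the data loader is the next step.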

@JavHaro
Author

JavHaro commented Feb 22, 2018

Thanks @jwyang !
In this case I have changed ANCHOR_SCALES & ANCHOR_RATIO because the background and the green cherries are quite similar and there are very few pixels per fruit (approx. 20x20), so I wanted the bounding boxes to fit as tightly as possible. In any case, I think the output is weird, so I will check the data loader.
I'll be back when I have news.
Thanks once again.

Edit: I forgot to mention that I changed batch_size = 4.
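To get a feel for how anchor scales relate to object size, here is a rough sketch. It assumes a feature stride of 16 pixels and approximates each anchor as having area `(base_size * scale) ** 2` with `ratio = height / width`; `anchor_sizes` is a hypothetical helper, and the rounding used by real anchor generators is ignored. Any resizing done by the data loader also changes the object size the network actually sees.

```python
import math


def anchor_sizes(base_size, scales, ratios):
    """Return approximate (width, height) pairs in pixels for each ratio/scale pair."""
    sizes = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale
            # Keep the area fixed while stretching by the aspect ratio.
            width = side / math.sqrt(ratio)
            height = side * math.sqrt(ratio)
            sizes.append((round(width), round(height)))
    return sizes


# Assumed values: stride 16 and the commonly used 8/16/32 scales.
for width, height in anchor_sizes(16, scales=[8, 16, 32], ratios=[1.0]):
    print(width, height)  # 128 128, then 256 256, then 512 512
```

Under these assumptions the smallest square anchor is 128 pixels wide, so objects of about 20 to 30 pixels would need much smaller scales (for example 1 or 2) to be matched well.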

@JavHaro
Author

JavHaro commented Feb 22, 2018

Hi @jwyang ,
It may be a silly question, but I'm a newcomer to the deep learning world (and also to Python and Torch). Am I supposed to do something special with the dataset? I mean, I have a tagged dataset (in Pascal format) in which all the images have the same aspect ratio (I know it's not necessary, but it helps me analyse larger images by dividing them into a grid). I haven't prepared the images by normalizing them or anything like that (maybe I can get better results if I subtract the mean of my dataset, but it should work anyway, right?). I'm sorry to bother you once more, but I checked my code and I can't find anything weird.

Thanks!!

@jwyang
Owner

jwyang commented Feb 22, 2018

@JavHaro, typically you need to subtract the mean so that the input range suits the pretrained VGG or ResNet.
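As an illustration of what per-channel mean subtraction does, here is a dependency-free sketch. The functions `channel_means` and `subtract_means` are hypothetical names, and images are represented as plain nested lists only to keep the example self-contained; real pipelines would normally do this on arrays inside the data loader, using the same means the pretrained weights were trained with.

```python
def channel_means(images):
    """Per-channel mean over every pixel of every image.

    Each image is a list of rows; each row is a list of 3-channel pixels.
    """
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                for channel, value in enumerate(pixel):
                    totals[channel] += value
                count += 1
    if count == 0:
        raise ValueError("no pixels to average")
    return [total / count for total in totals]


def subtract_means(image, means):
    """Return a copy of the image with the per-channel means subtracted."""
    return [[[value - mean for value, mean in zip(pixel, means)]
             for pixel in row]
            for row in image]


tiny = [[[10, 20, 30], [30, 40, 50]]]  # one row, two pixels
means = channel_means([tiny])          # [20.0, 30.0, 40.0]
print(subtract_means(tiny, means))     # [[[-10.0, -10.0, -10.0], [10.0, 10.0, 10.0]]]
```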

@JavHaro
Author

JavHaro commented Mar 1, 2018

Thanks @jwyang! I subtract the mean of the Pascal dataset, but maybe I should subtract my own dataset's mean. Another issue I could have is that the number of pixels per object is quite low (approx. 30 pixels wide). Do you think this could cause the network to collapse?

@CodeJjang

A related question: if I don't want to use a pretrained network (e.g. resnet), can I switch "pretrained=true" to "pretrained=false" in the appropriate place? Will it work?

@jwyang
Owner

jwyang commented Mar 4, 2018

Hi, @CodeJjang , yes, it will train from scratch if you set pretrained=False.

@iabhi7

iabhi7 commented Mar 5, 2018

@JavHaro Hi, were you able to successfully train your custom dataset model?

@JavHaro
Author

JavHaro commented Mar 6, 2018

Hi @vibrantabhi19,
I haven't got it yet.

@JavHaro
Author

JavHaro commented Mar 6, 2018

Hi @jwyang
I have noticed that when the annotations have xmin=0 (or ymin=0), the gt_boxes in the RPN take the maximum value of the axis (in my case 600), which causes an error when calculating the IoU (because xmax-xmin becomes negative). I'm trying to trace the error back to its origin. Do you know where it might be?
Thanks once again!
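A small sanity check along these lines can catch such boxes before they reach the IoU computation. This is only a sketch: `is_valid_box` and `iou` are hypothetical helpers written for illustration, not functions from this repository, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels.

```python
def is_valid_box(box, image_width, image_height):
    """True when the box lies inside the image and has positive width and height."""
    x_min, y_min, x_max, y_max = box
    return (0 <= x_min < x_max <= image_width
            and 0 <= y_min < y_max <= image_height)


def iou(box_a, box_b):
    """Intersection over union of two valid (x_min, y_min, x_max, y_max) boxes."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


corrupted = (600, 10, 20, 30)  # x_min has jumped to the image width
print(is_valid_box(corrupted, 600, 600))         # False
print(is_valid_box((0, 10, 20, 30), 600, 600))   # True
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))       # about 0.33
```

Running the validity check over every ground-truth box right after loading makes it easier to see which annotations are affected.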

@jwyang
Owner

jwyang commented Mar 7, 2018

@JavHaro Hi, I think I already fixed this bug. Did you update your roibatchloader.py?

@JavHaro
Author

JavHaro commented Mar 7, 2018

Hi @jwyang, no, not for a month or so. I will check it.
Thanks!!

@zeehasham

zeehasham commented Mar 7, 2018

@JavHaro How did you create the ImageSets folder for your custom dataset? It has two folders, Layout and Main. Did you create files for both? Also, in the Main folder, how do you specify the -1 class when you only have one class (I also have only one class to detect)? Kindly let me know. Thanks

@Suxin5987THU

Hi @jwyang. I ran into a situation similar to JavHaro's. I found that the training collapses and all the losses become nan when fg_rois_per_this_image is 0 in proposal_target_layer. Have you seen this case?

@zeehasham

@Suxin5987THU I am still preparing my own dataset in VOC format. Can anyone help me structure the dataset, especially the ImageSets folder? Thanks

@JavHaro
Author

JavHaro commented Mar 8, 2018

Hi @zeehasham, I followed the instructions in this post. To answer your question, I created files for both of them, specifying the class name in Main and leaving it out in Layout. I really don't know whether it's necessary or whether it works, because I haven't had time to check the training results. As soon as I have time and have checked that everything is OK, I will post a new message with the main modifications I made, in case it serves as a guide for someone.
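For reference, here is a minimal sketch of generating a split file. It assumes the loader only reads a plain list of image ids from `ImageSets/Main/<split>.txt`, one id per line, as py-faster-rcnn-style `pascal_voc.py` loaders usually do; check your own `pascal_voc.py` to confirm. `write_image_set` is a hypothetical helper, and the per-class files with 1/-1 labels and the Layout folder are not produced here.

```python
from pathlib import Path


def write_image_set(voc_root, split_name="trainval"):
    """Write ImageSets/Main/<split_name>.txt listing every annotated image id."""
    voc_root = Path(voc_root)
    # One id per annotation file, e.g. Annotations/000001.xml -> "000001".
    image_ids = sorted(path.stem for path in (voc_root / "Annotations").glob("*.xml"))
    main_dir = voc_root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)
    split_file = main_dir / f"{split_name}.txt"
    split_file.write_text("".join(f"{image_id}\n" for image_id in image_ids))
    return split_file
```

Calling `write_image_set("path/to/your/VOC-style/root")` (a placeholder path) would list every annotated image in `trainval.txt`; a separate test split would need its own list of ids.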

@JavHaro
Author

JavHaro commented Mar 19, 2018

Hi @jwyang, @zeehasham and @Suxin5987THU
Finally I found the cause of the NaN. The problem is in the annotations when the min value is close to 0. I don't know why the value turns into the maximum value during loading (in fact the maximum value minus two --> 65534). To fix it, I added an "if" that checks the minimum values, so that if a value is bigger than 60000 I set it to 0. I know it isn't the most elegant solution, but it works.
I hope that this can help you.

@CodeJjang

@JavHaro Can you show where the fix is? In which file, and what exactly is the fix?

jwyang closed this as completed Mar 26, 2018

@Karthik-Suresh93

Hi @JavHaro,

I am facing the same issue too. Could you please show exactly where to incorporate the fix? @CodeJjang or anybody else, if you know the fix, please let me know. Thanks

@Ram-Godavarthi

Hello @jwyang @JavHaro
Could you please provide a solution for this?
While training the network I am getting these runtime errors.

I0612 13:01:18.071843 3126 sgd_solver.cpp:106] Iteration 1360, lr = 0.001
I0612 13:01:23.300282 3126 solver.cpp:229] Iteration 1380, loss = nan
I0612 13:01:23.300330 3126 solver.cpp:245] Train net output #0: bbox_loss = 0 (* 1 = 0 loss)
I0612 13:01:23.300346 3126 solver.cpp:245] Train net output #1: cls_loss = 0.0526416 (* 1 = 0.0526416 loss)
I0612 13:01:23.300359 3126 solver.cpp:245] Train net output #2: rpn_cls_loss = 3.32216 (* 1 = 3.32216 loss)
I0612 13:01:23.300371 3126 solver.cpp:245] Train net output #3: rpn_loss_bbox = nan (* 1 = nan loss)
I0612 13:01:23.300381 3126 sgd_solver.cpp:106] Iteration 1380, lr = 0.001

and this

bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]

What should be done to get rid of these errors?
I am training on my custom dataset.
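One common way to avoid the overflow warning itself is to cap the predicted log-scale delta before exponentiating. The sketch below assumes a cap of `log(1000 / 16)`, a bound used in some later detection codebases; `decode_size` and `MAX_LOG_SCALE` are hypothetical names, not part of the code in this thread. Note that this only treats the symptom: NaN deltas pass straight through, and very large deltas usually point to bad boxes or diverging training.

```python
import math

# Assumed cap: limits the predicted scale factor to 1000 / 16 = 62.5.
MAX_LOG_SCALE = math.log(1000.0 / 16.0)


def decode_size(delta, anchor_size, max_log_scale=MAX_LOG_SCALE):
    """Turn a predicted log-scale delta into a box width or height."""
    # Without the min(), math.exp(1000.0) would raise OverflowError.
    return math.exp(min(delta, max_log_scale)) * anchor_size


print(decode_size(0.0, 128.0))     # 128.0
print(decode_size(1000.0, 128.0))  # about 8000 instead of an overflow
```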

@JingXiaolun

@JavHaro Can you show where the fix is? In which file, and what exactly is the fix?

@adamklec

@jwyang @JavHaro I just ran into this issue as well. Can you please post the fix?

@JavHaro
Author

JavHaro commented Jul 26, 2018

Sorry @adamklec, @Karthik-Suresh93 & @1csu, I can't remember the exact file or the exact fix. It was at the point where the annotations are loaded. The problem was that if an annotation coordinate is very close to 0 (on the x or y axis), it was transformed into the maximum value (I can't remember when or why). I just did something like this:

if x_min >= 60000:
    x_min = 0
if y_min >= 60000:
    y_min = 0
(I can't remember the exact names of the variables.)

If you perform a check like this before using the annotations, the problem should be fixed.
Sorry for not being able to give you more clues about the exact fix. I should have posted it when I fixed it. Now I'm working on other projects and I have forgotten almost everything about this one.
BR
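Written out as a self-contained function, the workaround described above might look like the sketch below. The function name and the tuple layout are assumptions, since the original variable names are not known; only the 60000 threshold comes from the comment.

```python
WRAP_THRESHOLD = 60000  # threshold quoted in the comment above


def reset_wrapped_minimums(box, threshold=WRAP_THRESHOLD):
    """Return (x_min, y_min, x_max, y_max) with wrapped minimums set back to 0."""
    x_min, y_min, x_max, y_max = box
    if x_min >= threshold:
        x_min = 0
    if y_min >= threshold:
        y_min = 0
    return (x_min, y_min, x_max, y_max)


print(reset_wrapped_minimums((65534, 12, 40, 50)))  # (0, 12, 40, 50)
```

This repairs the values after they have been corrupted, so it has to run before the boxes are used anywhere else.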

@Hackerlil

Maybe it's in pascal_voc.py; there is some code there that gets the bbox coordinates.

@benjmcarr

There's an overflow in _load_pascal_annotation in pascal_voc.py when boxes[ix, :] = [x1, y1, x2, y2] is assigned a negative value. You can avoid this by clipping the values. Change

x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
to

x1 = max(float(bbox.find('xmin').text) - 1, 0)
y1 = max(float(bbox.find('ymin').text) - 1, 0)
x2 = max(float(bbox.find('xmax').text) - 1, 0)
y2 = max(float(bbox.find('ymax').text) - 1, 0)
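To try the clipping outside the training code, here is a standalone sketch that parses a small VOC-style annotation with the standard library. `load_clamped_boxes` and `SAMPLE_ANNOTATION` are made up for illustration. The `- 1` converts 1-based VOC coordinates to 0-based ones, so an annotation tool that already writes 0 produces -1; if the boxes array is an unsigned 16-bit type (an assumption worth checking in your copy of pascal_voc.py), -1 would wrap around to a value near 65535.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation with xmin = 0, the case that triggers the problem.
SAMPLE_ANNOTATION = """
<annotation>
  <size><width>600</width><height>600</height></size>
  <object>
    <name>cherry</name>
    <bndbox><xmin>0</xmin><ymin>15</ymin><xmax>21</xmax><ymax>36</ymax></bndbox>
  </object>
</annotation>
"""


def load_clamped_boxes(xml_text):
    """Return zero-based (x1, y1, x2, y2) boxes with negatives clamped to 0."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bbox = obj.find("bndbox")
        corners = []
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            # max() stops an annotation of 0 from becoming -1 after the shift.
            corners.append(max(float(bbox.find(tag).text) - 1, 0.0))
        boxes.append(tuple(corners))
    return boxes


print(load_clamped_boxes(SAMPLE_ANNOTATION))  # [(0.0, 14.0, 20.0, 35.0)]
```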

@amirmgh1375

amirmgh1375 commented Apr 15, 2019

@benjmcarr
Great 👍
It worked for me; the big problem is solved.
It only concerns the dataset annotations.
But I changed your code as follows:

x1 = max(float(bbox.find('xmin').text), 0)
y1 = max(float(bbox.find('ymin').text), 0)
x2 = max(float(bbox.find('xmax').text), 0)
y2 = max(float(bbox.find('ymax').text), 0)

Thanks a lot :)

@chensonglu

@benjmcarr thanks, it works for me. We also need to delete the cached gt so that a new gt is created.
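A sketch of clearing the cache is below. The file pattern `*_gt_roidb.pkl` and the `data/cache` location mentioned in the usage note are assumptions based on py-faster-rcnn-style loaders; check the cache path used in your pascal_voc.py before deleting anything. `clear_gt_cache` is a hypothetical helper.

```python
from pathlib import Path


def clear_gt_cache(cache_dir, pattern="*_gt_roidb.pkl"):
    """Delete cached ground-truth files and return the names that were removed."""
    removed = []
    for cache_file in sorted(Path(cache_dir).glob(pattern)):
        cache_file.unlink()
        removed.append(cache_file.name)
    return removed
```

For example, `clear_gt_cache("data/cache")` would remove matching files so that the ground truth is rebuilt from the fixed annotations on the next run.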
