Finetuning issues, loss: -nan after 100 iterations #644

Closed

wendlerc opened this issue Jul 8, 2014 · 5 comments

Comments
@wendlerc

wendlerc commented Jul 8, 2014

Hello,

First of all, I know that several issues on this topic already exist; unfortunately, none of them provided enough information for me to solve my problem.

I was trying to reuse the pretrained ImageNet model to solve a binary classification task. Here is what I did:

  1. I took the aeroplane images and labels from the PASCAL VOC2007 dataset and converted them into a format suitable for convert_imageset.bin (resized the images to 256x256 and assigned label 1 for aeroplane, 0 for not aeroplane)
  2. I generated the leveldb and mean.protobin files using convert_imageset.bin (in create_imagenet.sh) and make_imagenet_mean.sh
  3. I renamed the last fully connected layer in imagenet_train/val.prototxt and reduced its number of outputs to 2 (see the sketch after this list); as the solver, I took the solver definition from the pascal-finetune example.
  4. I called finetune_net.bin
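
For illustration, a minimal sketch of step 3 in the V1 prototxt syntax of the time; the name fc8_binary is a placeholder, not taken from this thread. Renaming the layer matters, since it prevents the pretrained fc8 weights from being copied into the new 2-output layer:

layers {
  name: "fc8_binary"        # renamed from fc8 so pretrained weights are not reused here
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8_binary"
  inner_product_param {
    num_output: 2           # two classes: aeroplane vs. not aeroplane
  }
}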

What I got was the following:

  1. The tuning was really slow (my image set consists of ~2500 images and I run the solver in CPU mode)
  2. After a certain number of iterations the loss becomes -nan

I0707 17:01:49.294651 13063 solver.cpp:106] Iteration 0, Testing net
I0707 17:20:26.931828 13063 solver.cpp:142] Test score #0: 0.002
I0707 17:20:26.931887 13063 solver.cpp:142] Test score #1: 1.84863
I0707 22:03:18.925554 13063 solver.cpp:237] Iteration 100, lr = 0.001
I0707 22:03:19.511451 13063 solver.cpp:87] Iteration 100, loss = -nan

Additionally, I made a few runs with slightly different network definitions, e.g. keeping all the ImageNet layers and putting an extra fully connected layer with 2 outputs on top, or using just 1 output, but these failed as well with the same result.

I did not find much documentation on finetuning, apart from the presentation slides and several issues (#31, #328, #140, and more).

I am new to Caffe and this is my first time working with neural networks, so please don't hesitate to write detailed answers. For example: is it sufficient to just reduce the number of outputs of the last fully connected layer to make the ImageNet model suitable for a binary classification task?

Best regards,

Chris

@sguada
Contributor

sguada commented Jul 8, 2014

Try a smaller base_lr
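
For concreteness, a sketch of that change in solver.prototxt; the 0.0001 value is an illustrative guess, not a figure from this thread:

# solver.prototxt (sketch): lower the base learning rate to keep the loss from diverging
base_lr: 0.0001      # was 0.001 per the log above; tune as needed
momentum: 0.9
weight_decay: 0.0005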


Sergio

@wendlerc
Author

wendlerc commented Jul 8, 2014

I reduced the size of my dataset and also changed the class proportions, e.g. 50 aeroplanes to 100 non-aeroplanes; I also reduced the learning rate, and now it seems to work. Thanks! Would you mind telling me what exactly momentum and weight_decay do? I now understand all the parameters in solver.prototxt except those two :)

@sguada
Contributor

sguada commented Jul 8, 2014

Take a look here
http://leon.bottou.org/research/stochastic

Sergio
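
For quick reference, a sketch of the role these two parameters play in the SGD update (the standard formulation, which Caffe's SGDSolver follows up to sign conventions):

\[
V_{t+1} = \mu V_t - \alpha \left( \nabla L(W_t) + \lambda W_t \right), \qquad W_{t+1} = W_t + V_{t+1}
\]

Here \(\mu\) is momentum, which smooths updates by carrying over a fraction of the previous step; \(\lambda\) is weight_decay, an L2 penalty that pulls weights toward zero to regularize; and \(\alpha\) is the learning rate.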


@wendlerc
Author

wendlerc commented Jul 8, 2014

Thanks, I closed this issue :)

@caffecuda

@Mezn Hi, can you please look at #631? Many thanks.
