
Add VGG-16 net as one of the default networks #159

Open
jmozah opened this issue Jul 3, 2015 · 34 comments


@jmozah

jmozah commented Jul 3, 2015

Similar to LeNet, AlexNet, GoogLeNet... it would be good if VGG net were also added as one of the default networks to select from.

@lukeyeager
Member

Last I checked, there wasn't a publicly available version of their train_val.prototxt. Lots of people have asked for it:
https://gist.github.com/ksimonyan/fd8800eeb36e276cd6f9#comment-1430126
https://gist.github.com/ksimonyan/211839e770f7b538e2d8#comment-1346808
https://gist.github.com/ksimonyan/3785162f95cd2d5fee77#comment-1316301

I think they probably just don't have it anymore. If you want to put together a version that trains successfully on multiple datasets, then we can test it and get it added to DIGITS.

@jmozah
Author

jmozah commented Jul 7, 2015

Look at the bottom of this link; @karpathy has a link there:
https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md

I will try and see if I can successfully train a version.

@serafett

Hi @jmozah

Were you able to train VGG successfully? I think training using the pretrained model works but training from scratch does not converge.

If anyone has successfully trained VGG16 or VGG19 from scratch, can you share your solver and train_val files?

@jmozah
Author

jmozah commented Jul 17, 2015

No... The network failed after 1 epoch... Will check it next week and update.


@saeedizadi

@jmozah
Any success?

@jmozah
Author

jmozah commented Aug 18, 2015

No... not yet

@groar
Contributor

groar commented Sep 8, 2015

I use a train_val that I updated from an old one. It works with the 19-layer VGG (with a very small batch size). https://gist.github.com/groar/d455ebe671b2f1807659

I used it for fine-tuning, but never tried to train it from scratch. I could try.

@lukeyeager
Member

Update on this:

@graphific uploaded a train_val.prototxt in the comments for this gist. I tried it on a 20-class subset of ImageNet (which should be easier to solve than the full ImageNet dataset) and it totally failed to train (whereas AlexNet and GoogLeNet converge quickly every time).

[image: vgg-no-converge]

So, still no luck here :-/

@gheinrich
Contributor

It would probably help to add Xavier weight initialization for this kind of deep network. With the default weight initialization the odds of hitting a vanishing gradient in the first layers are high.
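
A minimal sketch of what that looks like in a Caffe convolution layer, assuming the usual VGG-style 3x3 filters (the layer name and sizes are illustrative, not taken from any particular prototxt in this thread):

  layer {
    name: "conv1_1"
    type: "Convolution"
    bottom: "data"
    top: "conv1_1"
    convolution_param {
      num_output: 64
      kernel_size: 3
      pad: 1
      # "xavier" scales the initial weights by the layer's fan-in,
      # so the early layers start with sane gradient magnitudes
      weight_filler { type: "xavier" }
      bias_filler { type: "constant" value: 0 }
    }
  }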

@lfrdm

lfrdm commented Jan 21, 2016

Hi guys. I don't know if you are still having problems getting VGGNet to converge, but for me initializing the weights did the trick, as @gheinrich suggested. Though I used the standard initialization, as it is done in AlexNet.

@gheinrich
Contributor

Thanks! Can you post your .prototxt? Did you use Gaussian initialization? Xavier or MSRA initializations should perform better (and you don't have to specify the standard deviation of the distribution for these). Some toy examples there.
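
For comparison, here is how the three filler types would look inside a layer's convolution_param or inner_product_param block (the 0.01 std is only an example value):

  # Gaussian: you must pick the std yourself; a poor choice can stall training
  weight_filler { type: "gaussian" std: 0.01 }

  # Xavier: scaled by fan-in automatically, no std to tune
  weight_filler { type: "xavier" }

  # MSRA (He): like Xavier but derived for ReLU activations, often a good fit for very deep nets
  weight_filler { type: "msra" }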

@lfrdm

lfrdm commented Jan 21, 2016

You can find my .prototxt here. Yes, I used Gaussian. I trained on about 100,000 images (80% train, 20% val) at 64x64 pixels with a batch size of 100. I used the standard SGD, gamma and LR settings. The dataset is private, so I don't know whether it works on ImageNet, but I guess so. Note that the last output is 2 due to a binary classification problem; for ImageNet the fc8 layer should have an output of 1000.

I just noticed that I used the VGGNet from BMVC 2014. Sorry for that. I will give feedback after I have tried it with the 16-layer network on the same dataset.

@lfrdm

lfrdm commented Jan 22, 2016

As @gheinrich suggested, the 16-layer VGGNet converges with the "xavier" weight initialization. You can find my train_val.prototxt file here. Note that I didn't train on the ImageNet dataset, but I had faced the same convergence problem and was able to fix it with the "xavier" weight initialization. Parameters: batch size: 100, image size: 64x64, SGD: 6%, gamma: 0.5, LR: 0.05. The last output is 2 due to a binary classification problem; for ImageNet the fc8 layer should have an output of 1000.

@gheinrich
Contributor

Thanks for the update. That is nicely in line with the VGG paper:

Quote:

The initialisation of the network weights is important, since bad initialisation can stall learning due to the instability of gradient in deep nets. To circumvent this problem, we began with training the configuration A (Table 1), shallow enough to be trained with random initialisation. Then, when training deeper architectures, we initialised the first four convolutional layers and the last three fully-connected layers with the layers of net A (the intermediate layers were initialised randomly). We did not decrease the learning rate for the pre-initialised layers, allowing them to change during learning. For random initialisation (where applicable), we sampled the weights from a normal distribution with the zero mean and 10^-2 variance. The biases were initialised with zero. It is worth noting that after the paper submission we found that it is possible to initialise the weights without pre-training by using the random initialisation procedure of Glorot & Bengio (2010).

@GiuliaP

GiuliaP commented Mar 15, 2016

Hi, I tried the train_val.prototxt posted by @lfrdm and it works, thanks. I added the lr_mult=10/20 and decay_mult=1/0 params for the weights/biases to the fc8 layer. I was now wondering why these params are missing from the train_val.prototxt, and whether setting them to the same values as in, e.g., CaffeNet, as I have done for fc8, would make sense.
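
For anyone following along, a sketch of an fc8 definition with those multipliers (the values are the CaffeNet-style ones mentioned above; the filler and output size are illustrative):

  layer {
    name: "fc8"
    type: "InnerProduct"
    bottom: "fc7"
    top: "fc8"
    # weights: 10x the base learning rate, normal weight decay
    param { lr_mult: 10 decay_mult: 1 }
    # biases: 20x the base learning rate, no weight decay
    param { lr_mult: 20 decay_mult: 0 }
    inner_product_param {
      num_output: 1000  # 1000 for ImageNet, 2 for the binary problem discussed above
      weight_filler { type: "xavier" }
      bias_filler { type: "constant" value: 0 }
    }
  }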

@GiuliaP

GiuliaP commented Mar 29, 2016

@igorbb you're right: in the train_val.prototxt, the "pool: MAX" parameter appears twice in every pooling layer. It must be a typo. After correcting this it seems to work.

On 23/03/16 00:36, igorbb wrote:

Hey @GiuliaP, I am getting a parser error with @lfrdm's version. Can you share your gist?



@hariprasadravi

Hi, I'm new to DIGITS and I'm experimenting with some datasets. When I tried the train_val.prototxt posted by @lfrdm with the changes mentioned by @GiuliaP (removing the repeated pool: MAX), I got the error message below. Am I going wrong somewhere? AlexNet and GoogLeNet seem to be working fine.

ERROR: Check failed: error == cudaSuccess (2 vs. 0) out of memory

relu2_2 needs backward computation.
conv2_2 needs backward computation.
relu2_1 needs backward computation.
conv2_1 needs backward computation.
pool1 needs backward computation.
relu1_2 needs backward computation.
conv1_2 needs backward computation.
relu1_1 needs backward computation.
conv1_1 needs backward computation.
label_data_1_split does not need backward computation.
data does not need backward computation.
This network produces output accuracy
This network produces output loss
Network initialization done.
Solver scaffolding done.
Starting Optimization
Solving
Learning Rate Policy: step
Iteration 0, Testing net (#0)
Check failed: error == cudaSuccess (2 vs. 0) out of memory

@GiuliaP

GiuliaP commented Jun 23, 2016

You have to reduce the batch size (both train and test/val): as it says, the GPU is out of memory.
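
For reference, when training with plain Caffe the batch size sits in the data layers of train_val.prototxt; this is only a sketch with placeholder names, and inside DIGITS you would normally just lower the batch size field on the model page instead:

  layer {
    name: "train-data"
    type: "Data"
    top: "data"
    top: "label"
    include { phase: TRAIN }
    data_param {
      source: "path/to/train_lmdb"  # placeholder path
      backend: LMDB
      batch_size: 16                # lower this until the network fits in GPU memory
    }
  }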

@hariprasadravi

@GiuliaP Reduced it and it works well now. Thank you.

@jmozah
Author

jmozah commented Jun 23, 2016

Did it converge?


@hariprasadravi

Yes, it did. I ran it for 10 epochs on a dataset consisting of 10k color images with a batch size of 10. It took an hour to complete and gave me a validation accuracy of 92%.

@ghost

ghost commented Jun 30, 2016

Hi,
I'm trying to use VGG in DIGITS. When I try to create the model, I get the following error:


ERROR: Layer 'loss' references bottom 'label' at the TEST stage however this blob is not included at that stage. Please consider using an include directive to limit the scope of this layer.


I just copied the train_val.prototxt provided by @lfrdm into the custom network box and deleted the duplicated pool: MAX. Any ideas?
Thanks in advance,
M

@lukeyeager
Member

lukeyeager commented Jul 5, 2016

@mizadyya Read the documentation on how custom networks in DIGITS work by clicking on the blue question mark above the box.

You probably want to add something like this to your loss layer:

  exclude { stage: "deploy" }

Example:
https://github.com/NVIDIA/DIGITS/blob/digits-4.0/digits/standard-networks/caffe/lenet.prototxt#L162-L184
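
For context, a minimal sketch of how the loss and accuracy layers are typically scoped in a DIGITS network (modelled on the LeNet example linked above; the "fc8" bottom is just a placeholder for whatever your last layer is called):

  layer {
    name: "accuracy"
    type: "Accuracy"
    bottom: "fc8"
    bottom: "label"
    top: "accuracy"
    # only computed during validation
    include { stage: "val" }
  }
  layer {
    name: "loss"
    type: "SoftmaxWithLoss"
    bottom: "fc8"
    bottom: "label"
    top: "loss"
    # the deploy network has no labels, so keep this layer out of that stage
    exclude { stage: "deploy" }
  }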

@ghost

ghost commented Jul 5, 2016

@lukeyeager I also needed to add a Softmax layer at the end, in addition to SoftmaxWithLoss. Now it's running fine. Thanks.
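
In case it helps others hitting the same thing, a sketch of that extra layer (again assuming fc8 is the final fully-connected layer):

  layer {
    name: "softmax"
    type: "Softmax"
    bottom: "fc8"
    top: "softmax"
    # only the deploy network needs class probabilities as its output
    include { stage: "deploy" }
  }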

@jmozah
Author

jmozah commented Jul 7, 2016

How much memory does it consume... does it fit in a 4 GB card?


On 07-Jul-2016, at 9:15 AM, Ishant Mrinal Haloi [email protected] wrote:

I have tested this on ImageNet, it converges: https://github.com/n3011/VGG_19_layers_Network



@Motherboard

Motherboard commented Sep 13, 2016

I couldn't make it work with batches as big as 5 256x256 images on a K520 with 4 GB... It also takes 5 days for 10 epochs on 18k images (fine-tuning)... maybe something is wrong with my EC2 instance? GPU utilization is constantly at 99%; memory peaked near 100% during initialization but quickly dropped to 60%... although larger batches made it fail for lack of memory (I ended up using batches of 3)...

@mrgloom

mrgloom commented Sep 16, 2016

I also can't train VGG-16. Maybe it's because of the small batch size or the solver settings (I use the default DIGITS settings)?
My dataset is from this Kaggle competition: https://www.kaggle.com/c/dogs-vs-cats
Here is my network definition: https://gist.github.com/mrgloom/fec835c5570e739eff8c18a343bdd7db

@mrgloom

mrgloom commented Sep 16, 2016

Seems that it was a small-batch problem. I successfully trained VGG-16 with batch size 24 and batch accumulation 2, so as I understand it my effective batch size was 48? (See the solver sketch below.)

Here is the models and logs downloaded from DIGITS:
https://github.com/mrgloom/kaggle-dogs-vs-cats-solution/tree/master/learning_from_scratch/Models/VGG-16
https://github.com/mrgloom/kaggle-dogs-vs-cats-solution/tree/master/learning_from_scratch/Models/VGG-19
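
For what it's worth, DIGITS' batch accumulation setting appears to correspond to Caffe's iter_size solver option: gradients from two 24-image batches are summed before each weight update, so the effective batch size is 24 x 2 = 48. A rough solver.prototxt sketch (the path and learning-rate values are illustrative):

  net: "train_val.prototxt"  # placeholder path
  type: "SGD"
  base_lr: 0.01              # illustrative value
  lr_policy: "step"
  gamma: 0.1
  stepsize: 10000
  iter_size: 2               # accumulate gradients over 2 forward/backward passes
  max_iter: 45000
  # effective batch = data-layer batch_size (24) * iter_size (2) = 48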

@HolmesShuan

Here is my prototxt; it seems to work correctly.

@eamadord

Hi, I'm fairly new to DIGITS and to Caffe, and I have been trying to fine-tune VGG for the past few weeks without results. I used the prototxt posted by @lfrdm, setting the lr_mult parameters of the last layer to the values suggested by @GiuliaP and the lr_mult of the rest of the layers to 0. However, when running it in DIGITS it does not converge: it goes from 20% accuracy to 55% and stays there for the whole training. I've tried several learning rates, from 0.01 to 0.0005, without success. My dataset consists of 8500 images for training and 1700 for validation, split into 5 classes. Could anyone give me a hand with this?
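
For reference, a sketch of what setting "the lr_mult of the rest of the layers to 0" looks like in the prototxt; a layer frozen this way keeps its pretrained weights while fc8 is trained with the higher multipliers discussed earlier (the layer name and sizes are illustrative):

  layer {
    name: "conv1_1"
    type: "Convolution"
    bottom: "data"
    top: "conv1_1"
    param { lr_mult: 0 decay_mult: 0 }  # weights stay at their pretrained values
    param { lr_mult: 0 decay_mult: 0 }  # biases stay at their pretrained values
    convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
  }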

@gheinrich
Contributor

Hi @Elviish, since your question isn't about getting VGG to load in DIGITS but about how to train it, can you post it on the DIGITS users list (https://groups.google.com/forum/#!forum/digits-users)?

@aytackanaci

Hi @lfrdm, I was looking for train_val files for the VGG net from BMVC 2014. I see that you have two commits for that file. Is the older one the BMVC version?

@aaron276h

@lfrdm any chance you could post your prototxt file for VGG again? The link seems to be down. Thanks!

@gaving

gaving commented Nov 2, 2017

Echoing the request for this prototxt file for VGG... I can't seem to find one!
