Add VGG-16 net as one of the default networks #159
Comments
Last I checked, there wasn't a publicly available version of theirs. I think they probably just don't have it anymore. If you want to put together a version that successfully trains on multiple datasets, then we can test it and get it added to DIGITS.
Look at the bottom of this link. @karathy has a link there. I will try and see if I can successfully train a version.
Hi @jmozah Were you able to train VGG successfully? I think training using the pretrained model works but training from scratch does not converge. If anyone has successfully trained VGG16 or VGG19 from scratch, can you share your solver and train_val files?
No... The network failed after 1 epoch... Will check it next week and update. Sent from my iPhone
@jmozah
No... not yet
I use a train_val that I updated from an old one. It works with the 19-layer VGG (with a very small batch). https://gist.github.com/groar/d455ebe671b2f1807659 I used it for fine-tuning, but never tried to train it from scratch. I could try.
Update on this: @graphific uploaded a train_val.prototxt in the comments for this gist. I tried it on a 20-class subset of ImageNet (which should be easier to solve than the full imagenet dataset) and it totally failed to train (whereas AlexNet and GoogLeNet converge quickly every time). So, still no luck here :-/ |
It would probably help to add Xavier weight initialization for this kind of deep network. With the default weight initialization the odds of hitting a vanishing gradient in the first layers are high. |
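For reference, this is what Xavier initialization looks like in a Caffe convolution layer. The fragment below is illustrative (layer names and dimensions are assumptions, not taken from any of the prototxt files linked in this thread); the only change relative to a default VGG layer is the weight_filler block:

```protobuf
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    # "xavier" scales the initial weights by fan-in, which helps keep
    # gradients from vanishing in the early layers of deep networks.
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
```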
Hi guys. Don't know if you still have problems getting VGGNet to converge, but for me initializing the weights did the trick, as @gheinrich suggested. Though, I used the standard initialization like it is done in AlexNet.
Thanks! Can you post yours?
You can find my .prototxt here. Yes, I used Gaussian. I trained on about 100,000 images (80% train, 20% val) at 64x64 pixels with a batch size of 100. I used standard SGD, gamma, and LR. The dataset is private; I don't know if it works on ImageNet, but I guess so. Note that the last output is 2 due to a binary class problem; for ImageNet the fc8 layer should have an output of 1000. I just noticed that I used the VGGNet from BMVC-2014. Sorry for that. I will give feedback after I have tried it with the 16-layer network on the same dataset.
As @gheinrich suggested, the VGGNet with 16 layers converges with the "xavier" weight initialization. You can find my train_val.prototxt file here. Note that I didn't train on the ImageNet dataset, but I had faced the same problem with convergence and was able to fix it with the "xavier" weight initialization. Parameters: batch: 100, image: 64x64, SGD: 6%, gamma: 0.5, LR: 0.05. The last output is 2 due to a binary class problem; for ImageNet the fc8 layer should have an output of 1000.
Thanks for the update. That is nicely in line with the VGG paper: Quote:
Hi, I tried the train_val.prototxt.
@igorbb you're right: in the train_val.prototxt, in all the pooling layers the pool: MAX line is duplicated.
Hi, I'm new to DIGITS and I'm experimenting with some datasets. When I tried the train_val.prototxt posted by @lfrdm with the changes mentioned by @GiuliaP (removing the repeated pool: MAX) I got this error message. Am I going wrong somewhere? AlexNet and GoogLeNet seem to be working fine. ERROR: Check failed: error == cudaSuccess (2 vs. 0) out of memory relu2_2 needs backward computation.
You have to reduce the batch size (both train and test/val): as it says, the GPU is out of memory. |
@GiuliaP Reduced it and works well now. Thank you. |
Did it converge? ./Zahoor@iPhone
Yes, it did. I ran it for 10 epochs on a dataset consisting of 10k color images with a batch size of 10. It took an hour to complete and gave me a validation accuracy of 92%.
Hi, I just copied the train_val.prototxt provided by @lfrdm to the custom network box and deleted the duplicated pool: MAX. Any idea?
@mizadyya Read the documentation on how custom networks in DIGITS work by clicking on the blue question mark above the box. You probably want to add something like this to your loss layer: exclude { stage: "deploy" } |
@lukeyeager I also needed to add a softmax layer to the end, in addition to softmax with loss. Now it's running fine. Thanks
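For anyone hitting the same error, a minimal sketch of those two layers as they would appear in a DIGITS custom network (assuming the final inner-product layer is named fc8, as in the standard VGG prototxt):

```protobuf
# Loss layer: used during train/val, excluded from the deploy network.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  exclude { stage: "deploy" }
}
# Plain softmax: included only at deploy time to turn scores into probabilities.
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "fc8"
  top: "softmax"
  include { stage: "deploy" }
}
```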
How much memory does it consume... Fits in 4gb card? ./Zahoor@iPhone
I couldn't make it work with batches as big as 5 256x256 images on a K520 with 4GB... And it also takes 5 days for 10 epochs on 18k images (finetuning)... maybe something is wrong with my EC2? GPU utilization is 99% constantly, memory peaked during initialization to near 100%, but quickly dropped to 60%... although larger batches made it fail for lack of memory (ended up using batches of 3)... |
Also can't train VGG-16. Maybe it's because of the small batch size or the solver settings (I use the default DIGITS settings)?
Seems that was a small-batch problem; I successfully trained VGG-16 with batch size 24 and batch accumulation 2, so as I understand my effective batch size was 48? Here are the models and logs downloaded from DIGITS:
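For context, batch accumulation in Caffe is the iter_size field of solver.prototxt (DIGITS exposes it as "batch accumulation"). An illustrative fragment, with values matching the run described above but otherwise assumed:

```protobuf
# solver.prototxt fragment (illustrative)
# With batch_size: 24 in the train data layer of train_val.prototxt:
iter_size: 2   # accumulate gradients over 2 forward/backward passes
# Effective batch size = batch_size * iter_size = 24 * 2 = 48,
# while per-pass GPU memory stays that of a batch of 24.
```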
Here is my prototxt, seems to work correctly. |
Hi, I'm fairly new to DIGITS and to Caffe, and I have been trying to fine-tune VGG for the past few weeks without results. I used the prototxt posted by @lfrdm, setting the lr_mult parameters of the last layer to the values suggested by @GiuliaP and the lr_mult of the rest of the layers to 0. However, when running it in DIGITS it does not converge: it goes from 20% accuracy to 55% and it stays like that during the whole training. I've tried several learning rates, from 0.01 to 0.0005, without success. My dataset consists of 8500 images for training and 1700 for validation, split into 5 classes. Could anyone give me a hand on this?
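As a sketch of that fine-tuning setup in prototxt (the lr_mult values 10/20 are a common Caffe fine-tuning convention, not necessarily the exact values suggested above, and the layer names assume the standard VGG naming):

```protobuf
# Frozen layer: repeat this param pattern for every layer that should not train.
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param { lr_mult: 0 }   # weights frozen during fine-tuning
  param { lr_mult: 0 }   # biases frozen
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
# Replacement classifier: renamed so the pretrained fc8 weights are not copied in.
layer {
  name: "fc8_new"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  param { lr_mult: 10 }  # learn the new classifier faster than the base LR
  param { lr_mult: 20 }
  inner_product_param {
    num_output: 5        # 5 classes in this dataset
    weight_filler { type: "xavier" }
  }
}
```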
Hi @Elviish, since your question isn't related to getting VGG to load in DIGITS but to how to train it, can you post it on the DIGITS users list (https://groups.google.com/forum/#!forum/digits-users)?
Hi @lfrdm, I was looking for train_val files for the VGG from BMVC 2014. I see that you have two commits for that file. Is the older one the BMVC version?
@lfrdm any chance you could post your prototxt file for VGG again? The link seems to be down. Thanks!
Echoing a request for this prototxt file for VGG.. can't seem to find one! |
Similar to LeNet, AlexNet, GoogLeNet... it would be good if VGG net is also added as one of the default networks to select from.