Caffe Timings for GoogleNet, VGG, AlexNet with cuDNN #1317
Comments
How do we reconcile the numbers at the top with the numbers at the bottom? e.g. 22-36% slower on VGG_19 on a single GPU, but above you say 13.8x forward, 22x backward. In your individual timings you show GoogleNet with cuDNN faster, but at the top slower.
At the top, I'm comparing the timings of the GoogleNet and VGG models with respect to the Caffe_reference model. That means that GoogleNet with cuDNN is 2.8 times (3.6 times) slower in the forward (backward) pass than Caffe_reference with cuDNN. But I also added the comparison between GoogleNet with cuDNN and GoogleNet without cuDNN: with cuDNN it is 1.6 times (1.4 times) faster in the forward (backward) pass than without cuDNN. This analysis means that cuDNN helps for GoogleNet and Caffe_reference but hurts for the VGG models. And that Caffe_reference is the fastest (although not the best), GoogleNet is pretty fast (state of the art) and VGG is pretty slow (but also state of the art).
Now I understand, the wording was a little confusing. ;-) VGG is an expensive network to train in several dimensions, but has neat attributes. Interesting that GoogleNet isn't "that bad" for training time (and memory footprint).
Yeah, I didn't mean to say that the VGG networks are bad; they are great, and we have seen great results using them. They just require too many parameters and a lot of memory, and are slow to train and test, but the results are good 👍
@sguada, thanks for the comparison. Can you please share the train/test networks definitions? |
@sguada, good job with the comparison! I can confirm that I obtained similar timings when playing with GoogLeNet architecture (when comparing it to the caffe_reference model). I am curious about the GoogLeNet training procedure: assuming that you use the batch size of 128, how long does it take you to see any progress in learning (how many iterations)? And if you do see the progress, can I also ask about 1) the learning rate used, and 2) weights initialization? I would be glad to exchange some experience on that... |
@mkudelski I've implemented GoogLeNet as well and am getting the same training times as reported by @sguada. The implementation is straightforward as described in their paper. The weights initialization is "xavier" and that's about it; it works out of the box! You can see progress straight away after a couple of hundred iterations; if you don't, then there's something wrong.
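For anyone looking for a concrete starting point, here is a minimal pycaffe NetSpec sketch of a convolution layer declared with the "xavier" weight filler. The layer name, dimensions, and filler values are illustrative only, not the actual GoogLeNet definition.

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Dummy input; in a real train_val.prototxt this would be a Data layer.
n.data = L.Input(shape=[dict(dim=[128, 3, 224, 224])])
# A single convolution initialized with the "xavier" filler.
n.conv1 = L.Convolution(n.data, num_output=64, kernel_size=7, stride=2, pad=3,
                        weight_filler=dict(type='xavier'),
                        bias_filler=dict(type='constant', value=0.2))
n.relu1 = L.ReLU(n.conv1, in_place=True)

# Dump the generated prototxt for inspection or training.
with open('conv_xavier_example.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```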
@amiralush Thanks for the info! BTW, do you also train on Tesla K40, with batch size of 128? Or maybe you use a smaller batch? |
Yes, batch size 128, Tesla K40. P.S.
@amiralush The very last question, related to the plot: what was the learning rate value for this particular learning curve (I assume the rate was constant during the first 20000 iterations...)? Thanks again :-)
@amiralush thanks for confirming my timings. It seems that earlier you uploaded a different graph containing the plots of other networks. Would you like to talk about them? Are you plotting train or test loss? @mkudelski For training GoogleNet I used batch_size: 32, as reported in the paper; I used batch_size: 128 for timing to make the comparison easy.
Hi guys, could you post the prototxts you used for GoogleNet (or add them to the examples)? Thank you!
@amiralush Thank you, will try this out!! |
@sguada Hi sguada, would you please describe how to initialize the first four layers of Net D in the VGG paper? I just wonder how we could initialize Net D with Net A, since the number of parameters in each layer of those two nets is different. Thanks a lot.
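Not an answer from the thread itself, but for reference: the usual Caffe mechanism for this kind of initialization is to copy parameters by layer name, so layers that exist in both nets are taken from the trained model and the remaining layers keep their random initialization. A minimal pycaffe sketch, with hypothetical file names:

```python
import caffe

caffe.set_mode_gpu()

# Build the deeper net; all layers start from their random initialization.
net_d = caffe.Net('vgg_d_train_val.prototxt', caffe.TRAIN)

# Copy parameters from the shallower, already-trained model. Caffe matches
# layers by name, so only layers whose names appear in both nets are
# overwritten; everything else keeps its random initialization.
net_d.copy_from('vgg_a_pretrained.caffemodel')

# Quick sanity check: list the parameter shapes of each layer.
for name, params in net_d.params.items():
    print(name, [p.data.shape for p in params])
```

The same can be done from the command line by passing --weights=vgg_a_pretrained.caffemodel to caffe train.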
CONCAT layer costs extra memory. |
@sguada Hi, would you share your prototxt and model of GoogleNet? And did you train the net in the usual Caffe manner? If not, could you share your training method with us?
Take a look at #1598 for my replica of GoogleNet, including the prototxt, solver and model. |
Hi Sergio! I am testing your GoogleNet implementation with a personal dataset and I am running out of memory (I am using a GeForce GTX 760 card with 2048 MB). I have already tried to reduce the batch size (I even tested with ...). I would appreciate your help, thanks!
@andresromero |
Thanks @ducha-aiki it worked! |
Hi @andresromero |
Hi, I am training the VGG16 model with a K20 4 GB card, but it only works for batch sizes <= 10. How can I train the model with larger batch sizes?
Turn off testing -> ~2 times less memory consumption. |
Does anyone have any ideas why using cuDNN makes things slower for some networks (e.g., VGG)? |
Hi, I want to know why the forward pass is faster than the backward pass. If you know why, please tell me. Thank you!
@sjlee7748 check here. What I have observed from my experiments is that the backward pass is faster than the forward pass without cuDNN, and the other way around if you compiled with cuDNN. I guess it depends on the implementation. Hope it helps.
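If you want to measure this yourself, here is a rough pycaffe sketch (paths are placeholders) that times the forward and backward passes separately. Because GPU calls are asynchronous, the built-in caffe time tool, which also reports per-layer timings, is the more reliable option; this is only a quick check.

```python
import time
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

# TRAIN phase so the backward pass is meaningful; assumes the prototxt
# has a working data layer.
net = caffe.Net('train_val.prototxt', caffe.TRAIN)

def avg_ms(step, iters=50):
    step()  # warm-up, so cuDNN setup and first-touch allocations are excluded
    start = time.time()
    for _ in range(iters):
        step()
    return (time.time() - start) / iters * 1000.0

print('forward:  %.1f ms / iteration' % avg_ms(net.forward))
print('backward: %.1f ms / iteration' % avg_ms(net.backward))
```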
Question: I'd like to train and test your GoogLeNet replica for my application where I have 512x512 grayscale images that can have one of four possible classifications, so can you point me in the direction of what I would need to modify in the prototxt for this situation? As you can see, I am new to this. |
As part of my ongoing training of GoogleNet in Caffe (the winning entry of ImageNet 2014), I was doing some timings, and these are my findings:
[Comparison with Caffe_reference]
These experiments were run on one K40c using batch_size: 128, in a server with 8 GPUs running other tasks.
[Comparison with cuDNN vs without cuDNN]
For the VGG networks I needed to use batch_size: 64 to be able to fit them in memory, so I multiplied the times by 2.