Caffe Timings for GoogleNet, VGG, AlexNet with cuDNN #1317
Comments
How do we reconcile the numbers at the top with the numbers at the bottom? e.g. 22-36% slower on VGG_19 on a single GPU, but above you say 13.8x forward, 22x backward. In your individual timings you show GoogleNet with cuDNN faster, but at the top slower.
At the top, I'm comparing the timings of the GoogleNet and VGG models with respect to the Caffe_reference model. That means that GoogleNet with cuDNN is 2.8 times (3.6 times) slower in the forward (backward) pass than Caffe_reference with cuDNN. But I also added the comparison between GoogleNet with cuDNN and GoogleNet without cuDNN: with cuDNN it is 1.6 times (1.4 times) faster in the forward (backward) pass than without cuDNN. This analysis means that cuDNN helps for GoogleNet and Caffe_reference but hurts for the VGG models. And that Caffe_reference is the fastest (although not the best), GoogleNet is pretty fast (state of the art) and VGG is pretty slow (but also state of the art).
Now I understand, the wording was a little confusing. ;-) VGG is an expensive network to train in several dimensions, but has neat attributes. Interesting that GoogleNet isn't "that bad" for training time (and memory footprint).
Yeah, I didn't mean to say that the VGG networks are bad; they are great, and we have seen great results using them. They just require too many parameters and a lot of memory, and are slow to train and test, but the results are good 👍
@sguada, thanks for the comparison. Can you please share the train/test networks definitions? |
@sguada, good job with the comparison! I can confirm that I obtained similar timings when playing with GoogLeNet architecture (when comparing it to the caffe_reference model). I am curious about the GoogLeNet training procedure: assuming that you use the batch size of 128, how long does it take you to see any progress in learning (how many iterations)? And if you do see the progress, can I also ask about 1) the learning rate used, and 2) weights initialization? I would be glad to exchange some experience on that... |
@mkudelski I've implemented GoogLeNet as well and am getting the same training times as reported by @sguada. The implementation is straightforward as described in their paper. The weights initialization is "xavier" and that's about it; it works out of the box! You can see progress straight away after a couple of hundred iterations; if you don't, then there's something wrong.
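For anyone looking for a concrete starting point, here is a minimal pycaffe NetSpec sketch of a convolution layer declared with the "xavier" weight filler. The layer name, dimensions, and filler values are illustrative only, not the actual GoogLeNet definition.

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Dummy input; in a real train_val.prototxt this would be a Data layer.
n.data = L.Input(shape=[dict(dim=[128, 3, 224, 224])])
# A single convolution initialized with the "xavier" filler.
n.conv1 = L.Convolution(n.data, num_output=64, kernel_size=7, stride=2, pad=3,
                        weight_filler=dict(type='xavier'),
                        bias_filler=dict(type='constant', value=0.2))
n.relu1 = L.ReLU(n.conv1, in_place=True)

# Dump the generated prototxt for inspection or training.
with open('conv_xavier_example.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```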
@amiralush Thanks for the info! BTW, do you also train on Tesla K40, with batch size of 128? Or maybe you use a smaller batch? |
Yes, batch size 128, Tesla K40. P.S.
@amiralush The very last question, related to the plot: what was the learning rate value for this particular learning curve (I assume the rate was constant during the first 20000 iterations...)? Thanks again :-)
@amiralush thanks for confirming my timings. It seems that earlier you uploaded a different graph containing the plots of other networks. Would you like to talk about them? Are you plotting train or test loss? @mkudelski For training GoogleNet I used batch_size: 32, as reported in the paper; I used batch_size: 128 for timing to make the comparison easy.
Hi guys, could you post the prototxts you used for GoogleNet (or add them to the examples)? Thank you!
@amiralush Thank you, will try this out!! |
@sguada Hi sguada, would you please describe how to initialize the first four layers of Net D in the VGG paper? I just wonder how we could initialize Net D with Net A, since the number of parameters in each layer of those two nets is different. Thanks a lot.
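Not an answer from the thread itself, but for reference: the usual Caffe mechanism for this kind of initialization is to copy parameters by layer name, so layers that exist in both nets are taken from the trained model and the remaining layers keep their random initialization. A minimal pycaffe sketch, with hypothetical file names:

```python
import caffe

caffe.set_mode_gpu()

# Build the deeper net; all layers start from their random initialization.
net_d = caffe.Net('vgg_d_train_val.prototxt', caffe.TRAIN)

# Copy parameters from the shallower, already-trained model. Caffe matches
# layers by name, so only layers whose names appear in both nets are
# overwritten; everything else keeps its random initialization.
net_d.copy_from('vgg_a_pretrained.caffemodel')

# Quick sanity check: list the parameter shapes of each layer.
for name, params in net_d.params.items():
    print(name, [p.data.shape for p in params])
```

The same can be done from the command line by passing --weights=vgg_a_pretrained.caffemodel to caffe train.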
CONCAT layer costs extra memory. |
@sguada Hi, would you share your prototxt and model of GoogleNet? And did you train the net in the usual Caffe manner? If not, could you share your training method with us?
Take a look at #1598 for my replica of GoogleNet, including the prototxt, solver and model. |
Hi Sergio! I am testing your GoogleNet implementation with a personal dataset and I am running out of memory (I am using a GeForce GTX 760 card with 2048 MB). I have already tried to reduce the batch size (I even tested with ...). I would appreciate your help, thanks!
@andresromero |
Thanks @ducha-aiki it worked! |
Hi @andresromero |
Hi, I am training the VGG16 model with a K20 4 GB card, but it only works for batch sizes <= 10. How can I train the model with larger batch sizes?
Turn off testing -> ~2 times less memory consumption. |
Does anyone have any ideas why using cuDNN makes things slower for some networks (e.g., VGG)? |
Hi, I want to know why the forward pass is faster than the backward pass. If you know why, please tell me. Thank you!
@sjlee7748 check here. What I have observed from my experiments is that the backward pass is faster than the forward pass without cuDNN, and the other way around if you compiled with cuDNN. I guess it depends on the implementation. Hope it helps.
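If you want to measure this yourself, here is a rough pycaffe sketch (paths are placeholders) that times the forward and backward passes separately. Because GPU calls are asynchronous, the built-in caffe time tool, which also reports per-layer timings, is the more reliable option; this is only a quick check.

```python
import time
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

# TRAIN phase so the backward pass is meaningful; assumes the prototxt
# has a working data layer.
net = caffe.Net('train_val.prototxt', caffe.TRAIN)

def avg_ms(step, iters=50):
    step()  # warm-up, so cuDNN setup and first-touch allocations are excluded
    start = time.time()
    for _ in range(iters):
        step()
    return (time.time() - start) / iters * 1000.0

print('forward:  %.1f ms / iteration' % avg_ms(net.forward))
print('backward: %.1f ms / iteration' % avg_ms(net.backward))
```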
Question: I'd like to train and test your GoogLeNet replica for my application where I have 512x512 grayscale images that can have one of four possible classifications, so can you point me in the direction of what I would need to modify in the prototxt for this situation? As you can see, I am new to this. |
As part of my ongoing training of GoogleNet in Caffe (the winning entry of ImageNet 2014), I was doing some timings, and these are my findings:
[Comparison with Caffe_reference]
These experiments were run on one K40c using batch_size: 128, in a server with 8 GPUs running other tasks.
[Comparison with cuDNN vs without cuDNN]
For the VGG networks I needed to use batch_size: 64 to be able to fit them in memory, so I multiplied the times by 2.