No acceleration effect when using multiple GPUs #877

Closed
alphadl opened this issue Aug 3, 2018 · 7 comments

alphadl commented Aug 3, 2018

While training with multiple GPUs, I found that there is no speed-up.
(Both experiments used the same hyper-parameters and framework, as well as the same preprocessed data.)

Experiment 1 (batch_size: 280, 1 GPU): average 14-18 seconds per 50 steps

[2018-08-03 18:07:31,781 INFO] Step 50/55000; acc: 1.33; ppl: 24568.27; xent: 10.11; lr: 1.00000; 23006/21669 tok/s; 18 sec
[2018-08-03 18:07:46,144 INFO] Step 100/55000; acc: 3.70; ppl: 68305.64; xent: 11.13; lr: 1.00000; 12894/13920 tok/s; 32 sec
[2018-08-03 18:07:59,929 INFO] Step 150/55000; acc: 7.27; ppl: 6525.28; xent: 8.78; lr: 1.00000; 10488/9970 tok/s; 46 sec

Experiment 2 (batch_size: 280, 8 GPUs): average 33-44 seconds per 50 steps

[2018-08-03 18:11:14,990 INFO] Step 50/55000; acc: 5.32; ppl: 11557.90; xent: 9.36; lr: 1.00000; 53166/56374 tok/s; 44 sec
[2018-08-03 18:11:48,445 INFO] Step 100/55000; acc: 6.32; ppl: 5841.12; xent: 8.67; lr: 1.00000; 68049/73648 tok/s; 77 sec
[2018-08-03 18:12:22,650 INFO] Step 150/55000; acc: 6.93; ppl: 4129.08; xent: 8.33; lr: 1.00000; 59233/68822 tok/s; 111 sec


vince62s commented Aug 3, 2018

Your batch size is 8 times bigger, so it actually is faster.
You can see it in the tok/s numbers.
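As a rough sanity check, here is a minimal sketch of that arithmetic using only the step-50 lines logged above (it assumes the second tok/s figure is target tokens per second and that the reported "sec" is cumulative wall time):

```python
# Per-step time vs. overall throughput, taken from the step-50 log lines above.
runs = {
    "1 GPU": {"tok_per_s": 21669, "sec_for_50_steps": 18},
    "8 GPU": {"tok_per_s": 56374, "sec_for_50_steps": 44},
}

for name, r in runs.items():
    sec_per_step = r["sec_for_50_steps"] / 50
    print(f"{name}: {sec_per_step:.2f} s/step, {r['tok_per_s']} tok/s")

# Each 8-GPU step takes ~2.4x longer (44 s vs 18 s per 50 steps),
# but it processes far more data per step, so overall throughput is
# ~2.6x higher (56374 vs 21669 tok/s).
```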


alphadl commented Aug 3, 2018

Hmm... both batch_size values are 280; I only changed the number of GPUs (1 vs 8). What I can see is that tok/s became larger, but the training time for every 50 steps also became longer.


alphadl commented Aug 3, 2018

What do n_src_words and n_words mean at each step? And why do n_src_words and n_words grow as the number of GPUs increases?


alphadl commented Aug 3, 2018

I supposed that with more GPUs, more sentences and words would be loaded at each step, and that the time per step would drop by a factor of eight; but instead, the time per step is increasing.


vince62s commented Aug 3, 2018

You are mistaken.
When you pass batch_size 280 on 8 GPUs, the actual batch_size is 8 x 280.
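A minimal sketch of that arithmetic (assuming batch_size counts sentences per GPU, per the explanation above):

```python
# Effective batch size when the same per-GPU batch_size is kept while adding GPUs.
per_gpu_batch_size = 280

for n_gpus in (1, 8):
    effective_batch = per_gpu_batch_size * n_gpus
    print(f"{n_gpus} GPU(s): effective batch of {effective_batch} sentences per step")

# 1 GPU(s): effective batch of 280 sentences per step
# 8 GPU(s): effective batch of 2240 sentences per step
# This is also why the per-step n_src_words / n_words counts grow with the GPU count.
```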


alphadl commented Aug 3, 2018

So I can reduce train_steps to 1/8 and still achieve comparable results?

BTW, if I want to measure the acceleration effect, should I set batch_size to 280/8 when training on multiple GPUs, to make a fair comparison with training on a single GPU?
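(A sketch of the arithmetic behind the second question, assuming batch_size is per GPU as described above; the value 35 is just 280/8, not a number from the thread:)

```python
# To keep the effective batch equal to the single-GPU run (280 sentences),
# the per-GPU batch_size would need to be scaled down by the GPU count.
single_gpu_batch = 280
n_gpus = 8
per_gpu_batch_size = single_gpu_batch // n_gpus
print(per_gpu_batch_size)  # 35
```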


vince62s commented Aug 3, 2018

Read more here, for instance: tensorflow/tensor2tensor#444
Closing this for now.

vince62s closed this as completed Aug 3, 2018