No acceleration effect when using multiple GPUs #877

Closed
alphadl opened this issue Aug 3, 2018 · 7 comments

alphadl commented Aug 3, 2018

While training with multiple GPUs, I found that there is no speed-up.
(Both experiments used the same hyper-parameters and framework, as well as the same preprocessed data.)

Experiment 1 (batch_size: 280, 1 GPU): average 14-18 seconds per 50 steps

[2018-08-03 18:07:31,781 INFO] Step 50/55000; acc: 1.33; ppl: 24568.27; xent: 10.11; lr: 1.00000; 23006/21669 tok/s; 18 sec
[2018-08-03 18:07:46,144 INFO] Step 100/55000; acc: 3.70; ppl: 68305.64; xent: 11.13; lr: 1.00000; 12894/13920 tok/s; 32 sec
[2018-08-03 18:07:59,929 INFO] Step 150/55000; acc: 7.27; ppl: 6525.28; xent: 8.78; lr: 1.00000; 10488/9970 tok/s; 46 sec

Experiment 2 (batch_size: 280, 8 GPUs): average 33-44 seconds per 50 steps

[2018-08-03 18:11:14,990 INFO] Step 50/55000; acc: 5.32; ppl: 11557.90; xent: 9.36; lr: 1.00000; 53166/56374 tok/s; 44 sec
[2018-08-03 18:11:48,445 INFO] Step 100/55000; acc: 6.32; ppl: 5841.12; xent: 8.67; lr: 1.00000; 68049/73648 tok/s; 77 sec
[2018-08-03 18:12:22,650 INFO] Step 150/55000; acc: 6.93; ppl: 4129.08; xent: 8.33; lr: 1.00000; 59233/68822 tok/s; 111 sec


vince62s commented Aug 3, 2018

Your batch size is 8 times bigger, so it actually is faster.
You can see it in the tok/s numbers.
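As a rough sanity check, here is a minimal sketch of that arithmetic using only the step-50 lines logged above (it assumes the second tok/s figure is target tokens per second and that the reported "sec" is cumulative wall time):

```python
# Per-step time vs. overall throughput, taken from the step-50 log lines above.
runs = {
    "1 GPU": {"tok_per_s": 21669, "sec_for_50_steps": 18},
    "8 GPU": {"tok_per_s": 56374, "sec_for_50_steps": 44},
}

for name, r in runs.items():
    sec_per_step = r["sec_for_50_steps"] / 50
    print(f"{name}: {sec_per_step:.2f} s/step, {r['tok_per_s']} tok/s")

# Each 8-GPU step takes ~2.4x longer (44 s vs 18 s per 50 steps),
# but it processes far more data per step, so overall throughput is
# ~2.6x higher (56374 vs 21669 tok/s).
```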


alphadl commented Aug 3, 2018

Hmm... both batch_size values are 280; I only changed the number of GPUs (1 vs 8). What I can see is that tok/s became larger, but the training time for every 50 steps also became longer.


alphadl commented Aug 3, 2018

What do n_src_words and n_words mean at each step? And why do n_src_words and n_words grow as the number of GPUs increases?


alphadl commented Aug 3, 2018

I supposed that with more GPUs, more sentences and words would be loaded at each step, and that the time per step would drop by a factor of eight; but instead, the time per step is increasing.


vince62s commented Aug 3, 2018

You are mistaken.
When you pass batch_size 280 on 8 GPUs, the actual batch_size is 8 x 280.
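A minimal sketch of that arithmetic (assuming batch_size counts sentences per GPU, per the explanation above):

```python
# Effective batch size when the same per-GPU batch_size is kept while adding GPUs.
per_gpu_batch_size = 280

for n_gpus in (1, 8):
    effective_batch = per_gpu_batch_size * n_gpus
    print(f"{n_gpus} GPU(s): effective batch of {effective_batch} sentences per step")

# 1 GPU(s): effective batch of 280 sentences per step
# 8 GPU(s): effective batch of 2240 sentences per step
# This is also why the per-step n_src_words / n_words counts grow with the GPU count.
```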


alphadl commented Aug 3, 2018

So I can reduce train_steps to 1/8 and still achieve comparable results?

BTW, if I want to measure the acceleration effect, should I set batch_size to 280/8 when training on multiple GPUs, to make a fair comparison with training on a single GPU?
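(A sketch of the arithmetic behind the second question, assuming batch_size is per GPU as described above; the value 35 is just 280/8, not a number from the thread:)

```python
# To keep the effective batch equal to the single-GPU run (280 sentences),
# the per-GPU batch_size would need to be scaled down by the GPU count.
single_gpu_batch = 280
n_gpus = 8
per_gpu_batch_size = single_gpu_batch // n_gpus
print(per_gpu_batch_size)  # 35
```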


vince62s commented Aug 3, 2018

Read more here, for instance: tensorflow/tensor2tensor#444
Closing this for now.

vince62s closed this as completed Aug 3, 2018