Large training batches on limited GPU hardware #750

fstahlberg · 2018-04-26T17:33:21Z

This PR adds a LargebatchAdam optimizer, which accumulates gradients over n batches and applies the Adam update rule once every n batches to the accumulated gradients. This makes it possible to increase the effective batch size (or, equivalently, the simulated number of GPUs) arbitrarily, at the cost of n times more training iterations. The technique is useful when the number of physical GPUs is limited or GPU memory does not allow a larger batch size. Large batch / multi-GPU training is often important for Transformer training, as reported here.
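For readers unfamiliar with gradient accumulation, here is a minimal pure-Python sketch of the idea. It is not the code in this PR: the class name `AccumulatingAdam`, its arguments, and the choice to average (rather than sum) the accumulated gradients are assumptions made for illustration, and it operates on a plain list of floats instead of TensorFlow variables.

```python
import math


class AccumulatingAdam:
    """Toy Adam that updates parameters once every n calls to step().

    Hypothetical illustration only; not the LargebatchAdam implementation.
    """

    def __init__(self, params, n=4, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = list(params)
        self.n = n
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.params)    # first-moment estimates
        self.v = [0.0] * len(self.params)    # second-moment estimates
        self.acc = [0.0] * len(self.params)  # gradient accumulator
        self.calls = 0    # batches seen since the last parameter update
        self.updates = 0  # number of Adam updates applied so far

    def step(self, grads):
        """Accumulate one batch of gradients; return True if an update ran."""
        for i, g in enumerate(grads):
            self.acc[i] += g
        self.calls += 1
        if self.calls < self.n:
            return False  # keep accumulating, parameters stay unchanged

        self.updates += 1
        t = self.updates
        for i in range(len(self.params)):
            # Assumption: average the n accumulated gradients.
            g = self.acc[i] / self.n
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
            self.acc[i] = 0.0
        self.calls = 0
        return True
```

With `n=2`, for example, the first call to `step` only stores the gradient and leaves the parameters untouched; the second call applies a single Adam update using the mean of both gradients. Note that the bias-correction step count `t` advances per update, not per batch, which is why training needs n times more iterations for the same number of updates.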

googlebot · 2018-04-26T17:33:24Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

stefan-it · 2018-04-26T19:06:45Z

@fstahlberg Looks really cool! Could you provide some experimental results, e.g. BLEU scores on a translation problem, comparing one vs. multiple GPUs?

fstahlberg · 2018-04-27T07:23:14Z

@stefan-it We'll have an ACL2018 paper on this with a small discussion. We'll post the camera-ready version on arXiv in the next few days, and I'll link it from here.

fstahlberg · 2018-04-29T08:01:32Z

Closing this in favour of #754: some commits were submitted with an invalid e-mail address, which caused problems with the Google CLA, and my attempts to amend the e-mail address in those earlier commits were not successful.

fstahlberg changed the title ~~Large training batches on limit GPU hardware~~ Large training batches on limited GPU hardware Apr 26, 2018

LargebatchAdam optimizer for simulating n times more GPUs at cost of n times more training steps

7793e2b

fstahlberg and others added 4 commits April 29, 2018 00:03

LargebatchAdam optimizer for simulating n times more GPUs at cost of n times more training steps

d5ae429

Merge branch 'master' of https://github.com/ucam-smt/tensor2tensor

2ab1f7a

LargebatchAdam optimizer for simulating n times more GPUs at cost of n times more training steps

7d940c3

Merge branch 'master' of https://github.com/ucam-smt/tensor2tensor

9c4ca6b

fstahlberg mentioned this pull request Apr 29, 2018

Large training batches on limited GPU hardware #754

Merged

fstahlberg closed this Apr 29, 2018
