
Large training batches on limited GPU hardware #750

Closed
wants to merge 5 commits into from

Conversation

fstahlberg
Contributor

This PR adds a LargebatchAdam optimizer, which accumulates gradients over n batches and applies the Adam learning rule to the accumulated gradients every n batches. This makes it possible to arbitrarily increase the effective batch size / number of GPUs at the cost of more training iterations. The technique is useful when the number of physical GPUs is limited or the GPU memory does not allow increasing the batch size any further. Large-batch / multi-GPU training is often important for Transformer training, as reported here.
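
For illustration, here is a minimal, framework-agnostic sketch of the accumulate-then-update idea. It is not the TensorFlow implementation in this PR; all names (AccumulatingAdam, n_accum, etc.) are hypothetical. Gradients from n small batches are summed, and a single Adam step is applied to their average, so one optimizer step corresponds to an effective batch n times larger.

```python
import numpy as np

class AccumulatingAdam:
    """Sketch: accumulate gradients over n_accum micro-batches, then do one Adam step."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n_accum=4):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.n_accum = n_accum          # micro-batches per effective update
        self.t = 0                      # Adam time step, counts effective batches
        self.k = 0                      # micro-batches accumulated so far
        self.m = self.v = self.acc = None

    def step(self, params, grad):
        """Accumulate `grad`; every n_accum calls, apply one in-place Adam update."""
        if self.acc is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
            self.acc = np.zeros_like(params)
        self.acc += grad
        self.k += 1
        if self.k < self.n_accum:
            return params               # keep accumulating, no parameter change yet

        g = self.acc / self.n_accum     # gradient of the large effective batch
        self.acc[:] = 0.0
        self.k = 0
        self.t += 1

        # Standard Adam update on the averaged gradient.
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        params -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return params
```

Note that in this sketch the Adam time step t advances once per accumulated (effective) batch rather than once per micro-batch, so the bias correction and moment statistics match what a genuinely larger batch would produce.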

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@fstahlberg fstahlberg changed the title Large training batches on limit GPU hardware Large training batches on limited GPU hardware Apr 26, 2018
@stefan-it
Contributor

@fstahlberg Looks really cool! Could you provide some experimental results, e.g. a translation problem with BLEU scores and a comparison of one vs. multiple GPUs?

@fstahlberg
Contributor Author

@stefan-it We'll have an ACL 2018 paper on this with a small discussion. We'll post the camera-ready version on arXiv in the next few days, and I'll link it from here.

@fstahlberg
Contributor Author

Closing this in favour of #754 as some commits were submitted with an invalid e-mail address which caused problems with the Google CLA, and my attempts to amend the e-mail address in previous commits were not successful.

@fstahlberg fstahlberg closed this Apr 29, 2018