Data parallel training support #79
Merged
Conversation
Commits:
- …NCCL flag when constructing GeNNModel
- …y calculate metrics
- …is batched at all * divide batch size by number of ranks
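As a rough illustration of the batch-size arithmetic in the last commit above, here is a minimal mpi4py sketch (the variable names are mine, not mlGeNN's): each rank ends up simulating the 'full' batch size divided by the number of ranks.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# 'Full' batch size requested by the user, and the number of MPI ranks
full_batch_size = 512            # hypothetical example value
num_ranks = comm.Get_size()

# "divide batch size by number of ranks": each rank simulates an equal
# share of the full batch, so the sizes must divide evenly
assert full_batch_size % num_ranks == 0
local_batch_size = full_batch_size // num_ranks

print(f"Rank {comm.Get_rank()}: {local_batch_size} samples per batch")
```

Launched with e.g. `mpirun -np 4 python train.py`, four ranks would each simulate 128 samples of a 512-sample batch.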
tnowotny approved these changes Dec 7, 2023
wow - that looks surprisingly simple and elegant.
I was at first confused about where the checkpoints would go, but I see now that only rank 0 is writing, so that's fine.
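For anyone unfamiliar with the pattern being described here, a minimal sketch of rank-0-only checkpoint writing with mpi4py; the `serialise` call is a hypothetical stand-in, not mlGeNN's actual API.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

def save_checkpoint(network, epoch):
    # Only rank 0 writes to disk, so ranks never clobber each other's files
    if comm.Get_rank() == 0:
        network.serialise(f"checkpoint_{epoch}")  # hypothetical save call

    # Make every rank wait until the checkpoint exists before continuing
    comm.Barrier()
```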
Pleasingly, this was actually very easy to do! Basically:

- mpi4py implementation for now, as that's what my old code used
- CompiledNetwork does some basic stuff if a communicator is provided
- Compiler subdivides batches across ranks if a communicator is provided and turns on the magic NCCL flag so GeNN generates the additional bits of code (NCCL multi-GPU reductions genn#449)
- SparseCategoricalAccuracy gets passed the communicator and uses it to correctly calculate metrics across ranks

Other than that, it's all just passing the communicator around, plus a few places where the 'full' batch size is used rather than the scaled-down one, e.g. in the EventProp compiler to scale stuff. I've also added a couple of additional examples (at some point I need to tidy the examples up a bit) which demonstrate how you need to change your code to run across multiple GPUs - mostly just splitting the dataset and turning off progress bars etc. on all ranks apart from the first (see the sketch below).
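To make the "what changes in user code" point concrete, here is a minimal sketch of the pattern the description outlines: split the dataset across ranks, show progress output only on the first rank, and reduce locally computed metric counts across ranks. The mpi4py calls are real; the dataset and the commented-out mlGeNN-style calls are hypothetical placeholders, not the actual API.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
num_ranks = comm.Get_size()

# Placeholder dataset - a real script would load e.g. MNIST here
images = np.random.rand(512, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=512)

# Split the dataset so each rank trains on its own shard
images = np.array_split(images, num_ranks)[rank]
labels = np.array_split(labels, num_ranks)[rank]

# Hypothetical mlGeNN-style calls (the real API may differ) - the point
# is that the communicator is passed in and progress bars are only shown
# on the first rank:
#
# compiler = EventPropCompiler(..., communicator=comm)
# compiled_net = compiler.compile(network)
# compiled_net.train({inputs: images}, {outputs: labels},
#                    num_epochs=10, verbose=(rank == 0))

# Reducing a metric across ranks, in the spirit of the
# SparseCategoricalAccuracy change: sum the local correct/total counts,
# then divide once globally
predictions = np.random.randint(0, 10, size=len(labels))  # stand-in predictions
local_correct = int((predictions == labels).sum())
local_total = len(labels)
global_correct = comm.allreduce(local_correct, op=MPI.SUM)
global_total = comm.allreduce(local_total, op=MPI.SUM)
if rank == 0:
    print(f"Accuracy across all ranks: {global_correct / global_total:.4f}")
```

Run with e.g. `mpirun -np 2 python example.py`, each rank trains on 256 of the 512 samples and the accuracy printed on rank 0 aggregates every rank's counts.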