ddp_fix #270
Conversation
Single-node multi-GPU works (tested with 2 and 4 GPUs).
Confirmed it works multi-node with exactly the same performance as an equivalent multi-GPU single-node config :) (e.g. 2 nodes x 1 GPU == 1 node x 2 GPUs).
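For context, that equivalence is expected: DDP behaves according to the total world size (number of processes), not how those processes are spread across nodes, so 2 nodes x 1 GPU and 1 node x 2 GPUs train identically. A minimal illustration, with placeholder values (the batch size and script structure are not from this PR):

```python
import torch.distributed as dist

# Launched via torchrun in either topology; both give world_size == 2.
dist.init_process_group(backend="nccl")

world_size = dist.get_world_size()            # 2 in both configurations
per_gpu_batch = 32                            # placeholder value
effective_batch = per_gpu_batch * world_size  # identical in both cases

print(f"rank={dist.get_rank()} world_size={world_size} "
      f"effective_batch={effective_batch}")
```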
@smartdanny is it ready?
Basically this commit can be considered ready. I also wanted to make some of the examples like mnist and knight DDP-friendly "plug and play", but I am having a lot of trouble with the batch sampler. It seems PyTorch Lightning has an extremely hacky way of allowing a sampler to work in DDP (see Lightning-AI/pytorch-lightning#13640). I can get it working, but it would probably take a bit of work, and I'm not sure it's worth it at the moment. The options are:
Let me know what you think is best @mosheraboh @SagiPolaczek
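For reference, the underlying difficulty is that a custom batch sampler is not automatically sharded across DDP processes, so without intervention every rank would iterate over all batches. A minimal sketch of one possible workaround - a hypothetical wrapper, not the repo's or Lightning's actual code - that shards an arbitrary batch sampler by rank:

```python
import torch.distributed as dist

class DistributedBatchSamplerWrapper:
    """Hypothetical sketch: shard the batches produced by an arbitrary
    batch sampler across DDP ranks, so each rank sees a disjoint subset."""

    def __init__(self, batch_sampler):
        self.batch_sampler = batch_sampler
        self.rank = dist.get_rank()
        self.world_size = dist.get_world_size()

    def __iter__(self):
        # Keep every world_size-th batch, offset by this rank's index.
        for i, batch in enumerate(self.batch_sampler):
            if i % self.world_size == self.rank:
                yield batch

    def __len__(self):
        # Approximate when the batch count is not divisible by world_size.
        return len(self.batch_sampler) // self.world_size
```

A wrapper like this would be passed via DataLoader(..., batch_sampler=...), with Lightning's automatic sampler replacement turned off (replace_sampler_ddp=False in the Lightning releases current at the time of this PR) so it does not re-wrap the sampler itself.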
@smartdanny, we want one example with our batch sampler that works - I guess that means using a datamodule.
Last thing is just to make Knight DDP-ready with the sampler by making a datamodule. Feel free to merge, or I can also add it to this PR. @mosheraboh
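A rough sketch of what that datamodule could look like - the class and argument names here are hypothetical, not the actual Knight example code. Building the dataloader inside train_dataloader() means each DDP rank constructs its own sampler after the process group exists:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

class KnightDataModule(pl.LightningDataModule):
    """Hypothetical sketch: defer batch-sampler construction to
    train_dataloader(), which Lightning calls once per DDP process."""

    def __init__(self, dataset, batch_sampler_fn):
        super().__init__()
        self.dataset = dataset
        # batch_sampler_fn: callable that builds a (rank-aware) batch sampler.
        self.batch_sampler_fn = batch_sampler_fn

    def train_dataloader(self):
        # Runs after DDP processes are spawned, so distributed state
        # (rank, world size) is available to the sampler.
        batch_sampler = self.batch_sampler_fn(self.dataset)
        return DataLoader(self.dataset, batch_sampler=batch_sampler)
```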
Looks great!
Go ahead and merge it.