Add DistributedGeoSamplers #305
Comments
Can you give some examples? I would like to see how they implement this.
Okay, so all of the examples above are using
The
TL;DR: So this feature is not used for reproducibility/determinism, but for randomness across replicas. It's unclear to me why PyTorch needs this to prevent subsequent iterations from having the same ordering. Until I understand that, I'm not sure whether we need this feature or not. Our samplers definitely weren't created with distributed sampling in mind, but they should still work since our datasets don't have a finite length that needs to be subsampled (other than P.S. The other non-distributed samplers still have a
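For readers following along, here is a minimal sketch of why the ordering repeats when set_epoch is never called. It is a toy stand-in written with the standard library only, not PyTorch's code: EpochSeededSampler and its parameters are hypothetical names. The assumption it illustrates is that the shuffle is seeded from seed + epoch, so every replica builds the same permutation and keeps a disjoint slice of it, and the permutation only changes when the epoch changes.

```python
import random


class EpochSeededSampler:
    """Toy stand-in for epoch-seeded distributed shuffling (hypothetical, not PyTorch code)."""

    def __init__(self, num_samples, num_replicas, rank, seed=0):
        self.num_samples = num_samples
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Changing the epoch changes the RNG seed used by the next iteration.
        self.epoch = epoch

    def __iter__(self):
        # Every replica builds the same permutation because the RNG seed
        # depends only on (seed, epoch), never on the rank.
        indices = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(indices)
        # Each replica then keeps its own strided slice of that permutation.
        return iter(indices[self.rank :: self.num_replicas])


rank0 = EpochSeededSampler(num_samples=8, num_replicas=2, rank=0)
rank1 = EpochSeededSampler(num_samples=8, num_replicas=2, rank=1)

first_pass = list(rank0)
second_pass = list(rank0)  # set_epoch was not called, so the order repeats

rank0.set_epoch(1)
rank1.set_epoch(1)
epoch1_rank0 = list(rank0)
epoch1_rank1 = list(rank1)
```

Seeding from the rank instead would give each replica a different permutation, and the slices could then overlap or miss samples. That is presumably why the seed is shared and the epoch is the only thing that varies it.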
PyTorch Lightning automatically wraps samplers as DistributedSamplers so we don't need to handle any of this since we use PL. You would only need to mess with this if you were rolling your own distributed training scripts. See https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#replace-sampler-ddp
@RitwikGupta does PL satisfy your use case?
@adamjstewart not entirely, I'd have to refactor this entire FB codebase into PyTorch Lightning, which would be a massive pain.
Adding a set_epoch method to our samplers wouldn't actually solve this. The above links use DistributedSampler, which splits up the dataset indices to be sampled across nodes/GPUs. I've done this for another project, but we would need to create our own modification of a DistributedSamplerWrapper similarly to this Edit:
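As a rough illustration of the wrapper idea described above, the sketch below shards the output of an arbitrary base sampler across replicas. Everything in it is hypothetical and uses only the standard library: ShardedSamplerWrapper, random_boxes, and the (x, y) tuples standing in for bounding boxes are made-up names, not TorchGeo or PyTorch APIs. It also assumes the base sampler can be rebuilt from a seed; a real wrapper would have to seed whatever RNG the wrapped sampler actually uses.

```python
import random
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int]  # toy (x, y) patch origin standing in for a bounding box


def random_boxes(seed: int, length: int = 10) -> List[Box]:
    """Hypothetical base sampler: `length` random patch origins from a seeded RNG."""
    rng = random.Random(seed)
    return [(rng.randrange(1000), rng.randrange(1000)) for _ in range(length)]


class ShardedSamplerWrapper:
    """Hypothetical wrapper giving each replica a disjoint slice of a base sampler."""

    def __init__(
        self,
        make_sampler: Callable[[int], List[Box]],
        num_replicas: int,
        rank: int,
        seed: int = 0,
    ) -> None:
        self.make_sampler = make_sampler
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self) -> Iterator[Box]:
        # All replicas regenerate the same sequence from the shared seed.
        items = list(self.make_sampler(self.seed + self.epoch))
        # Drop the tail so every replica yields the same number of samples.
        usable = len(items) - len(items) % self.num_replicas
        return iter(items[self.rank : usable : self.num_replicas])


# Simulate three replicas in one process.
shards = [
    list(ShardedSamplerWrapper(random_boxes, num_replicas=3, rank=r))
    for r in range(3)
]
```

Dropping the tail is one design choice among several; padding by repeating samples is the other common option, and either keeps replicas in step so that no process waits on a batch the others never produce.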
Right, I should say that
I think this is still an issue. I am unable to train on multiple GPUs when I use a RandomBatchGeoSampler.
set_epoch is used by a lot of codebases in an effort to be deterministic.