Update DDP backend if dist.is_nccl_available() (#3705)
glenn-jocher authored Jun 20, 2021
1 parent fbf41e0 commit e8810a5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion train.py
@@ -539,7 +539,7 @@ def main(opt):
 assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'
 torch.cuda.set_device(LOCAL_RANK)
 device = torch.device('cuda', LOCAL_RANK)
-dist.init_process_group(backend="gloo", timeout=timedelta(seconds=60))
+dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo", timeout=timedelta(seconds=60))
 assert opt.batch_size % WORLD_SIZE == 0, '--batch-size must be multiple of CUDA device count'
 assert not opt.image_weights, '--image-weights argument is not compatible with DDP training'
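
The backend choice in this change can be read in isolation as a small fallback rule: use NCCL when the PyTorch build reports it, otherwise stay on Gloo. The sketch below is only an illustration of that rule, not code from train.py. The helper names `select_ddp_backend` and `ddp_init_kwargs` are hypothetical, and the availability flag is passed in as a plain boolean so the snippet runs without PyTorch installed.

```python
from datetime import timedelta


def select_ddp_backend(nccl_available: bool) -> str:
    """Return "nccl" when available, else fall back to "gloo" (hypothetical helper)."""
    return "nccl" if nccl_available else "gloo"


def ddp_init_kwargs(nccl_available: bool, timeout_s: int = 60) -> dict:
    """Build keyword arguments mirroring the init_process_group call in the diff.

    The 60 second default matches the timeout shown in the commit.
    """
    return {
        "backend": select_ddp_backend(nccl_available),
        "timeout": timedelta(seconds=timeout_s),
    }


if __name__ == "__main__":
    # Show the arguments chosen for both availability cases.
    for available in (True, False):
        print(available, ddp_init_kwargs(available))
```

With PyTorch installed, the same dictionary could presumably be passed as `dist.init_process_group(**ddp_init_kwargs(dist.is_nccl_available()))`, which is equivalent to the one-line conditional in the commit.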
