About GPU utilization #965
Comments
Why is it necessary to use transform_models_if_DDP(models) after accelerator.prepare?
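For context: accelerator.prepare wraps each model in torch.nn.parallel.DistributedDataParallel when launched distributed, which hides the original module's attributes and methods behind .module. A helper named transform_models_if_DDP presumably unwraps the models so downstream code can keep accessing them directly. A minimal sketch of that idea, assuming the helper simply unwraps DDP (the implementation below is a guess from the name, not the repo's actual code):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def transform_models_if_DDP(models):
    """Sketch: return the underlying modules of DDP-wrapped models.

    After accelerator.prepare, a model may be wrapped in DDP, so
    attributes defined on the original class are only reachable via
    model.module. This returns the unwrapped modules.
    """
    return [m.module if isinstance(m, DDP) else m for m in models]
```

Note that if the forward pass then runs through the unwrapped modules, DDP's gradient-synchronization hooks never fire, which could break cross-machine gradient averaging; this would be consistent with the communication concern raised in the next comment.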
I think this may also break multi-machine communication. Maybe the new PR can fix this. I only have two GPUs on one machine and can't test multi-machine training. Could you check whether the update works for multi-machine training?
I removed …
In multi-machine, multi-GPU training, the InfiniBand (IB) network shows no traffic, suggesting that avg_loss = accelerator.gather(loss).mean() might not be used.
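For reference, this is how the gather-based loss averaging would look in an Accelerate training loop; a minimal sketch, with the loss value and surrounding loop assumed:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Inside the training loop: `loss` is this process's scalar loss
# (placeholder value here for illustration).
loss = torch.tensor(0.5, device=accelerator.device)

# gather() collects the loss from every process into one tensor,
# so .mean() yields the average across all GPUs and nodes. If this
# line never executes, no inter-node (IB) traffic is generated for it.
avg_loss = accelerator.gather(loss).mean()
accelerator.print(f"average loss across processes: {avg_loss.item():.4f}")
```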