Multi GPU RuntimeError: Expected device cuda:0 but got device cuda:7 #15
Hello @zidanexu, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you. If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com.
@zidanexu thank you for your bug report. We can successfully reproduce this issue. It appears to be caused by self.grid, a list attribute of the Detect() layer, which is sent to a device during training. Because self.grid is a Python list rather than a tensor, it is not registered as a parameter or buffer, so it is not transferred along with the rest of the model when the model is moved or replicated across devices. We will look into this.
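To illustrate the mechanism described above, here is a minimal sketch, not the actual Detect() code. The classes `ListGrid` and `BufferGrid` are hypothetical stand-ins, and a dtype conversion is used in place of a device move so that the example runs without a GPU; `Module.to()` handles both through the same path, which visits only registered parameters and buffers.

```python
import torch
import torch.nn as nn


class ListGrid(nn.Module):
    """Hypothetical layer that caches its grid in a plain Python list."""

    def __init__(self):
        super().__init__()
        # A list attribute is not a parameter or buffer, so .to() skips it.
        self.grid = [torch.zeros(1)]


class BufferGrid(nn.Module):
    """Hypothetical layer that registers its grid as a buffer instead."""

    def __init__(self):
        super().__init__()
        # Registered buffers are converted/moved together with the module.
        self.register_buffer("grid", torch.zeros(1))


# Stand-in for model.to(device): convert both modules to float64.
list_layer = ListGrid().to(torch.float64)
buffer_layer = BufferGrid().to(torch.float64)

list_grid_dtype = list_layer.grid[0].dtype   # tensor inside the list is untouched
buffer_grid_dtype = buffer_layer.grid.dtype  # buffer followed the module

print(list_grid_dtype, buffer_grid_dtype)
```

The same asymmetry would apply to `.to("cuda:7")` or to per-GPU replicas: the tensor held in the list stays wherever it was created. One possible workaround, offered here as an assumption rather than the fix adopted in the repository, is to move or rebuild the grid on the input's device inside `forward` (for example `self.grid[i] = self.grid[i].to(x.device)`).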
This error still exists in multi-GPU training on PyTorch 1.5.
It does not seem to be fixed yet. Any ideas?
Also changed constants to hyperparameters
Hi @glenn-jocher,
I am trying to reproduce the training results
using the command above on 8 Tesla P40 GPUs.
After the first training epoch finishes, the test process breaks.