RuntimeError: CUDA error: unspecified launch failure #1752
Comments
Here is the error for single GPU training:
I think the problem comes from the torch.cuda.synchronize() call.
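One caveat on that guess: CUDA kernels launch asynchronously, so a failing kernel is often only reported at the next synchronization point, and torch.cuda.synchronize() may simply be where the error surfaces rather than where it starts. A minimal sketch for narrowing it down, assuming the snippet sits at the very top of the training script; checked_step is a hypothetical helper, not part of yolov5 or PyTorch:

```python
import os

# Assumption: this runs before torch initializes CUDA, otherwise it has no effect.
# With CUDA_LAUNCH_BLOCKING=1 kernel launches become synchronous, so the Python
# traceback points at the call that actually failed (training gets slower).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # keeps the sketch runnable on a machine without PyTorch
    torch = None


def checked_step(step_fn, *args, **kwargs):
    """Hypothetical helper: run one step, then force pending CUDA errors to surface."""
    out = step_fn(*args, **kwargs)
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()  # raises here if any queued kernel failed
    return out
```

If the traceback moves to a different line once launches are blocking, that line is a better place to look than the synchronize call itself.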
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @jerryWTMH! Did you find a solution to this error?
Hi, I also hit this issue at random when running experiments on GPUs that already have other processes running on them, but in my case it happens for this reason:
I think the problem comes from the fact that we are all using CUDA 10.x. I don't think the maintainers will come back to this issue because it is outdated, so we should probably update CUDA to a more recent version.
❔Question
I used DDP mode to train on my data, but every run failed before reaching 30 epochs with RuntimeError: CUDA error: unspecified launch failure. I also tried training on a single GPU and re-downloading the yolov5 repository, but the problem persists.
Here is the information about my training:
training data: 35000 images
validation data: 9100 images
Python: 3.8.3
torch: 1.7.1
CUDA: 10.2
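For anyone else reporting this, the version details above can be collected in one go. This is only a sketch: environment_report is a made-up name, and it assumes that torch.version.cuda (the CUDA version the PyTorch build was compiled against) is the number you want, which can differ from the toolkit or driver installed on the machine.

```python
import platform


def environment_report():
    """Collect the Python / torch / CUDA details worth pasting into a bug report."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch not installed in this environment
        return info
    info["torch"] = torch.__version__
    # CUDA version this torch build was compiled against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```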
Additional context
The whole content of the error: