RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR #2236

abuelgasimsaadeldin · 2021-02-17T07:59:43Z

❔Question

I receive the following runtime error when running detect.py locally with the pre-trained yolov5s.pt weights and a webcam as the source. This is the first time I have seen this issue; the same conda virtual environment worked without problems before. I recently installed CUDA 10.2 together with cuDNN and I'm not sure whether that is the cause. Any help would be much appreciated. Thank you.

Additional context

My environment:
Python- 3.8.0
torch- 1.7.0+cu101
torchvision- 0.8.1+cu101

You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 256, 30, 40], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(256, 255, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_HALF
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 00000174F151A320
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 1, 256, 30, 40,
    strideA = 307200, 1200, 40, 1,
output: TensorDescriptor 00000174F151B660
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 1, 255, 30, 40,
    strideA = 306000, 1200, 40, 1,
weight: FilterDescriptor 00000174F14EA8E0
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 255, 256, 1, 1,
Pointer addresses:
    input: 0000000923400000
    output: 000000091D513000
    weight: 000000091D5D3800

abuelgasimsaadeldin · 2021-02-17T10:26:17Z

UPDATE: I have looked into PR #1555 and tried changing pin_memory=True to pin_memory=False in utils/datasets.py, but the error persists.

I also tried commenting out the pin_memory=True line entirely. My webcam then turns on for a few seconds (no detections are made) and I get the following error message:

Traceback (most recent call last):
  File "detect.py", line 175, in <module>
    detect()
  File "detect.py", line 75, in detect
    pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
  File "C:\Users\chiaw\yolov5\utils\general.py", line 417, in non_max_suppression
    x = x[xc[xi]]  # confidence
RuntimeError: CUDA error: an illegal instruction was encountered

glenn-jocher · 2021-02-17T19:47:46Z

@abuelgasimsaadeldin I would try CPU inference, and if that works fine there is likely a CUDA environment issue:
python detect.py --device cpu
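If CPU inference works, one way to inspect the GPU side of the environment is to print the versions that PyTorch itself reports. The following is a minimal sketch and not part of YOLOv5; the function name `cuda_report` is hypothetical, and it relies only on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.backends.cudnn.version()`, `torch.cuda.get_device_name()`). It returns early if PyTorch cannot be imported.

```python
import importlib


def cuda_report():
    """Collect the CUDA-related versions that the installed PyTorch reports."""
    report = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__        # e.g. 1.7.0+cu101
    report["cuda_build"] = torch.version.cuda  # CUDA version of the build, None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["cudnn"] = torch.backends.cudnn.version()
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False, or `cuda_build` does not match what was intended at install time, that points toward an environment problem rather than a YOLOv5 bug.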

abuelgasimsaadeldin · 2021-02-18T06:32:56Z

Hi @glenn-jocher,

Thanks for the prompt reply. I tried running on CPU and had no issues. I'm wondering whether this is a bug or something wrong with my system (I have created several virtual environments with the exact same library versions and reproduced this exact same error).

I am also curious which CUDA version YOLOv5 supports, as I have not found it stated anywhere in the repo. Is it fine to use the CUDA 10.2 builds of PyTorch and torchvision, or should I downgrade to the CUDA 10.1 builds, or are both the same?

Thanks again @glenn-jocher!

glenn-jocher · 2021-02-18T08:05:04Z

@abuelgasimsaadeldin CUDA compatibility is a PyTorch matter, not a YOLOv5 matter. For exact install instructions for your specific CUDA version see https://pytorch.org/get-started/locally/, or use one of our verified environments below.
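The versions reported in this thread (torch 1.7.0+cu101, torchvision 0.8.1+cu101) carry a `+cu101` suffix, which marks wheels built for CUDA 10.1. As a rough illustration, the sketch below checks that two version strings share the same build tag. The helper names `cuda_tag` and `same_cuda_build` are hypothetical and the check is plain string handling, not a PyTorch API; versions without a suffix yield None, so two untagged versions also compare as equal.

```python
def cuda_tag(version):
    """Return the build tag of a version string, e.g. 'cu101' for '1.7.0+cu101'."""
    _, _, tag = version.partition("+")
    return tag or None  # None when the version has no '+...' suffix


def same_cuda_build(torch_version, torchvision_version):
    """True when both version strings carry the same build tag."""
    return cuda_tag(torch_version) == cuda_tag(torchvision_version)


print(cuda_tag("1.7.0+cu101"))                          # cu101
print(same_cuda_build("1.7.0+cu101", "0.8.1+cu101"))    # True
```

In a live environment the two arguments would typically come from `torch.__version__` and `torchvision.__version__`.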

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

github-actions · 2021-03-21T00:45:02Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

abuelgasimsaadeldin added the question Further information is requested label Feb 17, 2021

github-actions bot added the Stale Stale and schedule for closing soon label Mar 21, 2021

github-actions bot closed this as completed Mar 27, 2021
