RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127) #281

Closed
DLLXW opened this issue Jul 3, 2020 · 3 comments

Comments

@DLLXW

DLLXW commented Jul 3, 2020

Has anyone else run into this error?
`/home/admins/anaconda3/envs/yolov4/bin/python /home/admins/qyl/yolo/yolov5/train.py
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
{'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(adam=False, batch_size=32, bucket='', cache_images=False, cfg='models/yolov5s.yaml', data='data/trash.yaml', device='0', epochs=300, evolve=False, img_size=[416, 416], multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2070 SUPER', total_memory=7981MB)

Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/

             from  n    params  module                                  arguments
  0            -1  1      3520  models.common.Focus                     [3, 32, 3]
  1            -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2            -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]
  3            -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4            -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]
  5            -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6            -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]
  7            -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8            -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]
  9            -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]
 10            -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11            -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12       [-1, 6]  1         0  models.common.Concat                    [1]
 13            -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]
 14            -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15            -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16       [-1, 4]  1         0  models.common.Concat                    [1]
 17            -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]
 18            -1  1     18963  torch.nn.modules.conv.Conv2d            [128, 147, 1, 1]
 19            -2  1    147712  models.common.Conv                      [128, 128, 3, 2]
 20      [-1, 14]  1         0  models.common.Concat                    [1]
 21            -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]
 22            -1  1     37779  torch.nn.modules.conv.Conv2d            [256, 147, 1, 1]
 23            -2  1    590336  models.common.Conv                      [256, 256, 3, 2]
 24      [-1, 10]  1         0  models.common.Concat                    [1]
 25            -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]
 26            -1  1     75411  torch.nn.modules.conv.Conv2d            [512, 147, 1, 1]
 27  [-1, 22, 18]  1         0  models.yolo.Detect                      [44, [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]]
Model Summary: 191 layers, 7.37106e+06 parameters, 7.37106e+06 gradients

Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Caching labels /home/admins/qyl/yolo/yolov5/trashdata/labels/train.npy (13442 found, 0 missing, 0 empty, 0 duplicate, for 13442 images): 100%|██████████| 13442/13442 [00:00<00:00, 19863.33it/s]
Caching labels /home/admins/qyl/yolo/yolov5/trashdata/labels/val.npy (1494 found, 0 missing, 0 empty, 0 duplicate, for 1494 images): 100%|██████████| 1494/1494 [00:00<00:00, 20504.88it/s]

Analyzing anchors... Best Possible Recall (BPR) = 0.9995
Image sizes 416 train, 416 test
Using 8 dataloader workers
Starting training for 300 epochs...

 Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
 0/299    0.455G   0.08566    0.1208    0.1006    0.3071         4       416: 100%|██████████| 421/421 [02:18<00:00,  3.05it/s]
           Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   0%|          | 0/47 [00:01<?, ?it/s]

Traceback (most recent call last):
File "/home/admins/qyl/yolo/yolov5/train.py", line 394, in <module>
train(hyp)
File "/home/admins/qyl/yolo/yolov5/train.py", line 299, in train
dataloader=testloader)
File "/home/admins/qyl/yolo/yolov5/test.py", line 97, in test
output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, merge=merge)
File "/home/admins/qyl/yolo/yolov5/utils/utils.py", line 605, in non_max_suppression
i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)
File "/home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 33, in nms
return _C.nms(boxes, scores, iou_threshold)
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /tmp/pip-req-build-9d9zypi6/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f7399472e7d in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x8d1 (0x7f7361174ece in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x183 (0x7f7361138ed7 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #3: <unknown function> + 0x79cf5 (0x7f7361152cf5 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #4: <unknown function> + 0x765b0 (0x7f736114f5b0 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #5: <unknown function> + 0x70d1e (0x7f7361149d1e in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #6: <unknown function> + 0x70fc2 (0x7f7361149fc2 in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #7: <unknown function> + 0x5be4a (0x7f7361134e4a in /home/admins/anaconda3/envs/yolov4/lib/python3.7/site-packages/torchvision/_C.so)
frame #8: _PyMethodDef_RawFastCallKeywords + 0x264 (0x55e0fbbf6c94 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #9: _PyCFunction_FastCallKeywords + 0x21 (0x55e0fbbf6db1 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #10: _PyEval_EvalFrameDefault + 0x4dee (0x55e0fbc625be in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #11: _PyFunction_FastCallKeywords + 0xfb (0x55e0fbbf620b in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #12: _PyEval_EvalFrameDefault + 0x4a59 (0x55e0fbc62229 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #13: _PyEval_EvalCodeWithName + 0x2f9 (0x55e0fbba62b9 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #14: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x14ea (0x55e0fbc5ecba in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #16: _PyEval_EvalCodeWithName + 0xb40 (0x55e0fbba6b00 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #17: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x14ea (0x55e0fbc5ecba in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #19: _PyEval_EvalCodeWithName + 0xb40 (0x55e0fbba6b00 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #20: _PyFunction_FastCallKeywords + 0x387 (0x55e0fbbf6497 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x416 (0x55e0fbc5dbe6 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #22: _PyEval_EvalCodeWithName + 0x2f9 (0x55e0fbba62b9 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #23: PyEval_EvalCodeEx + 0x44 (0x55e0fbba71d4 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #24: PyEval_EvalCode + 0x1c (0x55e0fbba71fc in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #25: <unknown function> + 0x22bf44 (0x55e0fbcbcf44 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #26: PyRun_FileExFlags + 0xa1 (0x55e0fbcc72b1 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #27: PyRun_SimpleFileExFlags + 0x1c3 (0x55e0fbcc74a3 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #28: <unknown function> + 0x2375d5 (0x55e0fbcc85d5 in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #29: _Py_UnixMain + 0x3c (0x55e0fbcc86fc in /home/admins/anaconda3/envs/yolov4/bin/python)
frame #30: __libc_start_main + 0xf0 (0x7f73c9529830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #31: <unknown function> + 0x1dc3c0 (0x55e0fbc6d3c0 in /home/admins/anaconda3/envs/yolov4/bin/python)

Process finished with exit code 1
`
PyTorch 1.3.1
torchvision 0.4.2
CUDA 10.0
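
A note on what the message means: "no kernel image is available for execution on the device" is the CUDA runtime reporting that the loaded binary contains no code it can run on this GPU's architecture. The RTX 2070 SUPER in the log is compute capability 7.5. The sketch below is not taken from YOLOv5 or torchvision; `has_kernel_image` and the example architecture lists are hypothetical, and it only illustrates the general CUDA compatibility rule (machine code needs the same major version and an equal or lower minor version, PTX can be compiled at load time for an equal or newer architecture).

```python
import re


def has_kernel_image(capability, arch_tags):
    """Return True if a binary built for arch_tags can run on the device.

    capability: (major, minor) tuple, e.g. (7, 5).
    arch_tags:  tags such as "sm_61" (machine code) or "compute_61" (PTX).
    """
    major, minor = capability
    for tag in arch_tags:
        match = re.fullmatch(r"(sm|compute)_(\d{2,})", tag)
        if match is None:
            continue  # skip tags this sketch does not understand
        kind, digits = match.groups()
        tag_major, tag_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm":
            # Machine code runs on the same major version, equal or higher minor.
            if tag_major == major and tag_minor <= minor:
                return True
        elif (tag_major, tag_minor) <= (major, minor):
            # PTX can be JIT-compiled for an equal or newer architecture.
            return True
    return False


RTX_2070_SUPER = (7, 5)  # compute capability of the GPU in the log above

# Hypothetical build without any 7.x target: nothing can run on this GPU.
print(has_kernel_image(RTX_2070_SUPER, ["sm_35", "sm_50", "sm_60", "sm_61"]))  # False
# Hypothetical build that includes a 7.x target.
print(has_kernel_image(RTX_2070_SUPER, ["sm_60", "sm_70", "sm_75"]))           # True
```

On a real machine, `torch.cuda.get_device_capability(0)` returns the capability tuple, and recent PyTorch releases expose `torch.cuda.get_arch_list()` for the architectures PyTorch itself was built with. That list does not describe torchvision's `_C.so`, which is where the failing `nms_cuda` kernel lives, so treat it as a hint only.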

@github-actions
Contributor

github-actions bot commented Jul 3, 2020

Hello @DLLXW, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@DLLXW
Author

DLLXW commented Jul 3, 2020

Thanks! I have solved it. It seems PyTorch 1.3 doesn't work; after I changed it to 1.4, it works well.
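
A guard like the following could catch this kind of mismatch before training starts. It is only a sketch: `version_tuple`, `meets_minimum`, and `MIN_TORCH` are hypothetical names, and the `"1.4"` threshold is taken from the fix reported in this thread, not from the repository's requirements.

```python
import re

MIN_TORCH = "1.4"  # assumption: the version that worked in this thread


def version_tuple(version):
    """Parse the leading numeric part of a version string.

    "1.3.1" -> (1, 3, 1); suffixes such as "+cu100" are ignored.
    """
    numbers = re.match(r"\d+(?:\.\d+)*", version)
    if numbers is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in numbers.group().split("."))


def meets_minimum(installed, minimum):
    """Return True if installed >= minimum, comparing numerically per component."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.4" and "1.4.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("1.3.1", MIN_TORCH))  # False: the version that failed here
print(meets_minimum("1.4.0", MIN_TORCH))  # True: the version that worked
```

In a real script the first argument would be `torch.__version__`. Comparing tuples of integers rather than raw strings matters because, as strings, `"1.10.0"` sorts before `"1.4"`.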

@glenn-jocher
Member

@DLLXW the requirements are listed in the README; I suggest following them.
