Apollo 5.0 cuda error #11141

Closed
zhouyapengzi opened this issue Apr 27, 2020 · 3 comments
Labels: Module: Perception (indicates perception related issues)

@zhouyapengzi

Describe the bug
CUDA error when running Apollo 5.0.

To Reproduce
Steps to reproduce the behavior:

  1. In Docker, build Apollo.
  2. In bazel-bin, run:
    ./modules/perception/camera/tools/offline/offline_obstacle_pipeline

Output:

rt_net.cc:31] [] cudnnConvolutionLayer.cpp (254) - Cuda Error in execute: 7
rt_net.cc:31] [] cudnnConvolutionLayer.cpp (254) - Cuda Error in execute: 7
Segmentation fault (core dumped)

gdb debug info:

I0427 11:18:14.827083  5457 rt_net.cc:31] [] cudnnConvolutionLayer.cpp (254) - Cuda Error in execute: 7
I0427 11:18:14.834583  5457 rt_net.cc:31] [] cudnnConvolutionLayer.cpp (254) - Cuda Error in execute: 7

Program received signal SIGSEGV, Segmentation fault.
Python Exception <class 'IndexError'> list index out of range: 
0x00007fffc069ec8a in apollo::perception::inference::RTNet::Init (this=0x3f40bd0, shapes=std::map with 1 elements) at modules/perception/inference/tensorrt/rt_net.cc:724
724 modules/perception/inference/tensorrt/rt_net.cc: No such file or directory.

Screenshots
image

Desktop (please complete the following information):

  • OS: Ubuntu 18.04 LTS
  • Version: Apollo 5.0

Additional context
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   30C    P8     8W / 125W |    745MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     Off  | 00000000:04:00.0 Off |                  N/A |
| 30%   36C    P8    13W / 125W |     11MiB /  7959MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2342      C   mainboard                                    734MiB |
+-----------------------------------------------------------------------------+

$nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
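
One thing the two outputs above suggest checking: the driver reports CUDA 10.2, while `nvcc` inside the container reports release 8.0. The Quadro RTX 4000 is a Turing GPU (compute capability 7.5), and Turing support first appeared in the CUDA 10.0 toolkit, so a toolkit this old may not be able to target the card. Whether this mismatch is the actual cause of the error is not confirmed in this thread. A minimal Python sketch of the comparison follows; the function names and the `MIN_TOOLKIT_FOR_TURING` constant are illustrative and not part of Apollo or any NVIDIA tool.

```python
import re

# Assumption: Turing GPUs (compute capability 7.5, e.g. Quadro RTX 4000)
# are first supported by the CUDA 10.0 toolkit.
MIN_TOOLKIT_FOR_TURING = (10, 0)


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_driver_cuda_version(smi_output):
    """Return (major, minor) from the nvidia-smi header line, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_too_old(nvcc_output, minimum=MIN_TOOLKIT_FOR_TURING):
    """True if the toolkit release in `nvcc_output` is older than `minimum`."""
    release = parse_nvcc_release(nvcc_output)
    if release is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return release < minimum


# Strings copied from the outputs in this report.
NVCC_OUTPUT = "Cuda compilation tools, release 8.0, V8.0.61"
SMI_HEADER = "| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |"

if __name__ == "__main__":
    print("toolkit release:", parse_nvcc_release(NVCC_OUTPUT))
    print("driver supports up to:", parse_driver_cuda_version(SMI_HEADER))
    print("toolkit too old for Turing:", toolkit_too_old(NVCC_OUTPUT))
```

In practice the two strings would come from running `nvcc --version` and `nvidia-smi` inside the same container that runs the perception binary, since the host and the container can ship different toolkits.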

@muleisheng added the "Module: Perception" label on Jun 18, 2020
@wangzhensuo

I hit the same error while running "gdb --args mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception.dag".
However, when I set up the Apollo environment on another PC, it works fine.
I don't know why either.

Error info:
Program received signal SIGSEGV, Segmentation fault.
Python Exception <class 'IndexError'> list index out of range:
0x00007fff1c0e5280 in apollo::perception::inference::RTNet::Init (this=0x89cac10,
shapes=std::map with 1 elements) at modules/perception/inference/tensorrt/rt_net.cc:634
634 context_ = engine->createExecutionContext();
(gdb)
(gdb)
(gdb) bt
Python Exception <class 'IndexError'> list index out of range:
#0 0x00007fff1c0e5280 in apollo::perception::inference::RTNet::Init (this=0x89cac10,
shapes=std::map with 1 elements) at modules/perception/inference/tensorrt/rt_net.cc:634
#1 0x00007fff2eb149ac in apollo::perception::lidar::CNNSegmentation::Init (this=0x8a26bd0,
options=...) at modules/perception/lidar/lib/segmentation/cnnseg/cnn_segmentation.cc:96
#2 0x00007fff33258a33 in apollo::perception::lidar::LidarObstacleSegmentation::Init (this=0x634f180,
options=...) at modules/perception/lidar/app/lidar_obstacle_segmentation.cc:73
#3 0x00007fff3f4368de in apollo::perception::onboard::SegmentationComponent::InitAlgorithmPlugin (
this=0x5cb2540) at modules/perception/onboard/component/segmentation_component.cc:86
#4 0x00007fff3f4362e7 in apollo::perception::onboard::SegmentationComponent::Init (this=0x5cb2540)
at modules/perception/onboard/component/segmentation_component.cc:48
#5 0x00007fff3f44c2f5 in apollo::cyber::Component<apollo::drivers::PointCloud, apollo::cyber::NullType, apollo::cyber::NullType, apollo::cyber::NullType>::Initialize (this=0x5cb2540, config=...)
at ./cyber/component/component.h:172
#6 0x0000000000413ad1 in apollo::cyber::mainboard::ModuleController::LoadModule (this=0x7fffffffdb00,
dag_config=...) at cyber/mainboard/module_controller.cc:89
#7 0x0000000000413f12 in apollo::cyber::mainboard::ModuleController::LoadModule (this=0x7fffffffdb00,
path="/apollo/modules/perception/production/dag/dag_streaming_perception.dag")
at cyber/mainboard/module_controller.cc:114
#8 0x000000000041355c in apollo::cyber::mainboard::ModuleController::LoadAll (this=0x7fffffffdb00)
at cyber/mainboard/module_controller.cc:58
#9 0x00000000004106be in apollo::cyber::mainboard::ModuleController::Init (this=0x7fffffffdb00)
at ./cyber/mainboard/module_controller.h:55
#10 0x000000000040f2d2 in main (argc=3, argv=0x7fffffffdcb8) at cyber/mainboard/mainboard.cc:41
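
To make a trace like the one above easier to scan, each frame can be reduced to its number, function, file, and line. This is a rough Python sketch; `parse_backtrace` and its regular expression are illustrative and only handle frames that end in an `at file:line` suffix, as the frames in this output do.

```python
import re

# One gdb frame: "#N [0xADDR in ]function (args...) at file:line".
# Arguments may wrap onto the next line, hence DOTALL.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(.*?\)\s+at (\S+):(\d+)",
    re.MULTILINE | re.DOTALL,
)


def parse_backtrace(text):
    """Return a list of (frame_no, function, file, line) tuples."""
    return [
        (int(no), func, path, int(line))
        for no, func, path, line in FRAME_RE.findall(text)
    ]


# Three frames copied from the backtrace in this comment.
SAMPLE = """\
#0 0x00007fff1c0e5280 in apollo::perception::inference::RTNet::Init (this=0x89cac10,
shapes=std::map with 1 elements) at modules/perception/inference/tensorrt/rt_net.cc:634
#1 0x00007fff2eb149ac in apollo::perception::lidar::CNNSegmentation::Init (this=0x8a26bd0,
options=...) at modules/perception/lidar/lib/segmentation/cnnseg/cnn_segmentation.cc:96
#10 0x000000000040f2d2 in main (argc=3, argv=0x7fffffffdcb8) at cyber/mainboard/mainboard.cc:41
"""

if __name__ == "__main__":
    for no, func, path, line in parse_backtrace(SAMPLE):
        print(f"#{no:<3} {func}  ({path}:{line})")
```

Read this way, the crash sits in `RTNet::Init` at the `createExecutionContext()` call, reached from the lidar `CNNSegmentation::Init` path rather than from the camera pipeline in the original report.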

@wangzhensuo

Have you solved this problem?

@daohu527
Contributor

We recommend migrating to the new version of Apollo.
