rclpy segmentfault when Ctrl-C #215

Closed
reed-lau opened this issue Aug 8, 2018 · 6 comments · Fixed by #288
Labels
bug Something isn't working

Comments

@reed-lau
Contributor

reed-lau commented Aug 8, 2018

Bug report

Required Info:

  • Operating System:
    Ubuntu 16.04
  • Installation type:
    from source
  • Version or commit hash:
    0.5.3 eb97379
  • DDS implementation:
    opensplice
  • Client library (if applicable):
    rclpy

Steps to reproduce issue

Change the demo_node_py talker's publish frequency to 20 Hz and start the talker with the command 'python3 talker.py'.
Then press Ctrl-C to interrupt it. In some cases this causes a segmentation fault.

The gdb backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 talker.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb0ffd75049 in rmw_trigger_guard_condition (guard_condition_handle=0x7c001e7200096b00)
    at /home/tusimple/ros2_zoro/src/hpc-zoro/rmw_zoro_cpp/src/rmw_trigger_guard_condition.cpp:30
30	    if (guard_condition_handle->implementation_identifier != tusimple_zoro_identifier) {
[Current thread is 1 (Thread 0x7fb0fcc1d700 (LWP 28506))]
(gdb) bt
#0  0x00007fb0ffd75049 in rmw_trigger_guard_condition (guard_condition_handle=0x7c001e7200096b00)
    at /home/tusimple/ros2_zoro/src/hpc-zoro/rmw_zoro_cpp/src/rmw_trigger_guard_condition.cpp:30
#1  0x00007fb1011bec1a in rmw_trigger_guard_condition ()
   from /home/tusimple/ros2_bb/install/rmw_implementation/lib/librmw_implementation.so
#2  0x00007fb1013d0010 in rcl_trigger_guard_condition (guard_condition=0x29f9310)
    at /home/tusimple/ros2_bb/src/ros2/rcl/rcl/src/rcl/guard_condition.c:152
#3  0x00007fb101ca5ac6 in catch_function (signo=2) at /home/tusimple/ros2_bb/src/ros2/rclpy/rclpy/src/rclpy/_rclpy.c:47
#4  0x00007fb0fdbf6c7b in ?? () from /usr/lib/libddskernel.so
#5  0x00007fb0fdc34542 in ?? () from /usr/lib/libddskernel.so
#6  0x00007fb1031ee6ba in start_thread (arg=0x7fb0fcc1d700) at pthread_create.c:333
#7  0x00007fb102f2441d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
@dirk-thomas
Member

In the issue template you mention OpenSplice, but the stack trace indicates the problem happens in the package rmw_zoro_cpp. Can you please clarify?

@dirk-thomas dirk-thomas added the more-information-needed Further information is required label Aug 8, 2018
@reed-lau
Contributor Author

reed-lau commented Aug 8, 2018

Sorry, rmw_zoro_cpp is our own extension.
I tested with fastrtps and the same segmentation fault occurs.
The relevant backtrace follows:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 talker.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fa9496f96e7 in rmw_trigger_guard_condition () from /home/tusimple/ros2_bb/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
[Current thread is 1 (Thread 0x7fa94d478700 (LWP 115401))]
(gdb) bt
#0 0x00007fa9496f96e7 in rmw_trigger_guard_condition () from /home/tusimple/ros2_bb/install/rmw_fastrtps_cpp/lib/librmw_fastrtps_cpp.so
#1 0x00007fa94b043c1a in rmw_trigger_guard_condition () from /home/tusimple/ros2_bb/install/rmw_implementation/lib/librmw_implementation.so
#2 0x00007fa94b255010 in rcl_trigger_guard_condition (guard_condition=0x2e36730) at /home/tusimple/ros2_bb/src/ros2/rcl/rcl/src/rcl/guard_condition.c:152
#3 0x00007fa94bb2aac6 in catch_function (signo=2) at /home/tusimple/ros2_bb/src/ros2/rclpy/rclpy/src/rclpy/_rclpy.c:47
#4 <signal handler called>
#5 0x00000000005228f5 in PyErr_Restore () at ../Python/errors.c:51
#6 0x000000000054a33c in builtin_hasattr_impl.isra.8 (name=<optimized out>, obj=<optimized out>) at ../Python/bltinmodule.c:1064
#7 builtin_hasattr.lto_priv () at ../Python/clinic/bltinmodule.c.h:327
#8 0x00000000004e9b7f in PyCFunction_Call () at ../Objects/methodobject.c:109
#9 0x00000000005372f4 in call_function (oparg=<optimized out>, pp_stack=0x7ffce7695970) at ../Python/ceval.c:4705
#10 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#11 0x0000000000540199 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#12 0x000000000053bd92 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffce7695b80, func=<optimized out>) at ../Python/ceval.c:4813
#13 call_function (oparg=<optimized out>, pp_stack=0x7ffce7695b80) at ../Python/ceval.c:4730
#14 PyEval_EvalFrameEx () at ../Python/ceval.c:3236

@reed-lau
Contributor Author

reed-lau commented Aug 8, 2018

With the OpenSplice version, a runtime error may occur instead:

[INFO] [talker]: Publishing: "Hello World: 0"
[INFO] [talker]: Publishing: "Hello World: 1"
[INFO] [talker]: Publishing: "Hello World: 2"
[INFO] [talker]: Publishing: "Hello World: 3"
[INFO] [talker]: Publishing: "Hello World: 4"
[INFO] [talker]: Publishing: "Hello World: 5"
[INFO] [talker]: Publishing: "Hello World: 6"
^C[rcutils|error_handling.c:89] provided allocator is invalid, error state not updated
RuntimeError: Failed to trigger guard_condition: guard condition handle is null, at /home/tusimple/ros2_bb/src/ros2/rmw_opensplice/rmw_opensplice_cpp/src/rmw_trigger_guard_condition.cpp:33

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/task.py", line 206, in __call__
    self._handler.send(None)
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/executors.py", line 297, in handler
    with work_tracker:
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/executors.py", line 57, in __enter__
    self._num_work_executing += 1
  File "/usr/lib/python3.5/threading.py", line 241, in __exit__
    return self._lock.__exit__(*args)
SystemError: <built-in method __exit__ of _thread.RLock object at 0x7f6cd22865a0> returned a result with an error set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "talker.py", line 55, in <module>
    main()
  File "talker.py", line 48, in main
    rclpy.spin(node)
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/__init__.py", line 108, in spin
    executor.spin_once()
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/executors.py", line 523, in spin_once
    handler()
  File "/home/tusimple/ros2_bb/install/rclpy/lib/python3.5/site-packages/rclpy/task.py", line 206, in __call__
    self._handler.send(None)
KeyboardInterrupt

@reed-lau
Contributor Author

reed-lau commented Aug 8, 2018

With the OpenSplice version, this backtrace sometimes appears:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 talker.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000fb7dc0 in ?? ()
[Current thread is 1 (Thread 0x7f419d903700 (LWP 125294))]
(gdb) bt
#0 0x0000000000fb7dc0 in ?? ()
#1 0x00007f419fbe10dc in rcutils_set_error_state () from /home/tusimple/ros2_bb/install/rcutils/lib/librcutils.so
#2 0x00007f41a0209046 in rcl_trigger_guard_condition (guard_condition=0xfb7dd0) at /home/tusimple/ros2_bb/src/ros2/rcl/rcl/src/rcl/guard_condition.c:153
#3 0x00007f41a0adeac6 in catch_function (signo=2) at /home/tusimple/ros2_bb/src/ros2/rclpy/rclpy/src/rclpy/_rclpy.c:47
#4 0x00007f419e2d2c7b in ?? () from /usr/lib/libddskernel.so
#5 0x00007f419e310542 in ?? () from /usr/lib/libddskernel.so
#6 0x00007f41a20276ba in start_thread (arg=0x7f419d903700) at pthread_create.c:333
#7 0x00007f41a1d5d41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@dirk-thomas dirk-thomas added bug Something isn't working and removed more-information-needed Further information is required labels Aug 8, 2018
@dirk-thomas
Member

I can reproduce the problem on the default platform Bionic with FastRTPS.

@joncppl

joncppl commented Oct 2, 2018

(Built from source, Bouncy Bolson release (rclpy==0.5.4), on Arch Linux with fastrtps)

I am experiencing the same behaviour with my rclpy nodes. They segfault when I ctrl-c them. My stacktrace looks a bit different than yours though.

This also sometimes happens even to ros2 topic echo ...

           PID: 3634 (ros2)
           UID: 1000 (jonathan)
           GID: 1000 (jonathan)
        Signal: 11 (SEGV)
     Timestamp: Tue 2018-10-02 01:31:54 PDT (11s ago)
  Command Line: /usr/bin/python /opt/ros/ws/install/bin/ros2 topic echo /camera
    Executable: /usr/bin/python3.7
 Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
          Unit: user@1000.service
     User Unit: gnome-terminal-server.service
         Slice: user-1000.slice
     Owner UID: 1000 (jonathan)
       Boot ID: 6104f9e80f1d4aa4afd2f605783659eb
    Machine ID: 7b0b77548ea94cefaad990e1d7fe7e1e
      Hostname: MakiseLinux
       Storage: /var/lib/systemd/coredump/core.ros2.1000.6104f9e80f1d4aa4afd2f605783659eb.3634.1538469114000000.lz4
       Message: Process 3634 (ros2) of user 1000 dumped core.
                
                Stack trace of thread 3634:
                #0  0x00007fac3d2b21a0 n/a (_yaml.cpython-37m-x86_64-linux-gnu.so)
                #1  0x00007fac3e36c5ec rcl_trigger_guard_condition (librcl.so)
                #2  0x00007fac3e9289af catch_function (_rclpy.cpython-37m-x86_64-linux-gnu.so)
                #3  0x00007fac40159e00 __restore_rt (libc.so.6)
                #4  0x00007fac3ff717d0 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #5  0x00007fac3ff2ac93 n/a (libpython3.7m.so.1.0)
                #6  0x00007fac3ff2b150 n/a (libpython3.7m.so.1.0)
                #7  0x00007fac3fefe9d8 _PyMethodDef_RawFastCallKeywords (libpython3.7m.so.1.0)
                #8  0x00007fac3fefed21 _PyCFunction_FastCallKeywords (libpython3.7m.so.1.0)
                #9  0x00007fac3ff76112 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #10 0x00007fac3feb7f3b _PyFunction_FastCallDict (libpython3.7m.so.1.0)
                #11 0x00007fac3fec7108 n/a (libpython3.7m.so.1.0)
                #12 0x00007fac3fec72e9 PyObject_CallFunctionObjArgs (libpython3.7m.so.1.0)
                #13 0x00007fac3fec7330 n/a (libpython3.7m.so.1.0)
                #14 0x00007fac3ff6a94b _PyObject_GenericSetAttrWithDict (libpython3.7m.so.1.0)
                #15 0x00007fac3ff6acd6 PyObject_SetAttr (libpython3.7m.so.1.0)
                #16 0x00007fac3ff6f7f3 PyObject_SetAttrString (libpython3.7m.so.1.0)
                #17 0x00007fac3ce9a613 n/a (/home/jonathan/code/ros2ws/install/lib/python3.7/site-packages/image_msg/image_msg_s__rosidl_typesupport_c.cpython-37m-x86_64-linux-gnu.so)
                #18 0x00007fac3e92ef15 rclpy_take (_rclpy.cpython-37m-x86_64-linux-gnu.so)
                #19 0x00007fac3fefea88 _PyMethodDef_RawFastCallKeywords (libpython3.7m.so.1.0)
                #20 0x00007fac3fefed21 _PyCFunction_FastCallKeywords (libpython3.7m.so.1.0)
                #21 0x00007fac3ff7689c _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #22 0x00007fac3fefe2fb _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
                #23 0x00007fac3ff71d3d _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #24 0x00007fac3ff2b85e _PyGen_Send (libpython3.7m.so.1.0)
                #25 0x00007fac3fefe9d8 _PyMethodDef_RawFastCallKeywords (libpython3.7m.so.1.0)
                #26 0x00007fac3ff2a31f _PyMethodDescr_FastCallKeywords (libpython3.7m.so.1.0)
                #27 0x00007fac3ff7644d _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #28 0x00007fac3feb7f3b _PyFunction_FastCallDict (libpython3.7m.so.1.0)
                #29 0x00007fac3fec6dd8 _PyObject_Call_Prepend (libpython3.7m.so.1.0)
                #30 0x00007fac3ff1730f n/a (libpython3.7m.so.1.0)
                #31 0x00007fac3ff2a85c _PyObject_FastCallKeywords (libpython3.7m.so.1.0)
                #32 0x00007fac3ff765ba _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #33 0x00007fac3feb6ee9 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
                #34 0x00007fac3fefe4a2 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
                #35 0x00007fac3ff72abc _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)

and

           PID: 23295 (python)
           UID: 1000 (jonathan)
           GID: 1000 (jonathan)
        Signal: 11 (SEGV)
     Timestamp: Tue 2018-10-02 01:23:06 PDT (14min ago)
  Command Line: python ./image_sender.py
    Executable: /usr/bin/python3.7
 Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
          Unit: user@1000.service
     User Unit: gnome-terminal-server.service
         Slice: user-1000.slice
     Owner UID: 1000 (jonathan)
       Boot ID: 6104f9e80f1d4aa4afd2f605783659eb
    Machine ID: 7b0b77548ea94cefaad990e1d7fe7e1e
      Hostname: MakiseLinux
       Storage: /var/lib/systemd/coredump/core.python.1000.6104f9e80f1d4aa4afd2f605783659eb.23295.1538468586000000.lz4
       Message: Process 23295 (python) of user 1000 dumped core.
                
                Stack trace of thread 23295:
                #0  0x00007f5901f3325f raise (libpthread.so.0)
                #1  0x00007f5901f333c0 __restore_rt (libpthread.so.0)
                #2  0x00007f59012d6ab0 n/a (n/a)
                #3  0x00007f590107c5ec rcl_trigger_guard_condition (librcl.so)
                #4  0x00007f590109f9af catch_function (_rclpy.cpython-37m-x86_64-linux-gnu.so)
                #5  0x00007f5901d94e00 __restore_rt (libc.so.6)
                #6  0x00007f5901e5340b syscall (libc.so.6)
                #7  0x00007f58f7ee2461 g_cond_wait (libglib-2.0.so.0)
                #8  0x00007f58f7d7ea65 gst_app_sink_try_pull_sample (libgstapp-1.0.so.0)
                #9  0x00007f58fc7f56c3 n/a (libopencv_videoio.so.3.4)
                #10 0x00007f58fc7e499d _ZN2cv12VideoCapture4grabEv (libopencv_videoio.so.3.4)
                #11 0x00007f58fc7e4a3c _ZN2cv12VideoCapture4readERKNS_12_OutputArrayE (libopencv_videoio.so.3.4)
                #12 0x00007f58ffe13e9e n/a (cv2.cpython-37m-x86_64-linux-gnu.so)
                #13 0x00007f5901b39bc4 _PyMethodDef_RawFastCallKeywords (libpython3.7m.so.1.0)
                #14 0x00007f5901b6531f _PyMethodDescr_FastCallKeywords (libpython3.7m.so.1.0)
                #15 0x00007f5901bb144d _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #16 0x00007f5901af2f3b _PyFunction_FastCallDict (libpython3.7m.so.1.0)
                #17 0x00007f5901b01dd8 _PyObject_Call_Prepend (libpython3.7m.so.1.0)
                #18 0x00007f5901af366b PyObject_Call (libpython3.7m.so.1.0)
                #19 0x00007f5901bae3f1 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #20 0x00007f5901b6685e _PyGen_Send (libpython3.7m.so.1.0)
                #21 0x00007f5901badf77 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #22 0x00007f5901b6685e _PyGen_Send (libpython3.7m.so.1.0)
                #23 0x00007f5901badf77 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #24 0x00007f5901b6685e _PyGen_Send (libpython3.7m.so.1.0)
                #25 0x00007f5901b399d8 _PyMethodDef_RawFastCallKeywords (libpython3.7m.so.1.0)
                #26 0x00007f5901b6531f _PyMethodDescr_FastCallKeywords (libpython3.7m.so.1.0)
                #27 0x00007f5901bb144d _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #28 0x00007f5901af2f3b _PyFunction_FastCallDict (libpython3.7m.so.1.0)
                #29 0x00007f5901b01dd8 _PyObject_Call_Prepend (libpython3.7m.so.1.0)
                #30 0x00007f5901b5230f n/a (libpython3.7m.so.1.0)
                #31 0x00007f5901b6585c _PyObject_FastCallKeywords (libpython3.7m.so.1.0)
                #32 0x00007f5901bb15ba _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)
                #33 0x00007f5901af1ee9 _PyEval_EvalCodeWithName (libpython3.7m.so.1.0)
                #34 0x00007f5901b394a2 _PyFunction_FastCallKeywords (libpython3.7m.so.1.0)
                #35 0x00007f5901bacb92 _PyEval_EvalFrameDefault (libpython3.7m.so.1.0)

A quick look at where it died...

#4  <signal handler called>
(gdb) down 1
#3  0x00007fac3e9289af in catch_function (signo=2) at /opt/ros/ws/src/ros2/rclpy/rclpy/src/rclpy/_rclpy.c:47
47	    rcl_ret_t ret = rcl_trigger_guard_condition(g_sigint_gc_handle);
(gdb) list 47
42	/// Catch signals
43	static void catch_function(int signo)
44	{
45	  (void) signo;
46	  if (NULL != g_sigint_gc_handle) {
47	    rcl_ret_t ret = rcl_trigger_guard_condition(g_sigint_gc_handle);
48	    if (ret != RCL_RET_OK) {
49	      PyErr_Format(PyExc_RuntimeError,
50	        "Failed to trigger guard_condition: %s", rcl_get_error_string_safe());
51	      rcl_reset_error();

reed-lau added a commit to reed-lau/rclpy that referenced this issue Mar 18, 2019
@tfoote tfoote added the in review Waiting for review (Kanban column) label Mar 18, 2019
reed-lau added a commit to reed-lau/rclpy that referenced this issue Mar 19, 2019
sloretz pushed a commit that referenced this issue Mar 20, 2019
@sloretz sloretz removed the in review Waiting for review (Kanban column) label Mar 20, 2019
sloretz pushed a commit that referenced this issue Mar 29, 2019
sloretz pushed a commit that referenced this issue Mar 29, 2019
* fix #215

Signed-off-by: reed-lau <[email protected]>
Signed-off-by: Shane Loretz <[email protected]>
sloretz added a commit that referenced this issue Mar 29, 2019
* fix #215

Signed-off-by: reed-lau <[email protected]>
Signed-off-by: Shane Loretz <[email protected]>
5 participants