Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ready reentrant Waitable objects can attempt to execute multiple times #1212

Closed
audrow opened this issue Jul 1, 2020 · 0 comments · Fixed by #1241 or ros2/system_tests#444
Closed
Assignees
Labels
bug Something isn't working

Comments

@audrow
Copy link
Member

audrow commented Jul 1, 2020

In the MultiThreadedExecutor, the run function is bound to each running thread; the run function builds and checks the wait set for ready executables and seeks to run them. The Executor::wait_for_work() function builds the wait set and then removes all items that do not return true for is_ready(), which for the SubscriptionIntraProcess class looks to see if there is a positive number of entries in the ring buffer by calling has_data().

The first executable in the wait set is then checked to see if it should be excluded based on its group. At this stage, an executable that belongs to the Reentrant callback group type is always returned to be executed, where as executables in the MutuallyExclusive callback group may not be, as only one callback from the group can be executed at once. From there, the executable is popped from the buffer and executed.

To execute the callback for the SubscriptionIntraProcess, the ring buffer dequeues an element. To dequeue, the ring buffer first checks that it contains a positive number of elements with has_data(); if there are zero elements, an error is thrown.

It is possible for the same ring buffer entry to be 'ready' in the Executor::wait_for_work() and no be not ready in Executor::execute_any_executable(), thus yielding the following error:

[ERROR] [1593631031.302392471] [rclcpp]: Calling dequeue on empty intra-process buffer
terminate called after throwing an instance of 'std::runtime_error'
  what():  Calling dequeue on empty intra-process buffer
Aborted (core dumped)
Backtrace
#0  rclcpp::experimental::buffers::RingBufferImplementation<std::unique_ptr<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > > > >::dequeue (this=0x5555558d0790)
    at /root/ros2_foxy/install/rclcpp/include/rclcpp/experimental/buffers/ring_buffer_implementation.hpp:78
#1  0x000055555562d453 in rclcpp::experimental::buffers::TypedIntraProcessBuffer<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::allocator<void>, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > >, std::unique_ptr<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > > > >::consume_unique_impl<std::unique_ptr<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > > > > (this=0x5555558d0950)
    at /root/ros2_foxy/install/rclcpp/include/rclcpp/experimental/buffers/intra_process_buffer.hpp:232
#2  0x0000555555627958 in rclcpp::experimental::buffers::TypedIntraProcessBuffer<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::allocator<void>, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > >, std::unique_ptr<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > > > >::consume_unique (this=0x5555558d0950)
    at /root/ros2_foxy/install/rclcpp/include/rclcpp/experimental/buffers/intra_process_buffer.hpp:122
#3  0x000055555562e4f1 in rclcpp::experimental::SubscriptionIntraProcess<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::allocator<void>, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > >, test_rclcpp::msg::UInt32_<std::allocator<void> > >::execute_impl<test_rclcpp::msg::UInt32_<std::allocator<void> > > (this=0x555555893690)
    at /root/ros2_foxy/install/rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp:164
#4  0x000055555562c340 in rclcpp::experimental::SubscriptionIntraProcess<test_rclcpp::msg::UInt32_<std::allocator<void> >, std::allocator<void>, std::default_delete<test_rclcpp::msg::UInt32_<std::allocator<void> > >, test_rclcpp::msg::UInt32_<std::allocator<void> > >::execute (this=0x555555893690)
    at /root/ros2_foxy/install/rclcpp/include/rclcpp/experimental/subscription_intra_process.hpp:114
#5  0x00007ffff7d39d2f in rclcpp::Executor::execute_any_executable (this=0x7fffffff43c0, 
    any_exec=...) at /root/ros2_foxy/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:324
#6  0x00007ffff7d436f7 in rclcpp::executors::MultiThreadedExecutor::run (this=0x7fffffff43c0)
    at /root/ros2_foxy/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp:103
#7  0x00007ffff7d4746a in std::__invoke_impl<void, void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&> (__f=
    @0x5555558d3378: (void (rclcpp::executors::MultiThreadedExecutor::*)(rclcpp::executors::MultiThreadedExecutor * const, unsigned long)) 0x7ffff7d43440 <rclcpp::executors::MultiThreadedExecutor::run(unsigned long)>, __t=@0x5555558d3390: 0x7fffffff43c0) at /usr/include/c++/9/bits/invoke.h:73
#8  0x00007ffff7d47368 in std::__invoke<void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&> (__fn=
    @0x5555558d3378: (void (rclcpp::executors::MultiThreadedExecutor::*)(rclcpp::executors::MultiThreadedExecutor * const, unsigned long)) 0x7ffff7d43440 <rclcpp::executors::MultiThreadedExecutor::run(unsigned long)>) at /usr/include/c++/9/bits/invoke.h:95
#9  0x00007ffff7d47262 in std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x5555558d3378, __args=...)
    at /usr/include/c++/9/functional:400
#10 0x00007ffff7d471b9 in std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>::operator()<, void>() (
    this=0x5555558d3378) at /usr/include/c++/9/functional:484
#11 0x00007ffff7d47161 in std::__invoke_impl<void, std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>>(std::__invoke_other, std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedE--Type <RET> for more, q to quit, c to continue without paging--c
xecutor*, unsigned long))(unsigned long)>&&) (__f=...) at /usr/include/c++/9/bits/invoke.h:60
#12 0x00007ffff7d470f6 in std::__invoke<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>>(std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>&&) (__fn=...) at /usr/include/c++/9/bits/invoke.h:95
#13 0x00007ffff7d47088 in std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x5555558d3378) at /usr/include/c++/9/thread:244
#14 0x00007ffff7d47045 in std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > >::operator()() (this=0x5555558d3378) at /usr/include/c++/9/thread:251
#15 0x00007ffff7d47016 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > > >::_M_run() (this=0x5555558d3370) at /usr/include/c++/9/thread:195
#16 0x00007ffff763ccb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff7300609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#18 0x00007ffff7479103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This error is likely caused by a logical race condition where a MultiThreadedExecutor can call is_ready() twice before execute() gets a chance to change the return value of is_ready(). This error likely occurs for objects that subclass Waitable and are in the reentrant callback group, since this group does not protect for more than one of its members going at the same time.

Note, I believe the above as well as the race condition described in #1007 contribute to this flaky error.

A possible solution would be to extend rclcpp's Waitable class with a method to exclude a Waitable object from being considered when building a wait set, such as rclpy's take_data method.

Steps to reproduce issue

One way to reproduce this error by running this function repeatedly. It may take a few hundred runs.

Another way is to compile the above function with the debug flag, open the program in gdb, and create a breakpoint the executor method (b executor()). When you hit the breakpoint, you will want to delete that breakpoint so you can step without jumping to other threads. Then when you step the other threads will finish (and finish the work on your current thread) so you will be able to step your way to the error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant