Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_client_scope_cpp can fail with exception #458

Closed
eboasson opened this issue Dec 20, 2020 · 1 comment · Fixed by #459
Closed

test_client_scope_cpp can fail with exception #458

eboasson opened this issue Dec 20, 2020 · 1 comment · Fixed by #459
Assignees
Labels
bug Something isn't working

Comments

@eboasson
Copy link

Bug report

Required Info:

Steps to reproduce issue

Use a hacked version of test_client_scope_server.cpp to get the signal delivery timing just right, e.g.:

diff --git a/test_rclcpp/test/test_client_scope_server.cpp b/test_rclcpp/test/test_client_scope_server.cpp
index 47a5703..b9dfae6 100644
--- a/test_rclcpp/test/test_client_scope_server.cpp
+++ b/test_rclcpp/test/test_client_scope_server.cpp
@@ -27,20 +27,45 @@ void handle_add_two_ints(
   response->sum = request->a + request->b;
 }
 
+#include <signal.h>
+#include <unistd.h>
+#include "rcutils/error_handling.h"
 int main(int argc, char ** argv)
 {
+sigset_t xs, sigint;
+sigemptyset(&sigint);
+sigaddset(&sigint, SIGINT);
+sigprocmask(SIG_BLOCK, &sigint, &xs);
+
   rclcpp::init(argc, argv);
 
   auto node = rclcpp::Node::make_shared("client_scope_server");
+rcutils_logging_set_logger_level("", RCUTILS_LOG_SEVERITY_DEBUG);
 
   auto service = node->create_service<test_rclcpp::srv::AddTwoInts>(
     "client_scope", handle_add_two_ints);
 
   rclcpp::WallRate loop_rate(30);
   while (rclcpp::ok()) {
+printf("unblock\n");
+std::cout.flush();
+sigprocmask(SIG_SETMASK, &xs, NULL);
+printf("sleep\n");
+std::cout.flush();
+usleep(1000000);
+printf("spin_some\n");
+std::cout.flush();
     rclcpp::spin_some(node);
     loop_rate.sleep();
+printf("block\n");
+std::cout.flush();
+sigprocmask(SIG_BLOCK, &sigint, &xs);
+printf("sleep\n");
+std::cout.flush();
+usleep(1000000);
   }
+printf("rclcpp shutdown\n");
+std::cout.flush();
   rclcpp::shutdown();
   return 0;
 }

then run the test as normal: ctest -V -R test_client_scope_cpp__rmw_cyclonedds_cpp (or some other RMW)

Expected behavior

Test passes.

Actual behavior

Exception caused by the main loop of the test program calling spin_some after the context was finalized by the deferred signal handler:

18: [INFO] [test_client_scope_server_cpp-1]: sending signal 'SIGINT' to process[test_client_scope_server_cpp-1]
18: [test_client_scope_server_cpp-1] unblock
18: [test_client_scope_server_cpp-1] sleep
18: [test_client_scope_server_cpp-1] [INFO] [1608413551.502214520] [rclcpp]: signal_handler(signal_value=2)
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502307520] [rclcpp]: signal_handler(): SIGINT received, notifying deferred signal handler
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502380020] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT or uninstall
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502427437] [rclcpp]: deferred_signal_handler(): SIGINT received, shutting down
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502468354] [rclcpp]: deferred_signal_handler(): shutting down rclcpp::Context @ 0x129607e38, because it had shutdown_on_sigint == true
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502496396] [rcl]: Shutting down ROS client library, for context at address: 0x129607bd0
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502565021] [rcl]: Finalizing publisher
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.502903730] [rcl]: Publisher finalized
18: [test_client_scope_server_cpp-1] [DEBUG] [1608413551.503439606] [rclcpp]: deferred_signal_handler(): waiting for SIGINT or uninstall
18: [test_client_scope_server_cpp-1] spin_some
18: [test_client_scope_server_cpp-1] 
18: [test_client_scope_server_cpp-1] >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
18: [test_client_scope_server_cpp-1] This error state is being overwritten:
18: [test_client_scope_server_cpp-1] 
18: [test_client_scope_server_cpp-1]   'Handle's typesupport identifier (rosidl_typesupport_cpp) is not supported by this library, at /Users/erik/ros2_ws/src/ros2/rosidl_typesupport/rosidl_typesupport_cpp/src/type_support_dispatch.hpp:120'
18: [test_client_scope_server_cpp-1] 
18: [test_client_scope_server_cpp-1] with this new error message:
18: [test_client_scope_server_cpp-1] 
18: [test_client_scope_server_cpp-1]   'the given context is not valid, either rcl_init() was not called or rcl_shutdown() was called., at /Users/erik/ros2_ws/src/ros2/rcl/rcl/src/rcl/guard_condition.c:69'
18: [test_client_scope_server_cpp-1] 
18: [test_client_scope_server_cpp-1] rcutils_reset_error() should be called after error handling to avoid this.
18: [test_client_scope_server_cpp-1] <<<
18: [test_client_scope_server_cpp-1] libc++abi.dylib: terminating with uncaught exception of type rclcpp::exceptions::RCLError: the given context is not valid, either rcl_init() was not called or rcl_shutdown() was called., at /Users/erik/ros2_ws/src/ros2/rcl/rcl/src/rcl/guard_condition.c:69
18: [ERROR] [test_client_scope_server_cpp-1]: process has died [pid 67948, exit code -6, cmd '/Users/erik/ros2_ws/build/test_rclcpp/test_client_scope_server_cpp'].
18: test_client_scope_cpp (test_rclcpp.TestTwoExecutablesAfterShutdown)
18: Test that both executables finished cleanly. ... FAIL
18: 
18: ======================================================================
18: FAIL: test_client_scope_cpp (test_rclcpp.TestTwoExecutablesAfterShutdown)
18: Test that both executables finished cleanly.
18: ----------------------------------------------------------------------
18: Traceback (most recent call last):
18:   File "/Users/erik/ros2_ws/build/test_rclcpp/test_client_scope_cpp__rmw_cyclonedds_cpp_RelWithDebInfo.py", line 75, in test_client_scope_cpp
18:     launch_testing.asserts.assertExitCodes(
18:   File "/Users/erik/ros2_ws/install/launch_testing/lib/python3.9/site-packages/launch_testing/asserts/assert_exit_codes.py", line 62, in assertExitCodes
18:     assert info.returncode in allowable_exit_codes, 'Proc {} exited with code {}'.format(
18: AssertionError: Proc test_client_scope_server_cpp-1 exited with code -6
18: 
18: ----------------------------------------------------------------------
18: Ran 1 test in 0.001s
18: 
18: FAILED (failures=1)
18: -- run_test.py: return code 1
18: -- run_test.py: verify result file '/Users/erik/ros2_ws/build/test_rclcpp/test_results/test_rclcpp/test_client_scope_cpp__rmw_cyclonedds_cpp.xunit.xml'
1/1 Test #18: test_client_scope_cpp__rmw_cyclonedds_cpp ...***Failed    5.87 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   5.88 sec

The following tests FAILED:
	 18 - test_client_scope_cpp__rmw_cyclonedds_cpp (Failed)

Additional information

I'm filing it here because this is where I have seen the failure. What the root issue is here I don't know:

  • it could be that the signal handling as implemented does not match the intent, that it was meant to be safe;
  • it could be that is the TOCTTOU issue inherent in the while(rclcpp::ok()){rclcpp::spin_some();} pattern;
  • it could be that the exception is acceptable and that it is use of SIGINT that is the issue;
  • ...

and so it may well be that this ticket should be moved elsewhere.

@jacobperron jacobperron self-assigned this Dec 22, 2020
jacobperron added a commit that referenced this issue Dec 22, 2020
Fixes #458.

An exception can be thrown if an interrupt occurs between checking rclcpp::ok() and rclcpp::spin_some().

Related to ros2/rclcpp#1066

Signed-off-by: Jacob Perron <[email protected]>
@jacobperron
Copy link
Member

Indeed, the pattern

while (rclcpp::ok()) {
  rclcpp::spin_some();
}

is not safe.

I found this older issue about it ros2/rclcpp#1066
The immediate solution was to wrap the whole thing in a try-catch, e.g. ros2/examples#270

Unfortunately, this pattern seems to be used in many places with a try-catch 😨
IMO, it is not great to require the user to wrap this pattern in a try-catch and it is perhaps worth re-opening ros2/rclcpp#1066.

In the meantime, I think the best thing to do is to guard against the exception in this particular case. I've proposed a fix in #459

@jacobperron jacobperron added the bug Something isn't working label Dec 22, 2020
jacobperron added a commit that referenced this issue Jan 21, 2021
Fixes #458.

An exception can be thrown if an interrupt occurs between checking rclcpp::ok() and rclcpp::spin_some().

Related to ros2/rclcpp#1066

Signed-off-by: Jacob Perron <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants