-
Notifications
You must be signed in to change notification settings - Fork 773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_node is flaky #1169
Comments
I'm looking into a fix. |
Possibly related: I don't think |
gazebo_ros incorrectly assumes rclcpp::MultiThreadedExecutor is thread-safe. This change adds thread-safe wrappers to methods for adding/removing nodes and spinning. I believe this fixes #1169. This may have some performance impact on two fronts: 1) the locking while spinning and 2) switching from calling spin() to spin_once() (I think) removes the multi-threaded nature of the executor. Signed-off-by: Jacob Perron <[email protected]>
The rclcpp::MultiThreadedExecutor is not thread safe, and so it is possible to hit races that cause gazebo_ros to crash. Specifically, there's a race when adding and removing nodes, which can happen when node are created and destroyed during runtime. This is a copy of rclcpp::Executor with some modifications to make it thread safe. A lock was added to the spin implementation and also to the add_node and remove_node methods. In order to avoid deadlock, we must have a timeout in the spin loop while waiting for work. Fixes #1169 Signed-off-by: Jacob Perron <[email protected]>
Using ros2/rclcpp#1505, which addresses thread-safety issues, I still get the same crash in
|
I will try to reproduce, as it doesn't seem obvious what's the problem from the traceback. PS:
|
Fixed by #1212 |
I first noticed the test failing here: https://build.osrfoundation.org/job/ros2_gazebo_pkgs-ci-default_rolling-focal-amd64/5/
Locally, I can reproduce if I run the test standalone (not with
colcon test
). I think the lack of stress is uncovering some sort of race in the test:Backtrace
The text was updated successfully, but these errors were encountered: