rclcpp::Node constructor and destructor crash with multithreading #1042
@thomas-moulard re: https://github.com/ros-tooling/aws-oncall/issues/107 (I can't comment on my own ticket in the aws-oncall repo since I'm not a collaborator.)
Is anybody working on a fix?
As far as I can tell, no. @dirk-thomas is listed as the package maintainer, so maybe he knows.
@rotu How was it concluded that the thread-safety issue is in …? I will try to track down the issue, but it would be great to have clearer information (and, preferably, a simple example with reproduction steps).
While I am listed in the manifest, this repo is maintained by the whole @ros2/team.
I think I understand the issue:
We really should get rid of those global variables in … I don't see a way of completely fixing the issue, as the rosout output handler accesses the same map (https://github.com/ros2/rcl/blob/94b5a1d7d0899aa84a1026e21488bee95e67bbd8/rcl/src/rcl/logging_rosout.c#L241), and adding a lock that mutually excludes all node constructor calls, node destructor calls and all logging calls seems like a pretty bad idea. For the time being, I think that adding a global mutex protecting node construction/destruction will greatly reduce the issue (a crash could still occur when one thread constructs or destructs a node while another thread logs something).
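The stopgap proposed above (one global mutex serializing node construction and destruction) can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not rclcpp's code: `SketchNode`, `g_logger_levels` and `g_node_lifecycle_mutex` are hypothetical stand-ins for `rclcpp::Node` and the global logger map inside rcl. As the comment points out, a logging call that reads the same map without taking this lock could still race with construction or destruction.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for rcl's process-wide logger bookkeeping.
static std::map<std::string, int> g_logger_levels;

// The proposed band-aid: one global mutex held for the whole of node
// construction and destruction.
static std::mutex g_node_lifecycle_mutex;

// Hypothetical node type; the real rclcpp::Node calls rcl_node_init and
// rcl_node_fini, which touch shared global state.
class SketchNode {
public:
  explicit SketchNode(std::string name) : name_(std::move(name)) {
    std::lock_guard<std::mutex> lock(g_node_lifecycle_mutex);
    g_logger_levels[name_] = 0;  // register this node's logger
  }
  ~SketchNode() {
    std::lock_guard<std::mutex> lock(g_node_lifecycle_mutex);
    g_logger_levels.erase(name_);  // unregister on destruction
  }
  SketchNode(const SketchNode &) = delete;
  SketchNode & operator=(const SketchNode &) = delete;

private:
  std::string name_;
};

// Creates and destroys nodes from several threads at once. Because every
// mutation of the map happens under the mutex, the map is never modified
// concurrently. Returns the number of loggers still registered afterwards.
std::size_t churn_nodes(int threads, int nodes_per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([t, nodes_per_thread] {
      for (int i = 0; i < nodes_per_thread; ++i) {
        SketchNode node("node_" + std::to_string(t) + "_" + std::to_string(i));
      }  // node is destroyed here
    });
  }
  for (auto & w : workers) {
    w.join();
  }
  std::lock_guard<std::mutex> lock(g_node_lifecycle_mutex);
  return g_logger_levels.size();  // every node was destroyed, so this is 0
}
```

The lock covers only the lifecycle paths, which is exactly why it reduces rather than eliminates the problem: any reader of `g_logger_levels` that does not take `g_node_lifecycle_mutex` remains unsynchronized.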
Thinking about this again, I think we should actually add mutexes in … I know there was some resistance about adding locks in
I think that locks in
Please update the package to reflect reality: https://www.ros.org/reps/rep-0149.html#maintainer-multiple-but-at-least-one
👎 to adding locks to
This is the right solution that I have pointed out several times during the development of the logging APIs. It is also needed to address long-standing issues with static destruction order... Any other solutions are band-aids in my opinion, and this particular proposed one also happens to be a step backwards in design, in my opinion.
Why?
Yes, I regret having worked around a similar issue recently instead of having pushed for a refactor of the logger API.
Completely agreed.
I agree that writing "object oriented like" in
I will expand on this point ...
In general, I also vote 👎 for adding this kind of thing to
I have to agree with @ivanpauno. For Foxy, assuming this issue is strictly related to
Reverts most of #562. ros2/rclcpp#1042 describes a crash that can occur in rclcpp when rcl logging functions are called from different threads. This was fixed in ros2/rclcpp#1125, and a similar fix was made for rclpy in #562. That fix is unnecessary in rclpy because the logging macros cannot be called from multiple threads unless the GIL is released. The only place the GIL is released is around rcl_wait(), so the logging methods are already protected. Removing this code makes it a little easier to divide the remaining work of porting _rclpy.c to pybind11. If for some reason we decide to release the GIL around logging methods in the future, they can be protected using `pybind11::call_guard<T>` with a type that locks a global logging mutex when it is default constructed and unlocks it when it is destructed. Signed-off-by: Shane Loretz <[email protected]>
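The type described in the commit message (locks a global logging mutex on default construction, unlocks it on destruction) is a plain RAII guard. The sketch below assumes a hypothetical `g_logging_mutex` and a stand-in `log_line()` function; it is not rclpy code. Since pybind11 default-constructs the `call_guard` type before invoking the bound function and destroys it afterwards, a guard like this could in principle be attached with `py::call_guard<LoggingGuard>()` when binding the logging methods.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical global mutex serializing every call into the logging layer.
static std::mutex g_logging_mutex;

// Default-constructible RAII guard: locks the mutex when constructed and
// unlocks it when destroyed, so it covers exactly one enclosing scope.
struct LoggingGuard {
  LoggingGuard() { g_logging_mutex.lock(); }
  ~LoggingGuard() { g_logging_mutex.unlock(); }
  LoggingGuard(const LoggingGuard &) = delete;
  LoggingGuard & operator=(const LoggingGuard &) = delete;
};

// Hypothetical stand-in for logging state that is not thread-safe on its own.
static std::vector<std::string> g_log_lines;

void log_line(const std::string & msg) {
  LoggingGuard guard;  // held for the duration of this call
  g_log_lines.push_back(msg);
}

// Logs from several threads at once and returns the total number of lines.
std::size_t log_from_threads(int threads, int lines_per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([t, lines_per_thread] {
      for (int i = 0; i < lines_per_thread; ++i) {
        log_line("thread " + std::to_string(t) + " line " + std::to_string(i));
      }
    });
  }
  for (auto & w : workers) {
    w.join();
  }
  LoggingGuard guard;  // read the shared state under the same lock
  return g_log_lines.size();
}
```

`std::lock_guard<std::mutex>` provides the same locking behavior, but it takes the mutex as a constructor argument and is therefore not default-constructible; a dedicated type like this is what `call_guard<T>` needs.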
Use of `rclcpp::Node` is unsafe in an asynchronous setting and can cause segfaults and undefined behavior if such objects are constructed and/or destructed simultaneously. See ros2/rosbag2#329, where this directly caused a reproducible crash. While that particular crash was averted in ros2/rosbag2#338, the constructor and destructor still use the unsafe functions `rcl_node_init`/`rcl_node_fini`. Either these functions should be made thread-safe, or they should be documented as unsafe and audited/instrumented to detect such unsafe usage.
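The issue's second option, documenting `rcl_node_init`/`rcl_node_fini` as unsafe and instrumenting them to detect concurrent use, could look roughly like the sketch below. `UnsafeSectionDetector` is a hypothetical helper, not part of rcl or rclcpp: it tracks how many threads are inside a section with an atomic counter and records a violation whenever a thread enters while another one is already inside. It detects overlap; it does not prevent it.

```cpp
#include <atomic>
#include <future>
#include <thread>

// Hypothetical instrumentation for a section documented as not thread-safe.
class UnsafeSectionDetector {
public:
  template <typename F>
  void run(F && body) {
    // fetch_add returns the previous count; nonzero means another thread
    // is already inside the section.
    if (in_flight_.fetch_add(1) != 0) {
      violations_.fetch_add(1);
    }
    body();
    in_flight_.fetch_sub(1);  // sketch only: not exception-safe
  }
  int violations() const { return violations_.load(); }

private:
  std::atomic<int> in_flight_{0};
  std::atomic<int> violations_{0};
};

// Enters the section twice from one thread: no overlap, no violations.
int sequential_violations() {
  UnsafeSectionDetector detector;
  detector.run([] {});
  detector.run([] {});
  return detector.violations();
}

// Forces thread B to enter while thread A is still inside, so exactly one
// violation is recorded.
int overlapping_violations() {
  UnsafeSectionDetector detector;
  std::promise<void> a_inside;
  std::promise<void> b_entered;
  std::future<void> a_inside_f = a_inside.get_future();
  std::future<void> b_entered_f = b_entered.get_future();

  std::thread a([&] {
    detector.run([&] {
      a_inside.set_value();  // A is now inside the section
      b_entered_f.wait();    // stay inside until B has entered too
    });
  });
  a_inside_f.wait();  // do not start B until A is inside
  std::thread b([&] {
    detector.run([&] { b_entered.set_value(); });
  });
  a.join();
  b.join();
  return detector.violations();
}
```

The two promises make the overlap deterministic for demonstration purposes. In actual instrumentation, the counter would presumably wrap the bodies of the unsafe functions, and a violation would be logged or asserted on rather than merely counted.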