#1125 made `NodeBase` acquire the global logging mutex before calling `rcl_node_init()`:

https://github.com/ros2/rclcpp/blob/a5368e6/rclcpp/src/rclcpp/node_interfaces/node_base.cpp#L58-L63
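For reference, the pattern those lines implement looks roughly like this (a paraphrased sketch, not the verbatim source; `rclcpp::detail::get_global_logging_mutex()` is rclcpp's internal accessor for that mutex):

```cpp
// Paraphrase of NodeBase's constructor: all of rcl_node_init(), including
// the DDS middleware setup it triggers, runs inside the logging critical section.
std::shared_ptr<std::recursive_mutex> logging_mutex =
  rclcpp::detail::get_global_logging_mutex();
{
  std::lock_guard<std::recursive_mutex> guard(*logging_mutex);
  rcl_ret_t ret = rcl_node_init(
    rcl_node.get(), node_name.c_str(), namespace_.c_str(),
    context_handle, &initial_options);
  // ... error handling elided ...
}
```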
The rclcpp output handler also acquires the global logging mutex:

https://github.com/ros2/rclcpp/blob/a5368e6/rclcpp/src/rclcpp/context.cpp#L133
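That handler is what rcutils invokes for every log message once rclcpp has configured logging, so every `RCUTILS_LOG_*` call contends on the same mutex. Roughly (again a paraphrase, details approximate):

```cpp
// Paraphrase of rclcpp's logging output handler: each log message takes the
// global logging mutex before being forwarded to rcl's output handlers.
static void rclcpp_logging_output_handler(
  const rcutils_log_location_t * location, int severity, const char * name,
  rcutils_time_point_value_t timestamp, const char * format, va_list * args)
{
  std::shared_ptr<std::recursive_mutex> logging_mutex =
    rclcpp::detail::get_global_logging_mutex();
  std::lock_guard<std::recursive_mutex> guard(*logging_mutex);
  rcl_logging_multiple_output_handler(
    location, severity, name, timestamp, format, args);
}
```

Note that the mutex is recursive, so the thread holding it across `rcl_node_init()` can still log from inside the critical section; the deadlock below requires a second thread.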
ros2/rmw_fastrtps#671 added logging in a callback in `CustomParticipantInfo`, meaning that when that callback is called, it will try to acquire the global logging mutex:

https://github.com/ros2/rmw_fastrtps/blob/901339f274fc07fad757fb32bd16f00815217302/rmw_fastrtps_shared_cpp/include/rmw_fastrtps_shared_cpp/custom_participant_info.hpp#L214-L220
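The shape of that callback, as a hedged paraphrase (the exact callback, condition, and message text differ from the linked lines; `parse_ok` is a placeholder):

```cpp
// Runs on a Fast DDS discovery thread. The RCUTILS_LOG_* macro routes through
// the rclcpp output handler above, so it blocks until the global logging
// mutex is available.
void ParticipantListener::on_participant_discovery(/* ... */)
{
  // ... try to parse a type hash from the discovered participant's USER_DATA ...
  if (!parse_ok) {
    RCUTILS_LOG_WARN_NAMED(
      "rmw_fastrtps_shared_cpp",
      "Failed to parse type hash for a discovered endpoint");
  }
}
```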
In eProsima's PDP class, there's another mutex that gets acquired. One way it's acquired is before calling the above callback in `CustomParticipantInfo`:

https://github.com/eProsima/Fast-DDS/blob/8a5a9160482b1543495c1ba49f3100fcceda12d9/src/cpp/rtps/builtin/discovery/participant/PDP.cpp#L773-L847
Another way it gets acquired is when creating a datawriter, as `rmw_fastrtps_cpp` does while creating the `ros_discovery_info` topic:

https://github.com/ros2/rmw_fastrtps/blob/901339f274fc07fad757fb32bd16f00815217302/rmw_fastrtps_cpp/src/init_rmw_context_impl.cpp#L91-L97

which leads to the mutex being acquired here:

https://github.com/eProsima/Fast-DDS/blob/8a5a9160482b1543495c1ba49f3100fcceda12d9/src/cpp/rtps/builtin/discovery/participant/PDP.cpp#L858-L869
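Put together, the two code paths take the same two locks in opposite orders. A minimal, self-contained repro of that lock-ordering pattern (generic mutexes standing in for the real ones; when the interleaving hits, it hangs, which is the point):

```cpp
#include <mutex>
#include <thread>

std::recursive_mutex logging_mutex;  // stands in for rclcpp's global logging mutex
std::mutex pdp_mutex;                // stands in for Fast DDS's PDP mutex

// Main thread: the logging mutex is held across rcl_node_init(), which
// creates the ros_discovery_info datawriter and therefore needs the PDP mutex.
void init_node()
{
  std::lock_guard<std::recursive_mutex> log_guard(logging_mutex);
  std::lock_guard<std::mutex> pdp_guard(pdp_mutex);
}

// Discovery thread: PDP holds its mutex while notifying the participant
// listener, whose log call then needs the logging mutex.
void on_remote_participant()
{
  std::lock_guard<std::mutex> pdp_guard(pdp_mutex);
  std::lock_guard<std::recursive_mutex> log_guard(logging_mutex);
}

int main()
{
  std::thread node_init_thread(init_node);
  std::thread discovery_thread(on_remote_participant);
  node_init_thread.join();
  discovery_thread.join();
}
```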
I'm seeing a case where a Cyclone DDS subscriber is already started and I'm starting a Fast DDS publisher. The main thread acquires the logging mutex, tries to init the node, and blocks trying to acquire the PDP mutex while creating the `ros_discovery_info` topic. It is blocked because another thread has learned of the Cyclone DDS subscriber, acquired the PDP mutex, and notified the custom participant listener, which tried to log a message about a type hash mismatch and is now waiting on the logging mutex held by the main thread.

I think the acquisition of the logging mutex should be reduced in scope so that it doesn't cover the entire `rcl_node_init()` call, so that deadlocks like this are avoided when logging happens during the initialization process.
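Concretely, that could look something like the following in `NodeBase` (a sketch of the direction only, not a working patch: `rcl_node_init()` currently sets up the rosout publisher internally, so rcl would first have to expose that step separately, e.g. through `rcl_logging_rosout_init_publisher_for_node()`):

```cpp
// Sketch: run rcl_node_init() outside the logging critical section, and take
// the logging mutex only around the logging-related part of initialization.
// DDS discovery work (and hence the PDP mutex) then never runs under the
// logging mutex. Assumes the rosout publisher setup can be decoupled from
// rcl_node_init(), which is not the case today.
rcl_ret_t ret = rcl_node_init(
  rcl_node.get(), node_name.c_str(), namespace_.c_str(),
  context_handle, &initial_options);
{
  std::lock_guard<std::recursive_mutex> guard(*logging_mutex);
  rcl_logging_rosout_init_publisher_for_node(rcl_node.get());
}
```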
A workaround for the deadlock is to make `rmw_fastrtps` not use RCUTILS logging, so that it won't try to acquire the global logging mutex.
@sloretz as you and @ivanpauno mentioned in the doc section, those can be decoupled, and the acquisition of the logging mutex should be reduced to avoid the deadlock. I created a few PRs to address this issue; can you take a look when you have time?