Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ros2] Improve executor destruction #860

Closed
wants to merge 3 commits into from
Closed

[ros2] Improve executor destruction #860

wants to merge 3 commits into from

Conversation

chapulina
Copy link
Contributor

This improves the situation on #855 for me locally. It goes from always failing to failing once every 10 runs 😅

@@ -28,6 +28,7 @@ std::mutex Node::lock_;

Node::~Node()
{
executor_->remove_node(get_node_base_interface());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might not have enough insights into this, but do you really need this? I thought the executor only stores weak pointers to the node and removes them once not active anymore.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a full grasp but the docs here: https://github.com/ros2/rclcpp/blob/master/rclcpp/include/rclcpp/executor.hpp#L135-L136 make me think that explicitly removing it might wake up an executor when just letting the weak reference expire might not. If that's the case my inclination would be to explicitly pass True rather than relying on it being the default value forever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but do you really need this? I thought the executor only stores weak pointers to the node and removes them once not active anymore.

I'd also expect this not to be needed, but without explicitly removing the node, the executor never stops spinning even after all nodes it holds are destroyed.

my inclination would be to explicitly pass True rather than relying on it being the default value forever

Makes sense, I'll force it to be true

Copy link

@nuclearsandwich nuclearsandwich Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without explicitly removing the node, the executor never stops spinning even after all nodes it holds are destroyed

It might be worth checking with rclcpp devs whether relying on weak refs is the intention since any node could potentially be the last node and cause the indefinite spin that explicit removal and wakeup avoids. It might be that weak references are used just to protect from asynchronous node expiration but that explicit removal is still expected.

Edit: It needs to be said I'm speculating wildly. I'll be more impressed than anyone if I'm even half right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forced true in 28e84a5


So I realize that this PR didn't have enough context to begin with, sorry about that. Here's a summary of what happens when the node doesn't remove itself:

I added these print statements:

diff --git a/gazebo_ros/src/executor.cpp b/gazebo_ros/src/executor.cpp
index db4d6fb..f90d7c1 100644
--- a/gazebo_ros/src/executor.cpp
+++ b/gazebo_ros/src/executor.cpp
@@ -37,7 +37,9 @@ Executor::~Executor()
 
 void Executor::run()
 {
+std::cout << "spin" << std::endl;
   spin();
+std::cout << "/spin" << std::endl;
 }
 
 void Executor::shutdown()
diff --git a/gazebo_ros/src/node.cpp b/gazebo_ros/src/node.cpp
index 7c7434b..ca73c18 100644
--- a/gazebo_ros/src/node.cpp
+++ b/gazebo_ros/src/node.cpp
@@ -28,7 +28,8 @@ std::mutex Node::lock_;
 
 Node::~Node()
 {
-  executor_->remove_node(get_node_base_interface(), true);
+//  executor_->remove_node(get_node_base_interface(), true);
+  std::cout << this << " ~Node" << std::endl;
 }
 
 Node::SharedPtr Node::Get(sdf::ElementPtr sdf)
diff --git a/gazebo_ros/test/test_node.cpp b/gazebo_ros/test/test_node.cpp
index 6a1c88c..3051f77 100644
--- a/gazebo_ros/test/test_node.cpp
+++ b/gazebo_ros/test/test_node.cpp
@@ -77,8 +77,11 @@ TEST(TestNode, GetSdf)
   EXPECT_NE(node_1, node_2);
 
   // Reset both
+std::cout << "Reset node 1: " <<  node_1 << std::endl;
   node_1.reset();
+std::cout << "Reset node 2: " <<  node_2 << std::endl;
   node_2.reset();
+std::cout << "Reseted both notes" << std::endl;
 
   // Create another node
   auto node_3 = gazebo_ros::Node::Get(plugin_sdf);

And get this output:

11: spin
11: Reset node 1: 0x55f7e35c2930
11: 0x55f7e35c2930 ~Node
11: Reset node 2: 0x55f7e483fe60
11: 0x55f7e483fe60 ~Node
# test is blocked here forever and eventually times out

So I'm not sure what's happening, but trying to destroy node_2 using node_2.reset() blocks forever without this fix. I haven't dug into the rclcpp code; there may be a problem there, but it's also possible we're just using it wrong.

For comparison, here's the result if I uncomment the remove_node:

11: spin
11: Reset node 1: 0x564c17606600
11: 0x564c17606600 ~Node
11: Reset node 2: 0x564c18801b60
11: 0x564c18801b60 ~Node
11: /spin
11: Reseted both notes
11: [INFO] [gazebo_ros_node]: ROS was initialized without arguments.
11: spin
11: 0x564c178ee370 ~Node
11: /spin

Note how spin() returns after the node_2 is reset, and re-spins for node_3.

@chapulina
Copy link
Contributor Author

TODO(@chapulina) : rebase against new crystal branch

@chapulina
Copy link
Contributor Author

I'm declining this PR for now because after the debugging session with @wjwwood it seems that the problem could be coming from deeper layers. The problematic tests have been commented out in #901 and #855 will keep tracking the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants