Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executor does not maintain a reference to nodes #726

Open
rotu opened this issue May 21, 2019 · 10 comments
Open

Executor does not maintain a reference to nodes #726

rotu opened this issue May 21, 2019 · 10 comments

Comments

@rotu
Copy link
Contributor

rotu commented May 21, 2019

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04.2 LTS (Bionic Beaver)
  • Installation type:
    • from source
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Create a shared pointer to a Node, pass it to SingleThreadedExecutor::add_node. Delete or reset the shared pointer. Spin the executor.

#include "rclcpp/rclcpp.hpp"

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);

  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(std::make_shared<rclcpp::Node>("Test"));
  executor.spin();
  return 0;
}

Expected behavior

The node should run.

Actual behavior

ROS crashes with an error like:

[ERROR] [rclcpp]: Couldn't add guard_condition to wait set: guard condition implementation is invalid, at /opt/ros/master/src/ros2/rcl/rcl/src/rcl/guard_condition.c:174, at /opt/ros/master/src/ros2/rcl/rcl/src/rcl/wait.c:455
terminate called after throwing an instance of 'std::runtime_error'
  what():  Couldn't fill wait set

Additional information

@rotu
Copy link
Contributor Author

rotu commented May 21, 2019

this may be a duplicate of 272

@tfoote
Copy link
Contributor

tfoote commented May 21, 2019

I would expect this program to run and not crash, as ticketed in #272. But i would not expect the node to persist.

Internally an executor keeps a weak reference to the nodes that have been added to it. And it needs to make sure to check the weak pointer everywhere before dereferncing it.

But if you don't keep your node around it should not persist just because it's been attached to an executor. In your test program you would never even be able to remove it from the executor since you don't have a reference to it to remove so it would end up persisting until the executor is destructed which is not a relevant lifetime.

@rotu
Copy link
Contributor Author

rotu commented May 21, 2019

If it’s supposed to be a weak reference, the API should be taking a pointer or an std::weak_ptr.

In the example program, I would expect the node to be destroyed when all references are gone. Yes, this means when the user presses ctrl-c and the executor’s destructor is called. What do you mean this is not a relevant lifecycle?

I guess I’m confused by this statement “But if you don't keep your node around it should not persist just because it's been attached to an executor”. Why? The only reason the node is created is to pass to an executor. When is it desirable for an executor to have a reference to a deleted node?

@allenh1
Copy link
Contributor

allenh1 commented May 21, 2019

@rotu is your source checkout up to date? I feel like there is a missing argument to the rclcpp::Node constructor (namely rclcpp::NodeOptions).

I implemented something similar spinning my node here.

Looks like the add_node function takes in a std::shared_ptr and stores as a std::weak_ptr (see here).

I'm not sure that should be done. I would expect this to be the other way around, since weak_ptr should not claim ownership of the pointer. @wjwwood is there any particular reason to store as a weak_ptr?

@rotu
Copy link
Contributor Author

rotu commented May 21, 2019

My source checkout is up to date. options has a default initializer.

@wjwwood
Copy link
Member

wjwwood commented May 21, 2019

If it’s supposed to be a weak reference, the API should be taking a pointer or an std::weak_ptr.

I get the sentiment behind this, but I don't 100% agree. Leaning on the Core Guidelines when uncertain, I'd say taking a shared_ptr so that we can use it for a while and then store it as a weak_ptr until needed later, is an appropriate use:

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#r-resource-management

We could take a weak_ptr and immediately .lock() inside add_node, but I think most people would think that's unnecessary. However, I'll consider it for when we update the executor API to support adding callback groups rather than nodes (so you can have more than one executor per node).

In the example program, I would expect the node to be destroyed when all references are gone. Yes, this means when the user presses ctrl-c and the executor’s destructor is called. What do you mean this is not a relevant lifecycle?

Assuming that a function that takes a shared_ptr<T> will share the ownership is reasonable, but it's not reasonable (imo) to assume it will keep it forever.

Ultimately the shared_ptr<Node> you create should be kept alive as long as you need it and that includes spinning on it. I don't think it's the job of the executor to keep it alive for you.

At any rate this is not how executors work and I don't think it makes sense to change that behavior, otherwise all the code which currently add's a node to an executor will need to be updated to remove it at some point otherwise the nodes would stay around as long as the executor does. That's a big breaking change with little upside in my opinion.

So based on that I think the described issue with executor not holding a shared pointer rather than a weak pointer is a "won't fix" for me. However, the proposed change in API signature (to use weak_ptr) is reasonable and, also, it should not crash as it does.

Instead, I think it should run (as @tfoote said) but the node should still be destroyed, or maybe spin() should throw an error since the executor is empty (not sure what behavior we want in that case).


If you want the shortest code possible, do this:

#include "rclcpp/rclcpp.hpp"

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);

  rclcpp::spin(std::make_shared<rclcpp::Node>("Test"));
  return 0;
}

Or with a custom executor:

#include "rclcpp/rclcpp.hpp"

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);

  auto node = std::make_shared<rclcpp::Node>("Test");
  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();
  return 0;
}

I think we should close this as a duplicate of #272, and if desired you guys can open an issue asking for the add_node signature to be changed to take a weak_ptr rather than a shared_ptr.

@allenh1
Copy link
Contributor

allenh1 commented May 21, 2019

I think we should close this as a duplicate of #272, and if desired you guys can open an issue asking for the add_node signature to be changed to take a weak_ptr rather than a shared_ptr.

@wjwwood I think that's definitely reasonable. My biggest issue, to be clear, is that I wouldn't expect something that has a add_node function to mean "I'll hold this and you do stuff to it." I'd take that idea (changing the signature to take a weak_pointer) and propose that we add an emplace_node function that could allow for this sort of quick-hand insertion.

Maybe something like this? I'm pretty sure this won't compile, but, in the off chance I got all the templates right to make the compiler happy, I'm still not so I don't think this is the right idea. The issue is I couldn't work around the templated return type in case the subclassed node has a different number of arguments than the base class in its constructor, as well as the fact that it is pretty important to verify that the subclassed node type is also a node type. Maybe my horrifying mess of whatever is below will inspire someone who is better at this than me to come up with something clever (or at least has a better chance of compiling than this horrible mess)?

template<typename... Args>
std::shared_ptr<T> emplace_node(Args && ... args)
{
  std::shared_ptr<T> node = std::make_shared<T>(std::forward<decltype(args)>(args)...);
  // expensive stuff:
  std::shared_ptr<rclcpp::Node> as_rclcpp_node = std::dynamic_pointer_cast<rclcpp::Node>(node);
  if (nullptr == as_rclcpp_node) {
    return nullptr;  // for shame
  }
  return weak_nodes_.emplace_back(std::move(as_rclcpp_node)).lock();  // absolutely disgusting
}

@rotu
Copy link
Contributor Author

rotu commented May 21, 2019

@wjwwood, the core guidelines you linked to explains exactly why add_node should not take a smart pointer:

R.30: Take smart pointers as parameters only to explicitly express lifetime semantics
Reason

Accepting a smart pointer to a widget is wrong if the function just needs the widget itself. It should be able to accept any widget object, not just ones whose lifetimes are managed by a particular kind of smart pointer. A function that does not manipulate lifetime should take raw pointers or references instead.

I see your point that it would be a possibly breaking change. I do think that it makes sense to keep the node object alive while it might be still executed. If the caller knows the node is no longer needed, they have a good way to indicate that: remove_node.

@wjwwood
Copy link
Member

wjwwood commented May 21, 2019

My biggest issue, to be clear, is that I wouldn't expect something that has a add_node function to mean "I'll hold this and you do stuff to it."

I can understand where the confusion comes from, but as I said, I don't think that the actual behavior (i.e. "I [the executor] will consider this node when spinning") is a less valid interpretation given just the name and signature. And also that changing that behavior is a major behavioral change (code still compiles but does not work right) and for a fairly small benefit. That's why I don't really want to change that.

propose that we add an emplace_node function that could allow for this sort of quick-hand insertion.

Emplace means something different in the C++ world, it means "construct and insert" in the context of a container (avoiding the need to create it locally and then copy on insertion). First of all, the executor isn't a container, instead it operates on references to the nodes, and it uses shared ownership at times to avoid some of the nastier bugs a user can run into if they destroy things out of order (or it's supposed to, it's failing to do so in this case). Second, the node is already constructed and always constructed as a shared_ptr (as it inherits from std::enable_shared_from_this) and so it doesn't benefit from "emplace", as copying the shared pointer which owns it is trivial.

We could still have the function you've proposed, but it really doesn't buy us anything I think. It's just so trivial to hold a reference to the node in the same scope that you create the executor.


the core guidelines you linked to explains exactly why add_node should not take a smart pointer:

That's fair and we could provide a version of add_node() that takes rclcpp::Node & as well, but then it's up to the user to ensure that the node outlives the executor (or it is removed before the node is destroyed). The shared pointer interface was meant to be a "guard rail" to protect people from that lifetime issue and I wouldn't normally include that in a fresh design, but we were also trying to emulate ROS 1 behavior and emulate it's "forgiving" API design.

As I said, we have plans to refactor the Executor so it can operate on callback groups instead of just nodes (a more granular relationship between nodes and executors), and we can reconsider these API's then.

In the meantime, I'd be ok with better errors, warnings (maybe if a node is immediately created and destroyed, though that might be annoying for cases like tests), and/or updating the signature to take a weak_ptr (that change is cheap and easy).

@allenh1
Copy link
Contributor

allenh1 commented May 22, 2019

We could still have the function you've proposed, but it really doesn't buy us anything I think. It's just so trivial to hold a reference to the node in the same scope that you create the executor.

I strongly agree. After thinking about what the factory looked like (by writing that ghastly monstrosity up there) I very quickly saw no benefits as well as confusion, stomach aches, new places for code to get caught, unpredictable behavior, and a less performant solution than simply declaring a node, then calling add_node on it.

I'm honestly not sure what to do about this case. This is just going to be one of those threads that gets pointed at until people remember not to do it, I think.

@wjwwood thank you so much for all the input.

nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
* Add tests proving the required functionality
* Add check to the functions
* Change name to test

Signed-off-by: Jorge Perez <[email protected]>
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this issue Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants