Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context destructor not being called #893

Closed
ivanpauno opened this issue Oct 14, 2019 · 14 comments
Closed

Context destructor not being called #893

ivanpauno opened this issue Oct 14, 2019 · 14 comments
Labels
bug Something isn't working

Comments

@ivanpauno
Copy link
Member

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • from source
  • Version or commit hash:
    • current master
  • DDS implementation:
    • Fast-RTPS (should be DDS implementation agnostic)
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Expected behavior

The log should be printed after ctrl-c.

Actual behavior

It's never printed.

static DefaultContext::SharedPtr default_context = DefaultContext::make_shared();
return default_context;

For some reason, the destructor of the static variable is not being call.

Additional information

I realized of this while working in ros2/rmw_fastrtps#312.
I'm removing the participant from the domain when the context is destructed.
If that doesn't happen, other Participants will not immediately detect that it's not more available (Participant disposal is much slower than removal).

May be related with https://stackoverflow.com/questions/38510621/destructor-of-a-global-static-variable-in-a-shared-library-is-not-called-on-dlcl, as the example is written as a component https://github.com/ros2/demos/blob/master/demo_nodes_cpp/src/topics/talker.cpp.

@ivanpauno
Copy link
Member Author

@wjwwood any idea on how to solve this?

@wjwwood
Copy link
Member

wjwwood commented Oct 14, 2019

@wjwwood any idea on how to solve this?

Not off hand. How are you "printing", because at some point std::cout is destroyed and even the stdin/stdout handles are closed, but that's usually after static globals are destructed.

It would be less error prone to set a break point and see if you reach it instead.

May be related with https://stackoverflow.com/questions/38510621/destructor-of-a-global-static-variable-in-a-shared-library-is-not-called-on-dlcl, as the example is written as a component https://github.com/ros2/demos/blob/master/demo_nodes_cpp/src/topics/talker.cpp.

Does it happen with a non-composable node?

@ivanpauno
Copy link
Member Author

How are you "printing", because at some point std::cout is destroyed and even the stdin/stdout handles are closed, but that's usually after static globals are destructed.

It would be less error prone to set a break point and see if you reach it instead.

Good question. I added a breakpoint, the destructor is not being executed.

Does it happen with a non-composable node?

I have now tried ros2 run composition manual_composition, same problem.

@ivanpauno
Copy link
Member Author

What about storing a std::weak_ptr instead of a std::shared_ptr here:

static DefaultContext::SharedPtr default_context = DefaultContext::make_shared();
return default_context;

After the first node is created (using the default context), it will have a part ownership of the context.
So, it will be kept alive:

rclcpp::Context::SharedPtr context_ {
rclcpp::contexts::default_context::get_global_default_context()};

const rclcpp::NodeOptions node_options_;

@wjwwood
Copy link
Member

wjwwood commented Oct 15, 2019

I think you need to find the root cause of why the constructor is not getting called now. Switching away from a shared_ptr may work, but even if it does, it doesn't explain the original issue. I don't see why using a shared pointer is a problem.

@wjwwood
Copy link
Member

wjwwood commented Oct 15, 2019

Also the context should be created when rclcpp::init() is called, not when the first node is created. You can create a wait set without a node and a wait set requires a context:

context_->get_rcl_context().get(),

@mabelzhang mabelzhang added the bug Something isn't working label Oct 17, 2019
@fujitatomoya
Copy link
Collaborator

@ivanpauno @wjwwood

let us take care of this, we will try to figure out the root cause.

@ivanpauno
Copy link
Member Author

@ivanpauno @wjwwood

let us take care of this, we will try to figure out the root cause.

Sure, I will need a fix at some point, but I'm not going to dig more into this in a few weeks.

I don't see why using a shared pointer is a problem.

Static variables with non-trivial destructors tend to cause cpp non-sense problems.
The google style guide, for example, forbids them https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables.
std::weak_ptr destructor is trivial.
I would try to avoid the pattern of having an static variable with a non-trivial destructor.


IMO, the problem is probably caused because the "destructors" section of the library (where static variable destructors are called) is not being executed.

We currently are installing a signal handler, but chaining with the previously installed signal handler.
If that handler is the default one, destructors are not called in that case (e.g. see https://stackoverflow.com/questions/4250013/is-destructor-called-if-sigint-or-sigstp-issued).
Probably, if the programs ends normally (no ctrl-c), the destructor will be called (I didn't test that).

There are other situations where this can happen, e.g. abnormal termination when an exception is not catched, calling std::terminate, std::abort or std::exit: https://stackoverflow.com/questions/12728535/will-global-static-variables-be-destroyed-at-program-end.

And better, all of this may be platform dependent.

@wjwwood
Copy link
Member

wjwwood commented Oct 18, 2019

std::weak_ptr destructor is trivial.

I don't see how the weak pointer will solve the problem. As a singleton it must exist even if someone else is not holding it until someone calls shutdown.

I would try to avoid the pattern of having an static variable with a non-trivial destructor.

That's a good goal, but I think it's unrealistic in this case.

I understand the the style guide says to avoid it, but I think it's a general rule. In this case, I think it's unavoidable and until we know exactly why it isn't working (beyond theorizing) I would recommend we hold off on deciding what to change.

Their rationale has to do with tricky bugs, especially when objects with non-trivial destructors interact with one another.

I'm not convinced that's the root of this bug (yet), and while I agree they should be avoided in cases where they are required (and there are a few, even the std library uses them), I think you just have to be careful and do them right.

I think this rule exists mostly to discourage people from using static global std::string when they could do const char *, for example, and to discourage complex relationships between non-trivial static globals.

We currently are installing a signal handler, but chaining with the previously installed signal handler.
If that handler is the default one, destructors are not called in that case (e.g. see https://stackoverflow.com/questions/4250013/is-destructor-called-if-sigint-or-sigstp-issued).

Yes, but we have a thread which is notified on ctrl-c to shutdown. It's possible the process aborts before we can get to it, but I think the detached thread would block abort until it finished. That's definitely something to check.

But we should assert that this is the problem (with print statements or gdb/lldb), not just assume.

There are other situations where this can happen, e.g. abnormal termination when an exception is not catched, calling std::terminate, std::abort or std::exit: https://stackoverflow.com/questions/12728535/will-global-static-variables-be-destroyed-at-program-end.

But that's ok (in my opinion), the only cases we should be interested in properly shutting down is when the main function exits normally, or when ctrl-c (sigint) is received (and we've allowed the signal handling to be installed). All other cases will not have a proper shutdown, and that is expected in my opinion.

It sounds like to me there are three main cases:

  • main function returns
  • SIGINT received
  • abort/terminate (due to unhandled exceptions or SIGTERM or something else)

For the first, you said:

Probably, if the programs ends normally (no ctrl-c), the destructor will be called (I didn't test that).

We should test that and see.

For the SIGINT, it could be this is the only thing that is actually broken, we need to confirm that our shutdown thread is not able to complete due to some default signal handler calling abort or something like that. This may be an issue (I can imagine it being the case) but we should confirm.

And for the last case, I personally think it's expected behavior that shutdown is not called. I mean if you put rclcpp::shutdown at the end of your main function it would not get called in this case either...

@Barry-Xu-2018
Copy link
Collaborator

Barry-Xu-2018 commented Oct 25, 2019

@ivanpauno @wjwwood @fujitatomoya

After investigation, the root cause is

  • rclcpp/rclcpp/src/rclcpp/node_interfaces/node_graph.cpp

    NodeGraph::NodeGraph(rclcpp::node_interfaces::NodeBaseInterface * node_base)
    : node_base_(node_base),
     graph_listener_(
    node_base->get_context()->get_sub_context<GraphListener>(node_base->get_context())       <==  add context shared_ptr into Context as SubContext
     ),
  • ros2/rclcpp/rclcpp/include/rclcpp/context.hpp

    class Context : public std::enable_shared_from_this<Context>
    std::unordered_map<std::type_index, std::shared_ptr<void>> sub_contexts_;          <== notice context shared_ptr is saved into this map
    

After Context::shutdown called, context shared_ptr still is in the map 'sub_contexts_'. No one remove context share_ptr from this map. This leads destructor of context is never called. I think this problem occurs not only termination by 'ctrl+c', but also normal termination.

We can modify as below to fix this problem.

bool
Context::shutdown(const std::string & reason) {
  ...
  std::lock_guard<std::recursive_mutex> lock(sub_contexts_mutex_);
  sub_contexts_.clear();                                                       <== clear (remove context shared_ptr)
}

Fixing has been made. Please review #906

@ivanpauno
Copy link
Member Author

@ivanpauno @wjwwood @fujitatomoya

After investigation, the root cause is

  • rclcpp/rclcpp/src/rclcpp/node_interfaces/node_graph.cpp
    NodeGraph::NodeGraph(rclcpp::node_interfaces::NodeBaseInterface * node_base)
    : node_base_(node_base),
     graph_listener_(
    node_base->get_context()->get_sub_context<GraphListener>(node_base->get_context())       <==  add context shared_ptr into Context as SubContext
     ),
  • ros2/rclcpp/rclcpp/include/rclcpp/context.hpp
    class Context : public std::enable_shared_from_this<Context>
    std::unordered_map<std::type_index, std::shared_ptr<void>> sub_contexts_;          <== notice context shared_ptr is saved into this map

After Context::shutdown called, context shared_ptr still is in the map 'sub_contexts_'. No one remove context share_ptr from this map. This leads destructor of context is never called. I think this problem occurs not only termination by 'ctrl+c', but also normal termination.

Nice catch!

We can modify as below to fix this problem.

bool
Context::shutdown(const std::string & reason) {
  ...
  std::lock_guard<std::recursive_mutex> lock(sub_contexts_mutex_);
  sub_contexts_.clear();                                                       <== clear (remove context shared_ptr)
}

Fixing has been made. Please review #906

I'm not too sure about the fix, I think it's not safe to assume that the sub-context won't be created again after shutdown is called (generating again the cyclic dependency).
@wjwwood What do you think?

@wjwwood
Copy link
Member

wjwwood commented Nov 18, 2019

Sorry for the delay in responding.

I'm not too sure about the fix, I think it's not safe to assume that the sub-context won't be created again after shutdown is called (generating again the cyclic dependency).

Yes, this is not a safe fix. I think the right thing to do is for anything that has a reference to the Context and could be owned by a sub context, needs to keep a weak pointer to the context instead. so in the case of graph listener, it will need to use a weak pointer rather than a shared pointer to the context and deal with any tear-down issue that follow.

@Barry-Xu-2018
Copy link
Collaborator

@ivanpauno @wjwwood @fujitatomoya

Sorry for the delay in responding.

I'm not too sure about the fix, I think it's not safe to assume that the sub-context won't be created again after shutdown is called (generating again the cyclic dependency).

Yes, this is not a safe fix. I think the right thing to do is for anything that has a reference to the Context and could be owned by a sub context, needs to keep a weak pointer to the context instead. so in the case of graph listener, it will need to use a weak pointer rather than a shared pointer to the context and deal with any tear-down issue that follow.

Thanks for your comments.
After checking codes, we think using weak pointer doesn't cause a lot of code changes.
I have finished change.
Please review 5bd65eb

@ivanpauno ivanpauno changed the title Default Context destructor not being called Context destructor not being called Dec 5, 2019
@ivanpauno
Copy link
Member Author

Fixed by #906.

DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this issue Aug 5, 2022
* Keyboard controls for playback rate

Signed-off-by: Emerson Knapp <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants