Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-threaded executor can be starved by timers #392

Open
dhood opened this issue Nov 8, 2017 · 6 comments
Open

Single-threaded executor can be starved by timers #392

dhood opened this issue Nov 8, 2017 · 6 comments
Assignees
Labels
bug Something isn't working hitlist ready Work is about to start (Kanban column)

Comments

@dhood
Copy link
Member

dhood commented Nov 8, 2017

Context (comes from ros2/demos#187): A single-threaded executor has a timer scheduled for every N seconds, and also a service server.

Bug: If the timer callback takes >=N seconds to complete, the executor will never process service requests once the timer is triggered for the first time.

This is because of the combination of the following:

  1. When the spinning executor calls to get_next_executable, it does not call rcl_wait since the timer's ready, so the service is not marked as ready.
  2. Even if you force get_next_executable to always call wait_for_work so the server can get receive its request, the server will still not get processed by the executor because the timer will always get chosen first as the next_ready_executable.

Forcing get_next_executable to always call wait_for_work and giving timers lower priority in get_next_ready_executable will fix this situation, but it is inefficient to wait when it's not necessary, and I'm not sure checking timers last is a fix-all (could there be a parallel situation when the server needs to be the lowest priority?).

@dirk-thomas mentioned that a queue of some sort is probably more appropriate, so that get_next_executable processes events in the order that they were received.

For now, we have to recommend that users not permit timer callbacks to block for longer than the duration at which they're scheduled (this might be in our documentation somewhere already?).

@dhood dhood added the bug Something isn't working label Nov 8, 2017
@mikaelarguedas
Copy link
Member

related to executor starvation reported #280
Right now we will match the behavior of rclpy and provide documentation as of how to make custom executors.

@Karsten1987
Copy link
Contributor

@dhood I assigned you to this ticket, but added it to the untargeted milestone given the last comment of @mikaelarguedas that documentation might suffice for the moment.
Please feel free to iterate over it again or assign it to bouncy in case you this behavior has to be fixed before the release.

@dirk-thomas
Copy link
Member

We shouldn't mark this with the milestone untargeted since that implies that we won't "ever" look at it again. Since this is actually a bug we would like to fix at some point I will remove the milestone.

@dirk-thomas dirk-thomas removed this from the untargeted milestone Feb 26, 2018
@mikaelarguedas mikaelarguedas added this to the bouncy milestone Mar 15, 2018
@mikaelarguedas mikaelarguedas added in progress Actively being worked on (Kanban column) and removed ready Work is about to start (Kanban column) labels Mar 29, 2018
@mjcarroll mjcarroll self-assigned this Mar 29, 2018
@mjcarroll mjcarroll added ready Work is about to start (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 12, 2018
@wjwwood wjwwood removed this from the bouncy milestone Jul 19, 2018
@ubald
Copy link

ubald commented Mar 12, 2019

👋 I ended up here trying to find our why when I had multiple timers with short intervals (and longer callbacks) only the first one worked, the others are never executed.

Is it possible that both issues are related?

@wjwwood
Copy link
Member

wjwwood commented Mar 12, 2019

@ubald I do not believe what you've described is the issue here. This issue is essential that we do not have fair scheduling and so as long as timers are always ready then other things like Services will never get handled. However, more than "the first one" work in this case.

@ubald
Copy link

ubald commented Mar 12, 2019

Thanks for the quick reply!

Please bear with me here, I just started with ROS2 yesterday, I don't have a full grasp of all the pieces involved yet, but when looking at the code it looks to me like it's related:

When calling get_next_timer:

get_next_timer(any_exec);

It iterates over all timers in the same order each time, so it will always start by checking the first one created
for (auto & timer_ref : group->get_timer_ptrs()) {
auto timer = timer_ref.lock();
if (timer && timer->is_ready()) {

Then if it has been running for longer than its interval it will always be ready here in rcl

I'll see if I can switch from a debian install to a source build, this way I might try to fix it and make a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working hitlist ready Work is about to start (Kanban column)
Projects
None yet
Development

No branches or pull requests

7 participants