Single-threaded executor can be starved by timers #392
Related to the executor starvation reported in #280.
@dhood I assigned you to this ticket, but added it to the untargeted milestone given @mikaelarguedas's last comment that documentation might suffice for the moment.
We shouldn't mark this with the milestone.
👋 I ended up here trying to find out why, when I had multiple timers with short intervals (and longer callbacks), only the first one worked and the others were never executed. Is it possible that both issues are related?
@ubald I do not believe what you've described is the issue here. This issue is essentially that we do not have fair scheduling, so as long as timers are always ready, other entities such as services will never get handled. However, in this case more than just "the first one" would run.
Thanks for the quick reply! Please bear with me: I only started with ROS 2 yesterday and don't have a full grasp of all the pieces involved yet, but looking at the code, it seems related to me. When calling rclcpp/rclcpp/src/rclcpp/executor.cpp, line 518 (at commit 070b312),
it iterates over all timers in the same order each time, so it always starts by checking the first one created (rclcpp/rclcpp/src/rclcpp/executor.cpp, lines 500 to 502 at 070b312).
Then, if that timer's callback has been running for longer than its interval, it will always be ready here in rcl. I'll see if I can switch from a Debian install to a source build; that way I might try to fix it and make a PR.
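The fixed-order scan described above can be illustrated with a toy, single-threaded model. This is a sketch in plain Python, not rclcpp code: SimTimer and run_fixed_order are hypothetical, time is simulated, and rescheduling is simplified to "next deadline = previous deadline + period", which may differ from how rcl actually handles missed periods. It only shows why a timer whose callback overruns its period can keep every later timer from running.

```python
from dataclasses import dataclass


@dataclass
class SimTimer:
    """Hypothetical simulated timer (not an rclcpp/rcl type)."""
    name: str
    period: float    # seconds between scheduled firings
    duration: float  # how long the callback blocks the thread, in seconds
    next_due: float = 0.0


def run_fixed_order(timers, horizon):
    """Single-threaded loop that always scans timers in creation order
    and runs the first ready one. Returns the number of calls per timer."""
    now = 0.0
    for t in timers:
        t.next_due = t.period
    calls = {t.name: 0 for t in timers}
    while now < horizon:
        ready = next((t for t in timers if t.next_due <= now), None)
        if ready is None:
            # Nothing is ready: jump to the earliest deadline (stands in for waiting).
            now = min(t.next_due for t in timers)
            continue
        calls[ready.name] += 1
        now += ready.duration           # the callback blocks the only thread
        ready.next_due += ready.period  # simplified rescheduling (assumption)
    return calls


# Timer "a" overruns its period, so it is ready again on every scan.
starved = run_fixed_order([SimTimer("a", 1.0, 1.5), SimTimer("b", 1.0, 0.1)], 20.0)
# starved == {"a": 13, "b": 0}

# Timer "a" finishes well within its period, so the scan reaches "b" too.
fair = run_fixed_order([SimTimer("a", 1.0, 0.5), SimTimer("b", 1.0, 0.1)], 20.0)
```

In the starved case the first timer's deadline advances by 1.0 per call while the clock advances by 1.5, so it is always overdue and the scan never gets past it.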
Context (comes from ros2/demos#187): A single-threaded executor has a timer scheduled for every N seconds, and also a service server.
Bug: If the timer callback takes >=N seconds to complete, the executor will never process service requests once the timer is triggered for the first time.
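The scenario above can be reproduced in a toy, single-threaded model. This is plain Python, not rclcpp: simulate_timer_first and its parameters are hypothetical, time is simulated, and the loop is a simplified assumption of the behavior explained in this issue (a ready timer is run without first waiting, and a ready timer is always preferred over the service).

```python
def simulate_timer_first(timer_period, timer_duration, request_at, horizon):
    """Hypothetical model: a ready timer is always picked first, and the
    "wait" step (an rcl_wait analogue) is skipped whenever a timer is ready.
    Returns the simulated time at which the service request is handled,
    or None if it is starved until `horizon`."""
    now = 0.0
    timer_due = timer_period
    request_seen = False  # only updated by the wait step
    while now < horizon:
        if timer_due <= now:
            # A timer is ready: run it right away without waiting, so a request
            # that arrived in the meantime is never marked as ready.
            now += timer_duration
            timer_due += timer_period
            continue
        if request_seen:
            return now  # the service callback would run here
        # Nothing is ready: wait until the next event, then record what arrived.
        if request_at > now:
            now = min(timer_due, request_at)
        request_seen = request_at <= now
    return None


# Callback shorter than the period: the request is handled at t = 2.5.
handled = simulate_timer_first(1.0, 0.5, 2.5, 50.0)
# Callback as long as the period: the timer is always ready, the request never runs.
starved = simulate_timer_first(1.0, 1.0, 2.5, 50.0)
```

With timer_duration equal to timer_period, every callback ends exactly when the next firing is due, so the loop never reaches the wait step; this matches the ">= N seconds" condition in the bug description.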
This is because of the combination of the following:
- When the timer is ready, get_next_executable does not call rcl_wait, so the service is not marked as ready.
- Even if get_next_executable is forced to always call wait_for_work so the server can receive its request, the server will still not be processed by the executor, because the timer will always be chosen first as the next_ready_executable.

Forcing get_next_executable to always call wait_for_work and giving timers lower priority in get_next_ready_executable will fix this situation, but it is inefficient to wait when it's not necessary, and I'm not sure checking timers last is a fix-all (could there be a parallel situation in which the server needs to be the lowest priority?).

@dirk-thomas mentioned that a queue of some sort is probably more appropriate, so that get_next_executable processes events in the order in which they were received.

For now, we have to recommend that users not permit timer callbacks to block for longer than the period at which they're scheduled (this might already be in our documentation somewhere?).
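The queue idea mentioned above can be sketched with the same toy model. This is only an illustration of "process events in the order they became ready", not rclcpp's design: simulate_fifo and its parameters are hypothetical, time is simulated, and a heap keyed on ready time (with an insertion counter to break ties) stands in for "a queue of some sort".

```python
import heapq


def simulate_fifo(timer_period, timer_duration, request_at, horizon):
    """Hypothetical model: events are handled strictly in the order they
    became ready. Returns the simulated time the request is handled, or None."""
    now = 0.0
    # (ready_time, seq, kind): seq keeps insertion order for equal ready times.
    queue = [(timer_period, 0, "timer"), (request_at, 1, "request")]
    heapq.heapify(queue)
    seq = 2
    while queue and now < horizon:
        ready_time, _, kind = heapq.heappop(queue)
        now = max(now, ready_time)  # idle until the event is ready, if needed
        if kind == "request":
            return now              # the service callback would run here
        now += timer_duration       # the timer callback blocks the thread
        heapq.heappush(queue, (ready_time + timer_period, seq, "timer"))
        seq += 1
    return None


# Even with a callback that overruns its period, the request is served.
overrun = simulate_fifo(1.0, 1.5, 2.5, 50.0)
exact = simulate_fifo(1.0, 1.0, 2.5, 50.0)
```

Ordering by ready time makes the scheduling fair, but it does not stop an overrunning timer from falling further and further behind its schedule, so the recommendation above about callback duration still applies.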