Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid timers to be executed twice in the multithreaded executor #1328

Closed

Conversation

ivanpauno
Copy link
Member

@ivanpauno ivanpauno commented Sep 23, 2020

Fixes #1374 by adding a mutex in Timer class, that makes is_ready and execute methods mutually exclusive.

Fixes #1487.

It deletes the old "scheduled timers" mechanism, that has a long history of being error prone.

@ivanpauno ivanpauno added the bug Something isn't working label Sep 23, 2020
@ivanpauno ivanpauno self-assigned this Sep 23, 2020
@ivanpauno ivanpauno force-pushed the ivanpauno/fix-timer-multithreaded-executor-race branch from e78b165 to fb80881 Compare January 14, 2021 21:46
@ivanpauno ivanpauno removed the more-information-needed Further information is required label Jan 14, 2021
@ivanpauno
Copy link
Member Author

Round of CI testing everything above rclcpp:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this breaks the Timer so that it cannot be run concurrently in a reentrant callback group with a multi-threaded executor. I think we're just trading one broken thing for another.

@ivanpauno
Copy link
Member Author

But this breaks the Timer so that it cannot be run concurrently in a reentrant callback group with a multi-threaded executor. I think we're just trading one broken thing for another.

I wouldn't consider that a bug, at most I would consider it a non optimal approach to the problem. Though really, I'm not sure if allowing to execute timer callbacks reentrantly is a good idea at all, but maybe it is (?).

On the other hand, #1487 is definitely buggy behavior.

But yes this PR isn't really "the right fix" as you can actually block another worker thread that took the same timer (ideally, if we would consider that reentrant execution of the same executable is not a good idea, a worker thread would never take the same executable than another and thus it wouldn't be blocked), but I don't see an easy solution here without major surgery of the executor code (and I'm not sure what that would be).

@fujitatomoya
Copy link
Collaborator

But yes this PR isn't really "the right fix" as you can actually block another worker thread that took the same timer

we cannot expect how long user callback takes. if user callback takes 5 seconds and if other threads are scheduled to execute the same timer (I think this is probable), these threads will be blocked for 5 seconds just waiting on the GenericTImer::execution_mutex_. in this case, executor cannot dispatch and process the other executables. (related to https://github.com/ros2/rclcpp/pull/1328/files#r499131414)

@wjwwood
Copy link
Member

wjwwood commented Jan 15, 2021

in this case, executor cannot dispatch and process the other executables

This is exactly the reason I came up with callback groups originally, because otherwise, when using a multi-threaded executor, the user would always have to synchronize their callbacks and that would lead to most threads in the pool just waiting on one of those mutexs, even though other things could be done.

@ivanpauno
Copy link
Member Author

Alternatively #1516 also fixes the issue.

@ivanpauno
Copy link
Member Author

Closing in favor of #1516

@ivanpauno ivanpauno closed this Jan 29, 2021
@ivanpauno ivanpauno deleted the ivanpauno/fix-timer-multithreaded-executor-race branch January 29, 2021 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants