
Optimization of the SingleThreadedExecutor and MultiThreadedExecutor to reduce CPU overhead. #825

Open
MartinCornelis2 opened this issue Aug 15, 2019 · 4 comments


@MartinCornelis2
Contributor

MartinCornelis2 commented Aug 15, 2019

Feature request / Bug report

Optimization of the SingleThreadedExecutor and MultiThreadedExecutor to reduce CPU overhead.
Possible solutions:

  • Overhaul design of current implementation
    • Work with callback groups instead of nodes
    • Use events to notify executor of changes to system instead of constant checking with weak nodes

OR

  • Add a separate optimized Executor that is less dynamic (more user configuration)
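The event-based idea above can be sketched in a few lines. This is a hypothetical model (all names invented, not the rclcpp API): instead of re-collecting every entity from every node on each spin, entities are registered once and the executor is notified when one becomes ready, so a spin only touches work that is actually pending.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// A callable unit of work, standing in for a subscription/timer callback.
struct Callback {
  std::function<void()> fn;
};

class EventExecutor {
public:
  // Called by an entity when it has work, instead of the executor
  // polling every node on every iteration.
  void notify_ready(Callback * cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    ready_.push(cb);
  }

  // Executes whatever was notified as ready; returns how many ran.
  // No per-spin scan over all nodes and callback groups.
  std::size_t spin_some() {
    std::queue<Callback *> batch;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      std::swap(batch, ready_);
    }
    std::size_t n = 0;
    while (!batch.empty()) {
      batch.front()->fn();
      batch.pop();
      ++n;
    }
    return n;
  }

private:
  std::mutex mutex_;
  std::queue<Callback *> ready_;
};
```

The point of the sketch is that the cost per spin scales with the number of ready callbacks rather than with the total number of entities in the system.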

Feature description

We researched the large CPU overhead introduced by the Executor in rclcpp and found it to be a blocking issue for our ROS 2 platform. The results of the research can be found here: https://github.com/nobleo/ros2_performance . As a result of the research, a discussion was started here:
https://discourse.ros.org/t/singlethreadedexecutor-creates-a-high-cpu-overhead-in-ros-2/10077 . People who joined the discussion also looked into the CPU overhead and came to the same conclusions. In our opinion, changes to the Executor, or alternatives to the current one, are necessary for ROS 2 to run normally on embedded boards.

We are continuing our research by implementing a custom static Executor as a proof of concept, to get an idea of the maximum CPU gain that is possible. The code will be made available when we have something functional, and the results of the research will be shared on our GitHub.

Additionally, the current implementation of the Executor has some surprising effects that are discussed here: https://t-blass.de/papers/response-time-analysis-of-ros2.pdf . The paper focuses more on how to measure response time than on the nature of the Executor. However, it does highlight the Executor's peculiar preference for timers and the unusual prioritization based on callback type and registration time.

The improvements that could be made to scheduling (which are required for real-time operation) and the optimizations with respect to CPU overhead could, and arguably should, both be encapsulated in a new and improved Executor.

Related issues

These issues need to be considered when redesigning the executor.
#392
#519
#726

Edit:
The author of the paper mentioned above raised this issue:
#633

Implementation considerations

Major changes to the Executor in rclcpp (and possibly to the Node class and others) could break existing user code. Implementing these changes will take a lot of effort and may also involve changes in other abstraction layers of ROS 2.

@iluetkeb
Contributor

iluetkeb commented Aug 15, 2019

Work with callback groups instead of nodes

So far I have not seen data that points to nodes vs. callback groups being a decisive factor. Have I missed something?

Use events to notify executor of changes to system instead of constant checking with weak nodes

Here, you mean changes such as adding/removing subscribers, right?

@MartinCornelis2
Contributor Author

So far I have not seen data that points to nodes vs. callback groups being a decisive factor. Have I missed something?

My problem with working with nodes is that a node has no meaning to the executor. The executor is concerned with callbacks and their priorities. The concept of callback groups is interesting, because they could imply a certain priority; nodes, however, imply nothing (unless I'm missing something). This is more of a semantic issue or nuisance I'm having, so yes, maybe for clarity I should have left this out.

Here, you mean changes such as adding/removing subscribers, right?

Yes, this would prevent the constant checking.

@iluetkeb
Contributor

My problem with working with nodes is that a node has no meaning to the executor.

Callbacks from different nodes can safely be executed in parallel.

I’m not aware that this is currently exploited anywhere, but just saying.

The same holds between callback groups, so those would be sufficient as well, if we take care to map each node to a different group.
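The rule being described here can be stated as a tiny predicate. The types below are invented for illustration (they are not the rclcpp API): two callbacks may safely run in parallel exactly when they belong to different groups, or to the same group if that group is reentrant; mapping each node to its own group then makes cross-node parallelism explicit.

```cpp
#include <cstdint>

// Mutually exclusive groups serialize their callbacks; reentrant groups
// allow their callbacks to overlap.
enum class GroupType { MutuallyExclusive, Reentrant };

struct CallbackGroup {
  std::uint64_t id;   // unique per group; one group per node in this scheme
  GroupType type;
};

// True when callbacks from groups a and b may execute concurrently.
bool can_run_in_parallel(const CallbackGroup & a, const CallbackGroup & b) {
  if (a.id != b.id) {
    return true;  // different groups (e.g. different nodes): safe
  }
  return a.type == GroupType::Reentrant;  // same group: only if reentrant
}
```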

@ivanpauno
Member

Work with callback groups instead of nodes

That would avoid this kind of logic:

```cpp
rclcpp::SubscriptionBase::SharedPtr
MemoryStrategy::get_subscription_by_handle(
  std::shared_ptr<const rcl_subscription_t> subscriber_handle,
  const WeakNodeList & weak_nodes)
{
  for (auto & weak_node : weak_nodes) {
    auto node = weak_node.lock();
    if (!node) {
      continue;
    }
    for (auto & weak_group : node->get_callback_groups()) {
      auto group = weak_group.lock();
      if (!group) {
        continue;
      }
      for (auto & weak_subscription : group->get_subscription_ptrs()) {
        auto subscription = weak_subscription.lock();
        if (subscription) {
          if (subscription->get_subscription_handle() == subscriber_handle) {
            return subscription;
          }
          if (subscription->get_intra_process_subscription_handle() == subscriber_handle) {
            return subscription;
          }
        }
      }
    }
  }
  return nullptr;
}
```

If all the nodes are only using the default callback group, there's not much performance benefit in working with callback groups. But I agree the change is a good idea.
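For contrast, a lookup table maintained at entity add/remove time would turn the triple nested loop above into a single hash lookup. This is a minimal sketch with invented stand-in types (`Handle`, `Subscription` here are not the rcl/rclcpp types):

```cpp
#include <memory>
#include <unordered_map>

// Stand-ins for rcl_subscription_t and rclcpp::SubscriptionBase.
struct Handle {};
struct Subscription {
  std::shared_ptr<const Handle> handle;
};

// Maintained when subscriptions are added/removed, instead of being
// rediscovered by walking nodes -> groups -> subscriptions each time.
using SubscriptionMap =
  std::unordered_map<const Handle *, std::shared_ptr<Subscription>>;

// Average O(1) lookup replacing the O(nodes * groups * subs) scan.
std::shared_ptr<Subscription> get_subscription_by_handle(
  const SubscriptionMap & map, const std::shared_ptr<const Handle> & h)
{
  auto it = map.find(h.get());
  return it == map.end() ? nullptr : it->second;
}
```

The map would need updating on add/remove (and intra-process handles would need their own entries), but the per-wait-iteration cost no longer depends on how many nodes exist.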


There's some duplication of data between the memory_strategy and the wait_set that may be affecting performance, which shows up in what the wait_for_work method is doing:

If we avoided this back and forth between entities in the memory_strategy and the wait_set (by just using the entities in the wait_set directly), we would probably improve performance.


Another idea: we're collecting entities from all the nodes in each iteration. If, instead, we collected them once and had a way of notifying the executor that an entity was added (for example, by storing a shared_ptr to the executor in the nodes), we could improve performance a little more. We would still need to copy the entities once per iteration, because after waiting, the entities that were not notified are cleared (but copying is less expensive than collecting).
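The "collect once, notify on change, copy per iteration" scheme can be sketched as follows (all names hypothetical, not the rclcpp API): the executor keeps a persistent entity list that nodes update via notifications, and each wait iteration takes a cheap copy of it rather than re-collecting from every node.

```cpp
#include <vector>

// Stand-in for a waitable entity (subscription, timer, ...).
struct Entity {
  int id;
};

class EntityCache {
public:
  // Called by a node when an entity is added, e.g. through a shared_ptr
  // to the executor held by the node. Collection happens once, here.
  void on_entity_added(Entity e) { entities_.push_back(e); }

  // Per-wait snapshot: a copy, because the wait clears entries that were
  // not ready. Copying a vector is cheaper than walking all nodes again.
  std::vector<Entity> snapshot() const { return entities_; }

private:
  std::vector<Entity> entities_;
};
```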
