
Optimization of the SingleThreadedExecutor and MultiThreadedExecutor to reduce CPU overhead. #825

Open
MartinCornelis2 opened this issue Aug 15, 2019 · 4 comments


@MartinCornelis2
Contributor

MartinCornelis2 commented Aug 15, 2019

Feature request / Bug report

Optimization of the SingleThreadedExecutor and MultiThreadedExecutor to reduce CPU overhead.
Possible solutions:

  • Overhaul design of current implementation
    • Work with callback groups instead of nodes
    • Use events to notify executor of changes to system instead of constant checking with weak nodes

OR

  • Add a separate optimized Executor that is less dynamic (more user configuration)
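The event-based idea above can be sketched in a few lines. This is a hypothetical model (all names invented, not the rclcpp API): instead of re-collecting every entity from every node on each spin, entities are registered once and the executor is notified when one becomes ready, so a spin only touches work that is actually pending.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// A callable unit of work, standing in for a subscription/timer callback.
struct Callback {
  std::function<void()> fn;
};

class EventExecutor {
public:
  // Called by an entity when it has work, instead of the executor
  // polling every node on every iteration.
  void notify_ready(Callback * cb) {
    std::lock_guard<std::mutex> lock(mutex_);
    ready_.push(cb);
  }

  // Executes whatever was notified as ready; returns how many ran.
  // No per-spin scan over all nodes and callback groups.
  std::size_t spin_some() {
    std::queue<Callback *> batch;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      std::swap(batch, ready_);
    }
    std::size_t n = 0;
    while (!batch.empty()) {
      batch.front()->fn();
      batch.pop();
      ++n;
    }
    return n;
  }

private:
  std::mutex mutex_;
  std::queue<Callback *> ready_;
};
```

The point of the sketch is that the cost per spin scales with the number of ready callbacks rather than with the total number of entities in the system.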

Feature description

We researched the large CPU overhead introduced by the Executor in rclcpp and found it to be a blocking issue for our ROS 2 platform. The results of the research can be found here: https://github.com/nobleo/ros2_performance . As a result of the research, a discussion was started here:
https://discourse.ros.org/t/singlethreadedexecutor-creates-a-high-cpu-overhead-in-ros-2/10077 . People who joined the discussion also looked into the CPU overhead and came to the same conclusions. In our opinion, changes to the Executor, or alternatives to the current one, are necessary for ROS 2 to run normally on embedded boards.

We are continuing our research by implementing a custom static Executor as a proof of concept, to get an idea of the maximum CPU gain that is possible. The code will be made available when we have something functional, and the results of the research will be shared on our GitHub.

Additionally, the current implementation of the Executor has some surprising effects that are discussed here: https://t-blass.de/papers/response-time-analysis-of-ros2.pdf . The paper focuses more on how to measure response time than on the nature of the Executor. However, it does highlight the Executor's peculiar preference for timers and the unusual prioritization based on callback type and registration time.

The improvements that could be made to scheduling (which are required for real-time operation) and the optimizations with respect to CPU overhead could, and arguably should, both be encapsulated in a new and improved Executor.

Related issues

These issues need to be considered when redesigning the executor.
#392
#519
#726

Edit:
The author of the paper mentioned above raised this issue:
#633

Implementation considerations

Major changes to the Executor in rclcpp (and possibly to the Node class and others) could break existing user code. Implementing these changes will take a lot of effort and may also involve changes in other abstraction layers of ROS 2.

@iluetkeb
Contributor

iluetkeb commented Aug 15, 2019

Work with callback groups instead of nodes

So far I have not seen data that points to nodes vs. callback groups being a decisive factor. Have I missed something?

Use events to notify executor of changes to system instead of constant checking with weak nodes

Here, you mean changes such as adding/removing subscribers, right?

@MartinCornelis2
Contributor Author

So far I have not seen data that points to nodes vs. callback groups being a decisive factor. Have I missed something?

My problem with working with nodes is that a node has no meaning to the executor. The executor is concerned with callbacks and their priorities. The concept of callback groups is interesting, because they could imply a certain priority; nodes, however, imply nothing (unless I'm missing something). This is more of a semantic issue or nuisance I'm having, so yes, maybe for clarity I should have left this out.

Here, you mean changes such as adding/removing subscribers, right?

Yes, this would prevent the constant checking.

@iluetkeb
Contributor

My problem with working with nodes is that a node has no meaning to the executor.

Callbacks from different nodes can safely be executed in parallel.

I’m not aware that this is currently exploited anywhere, but just saying.

The same holds between callback groups, so those would be sufficient as well, if we take care to map each node to a different group.
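The rule being described here can be stated as a tiny predicate. The types below are invented for illustration (they are not the rclcpp API): two callbacks may safely run in parallel exactly when they belong to different groups, or to the same group if that group is reentrant; mapping each node to its own group then makes cross-node parallelism explicit.

```cpp
#include <cstdint>

// Mutually exclusive groups serialize their callbacks; reentrant groups
// allow their callbacks to overlap.
enum class GroupType { MutuallyExclusive, Reentrant };

struct CallbackGroup {
  std::uint64_t id;   // unique per group; one group per node in this scheme
  GroupType type;
};

// True when callbacks from groups a and b may execute concurrently.
bool can_run_in_parallel(const CallbackGroup & a, const CallbackGroup & b) {
  if (a.id != b.id) {
    return true;  // different groups (e.g. different nodes): safe
  }
  return a.type == GroupType::Reentrant;  // same group: only if reentrant
}
```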

@ivanpauno
Member

Work with callback groups instead of nodes

That would avoid this kind of logic:

```cpp
rclcpp::SubscriptionBase::SharedPtr
MemoryStrategy::get_subscription_by_handle(
  std::shared_ptr<const rcl_subscription_t> subscriber_handle,
  const WeakNodeList & weak_nodes)
{
  for (auto & weak_node : weak_nodes) {
    auto node = weak_node.lock();
    if (!node) {
      continue;
    }
    for (auto & weak_group : node->get_callback_groups()) {
      auto group = weak_group.lock();
      if (!group) {
        continue;
      }
      for (auto & weak_subscription : group->get_subscription_ptrs()) {
        auto subscription = weak_subscription.lock();
        if (subscription) {
          if (subscription->get_subscription_handle() == subscriber_handle) {
            return subscription;
          }
          if (subscription->get_intra_process_subscription_handle() == subscriber_handle) {
            return subscription;
          }
        }
      }
    }
  }
  return nullptr;
}
```

If all the nodes are only using the default callback group, there's not much performance benefit in working with callback groups. But I agree the change is a good idea.
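For contrast, a lookup table maintained at entity add/remove time would turn the triple nested loop above into a single hash lookup. This is a minimal sketch with invented stand-in types (`Handle`, `Subscription` here are not the rcl/rclcpp types):

```cpp
#include <memory>
#include <unordered_map>

// Stand-ins for rcl_subscription_t and rclcpp::SubscriptionBase.
struct Handle {};
struct Subscription {
  std::shared_ptr<const Handle> handle;
};

// Maintained when subscriptions are added/removed, instead of being
// rediscovered by walking nodes -> groups -> subscriptions each time.
using SubscriptionMap =
  std::unordered_map<const Handle *, std::shared_ptr<Subscription>>;

// Average O(1) lookup replacing the O(nodes * groups * subs) scan.
std::shared_ptr<Subscription> get_subscription_by_handle(
  const SubscriptionMap & map, const std::shared_ptr<const Handle> & h)
{
  auto it = map.find(h.get());
  return it == map.end() ? nullptr : it->second;
}
```

The map would need updating on add/remove (and intra-process handles would need their own entries), but the per-wait-iteration cost no longer depends on how many nodes exist.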


There's some duplication of data between the memory_strategy and the wait_set that may be affecting performance, which shows up in what the wait_for_work method is doing:

If we avoided this back and forth between entities in the memory_strategy and the wait_set (by just using the entities in the wait_set directly), we would probably improve performance.


Another idea: we're collecting entities from all the nodes in each iteration. If, instead, we collected them once and had a way of notifying the executor that an entity was added (for example, by storing a shared_ptr to the executor in the nodes), we could improve performance a little more. We would still need to copy the entities once per iteration, because after waiting, the entities that were not notified are cleared (but copying is less expensive than collecting).
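The "collect once, notify on change, copy per iteration" scheme can be sketched as follows (all names hypothetical, not the rclcpp API): the executor keeps a persistent entity list that nodes update via notifications, and each wait iteration takes a cheap copy of it rather than re-collecting from every node.

```cpp
#include <vector>

// Stand-in for a waitable entity (subscription, timer, ...).
struct Entity {
  int id;
};

class EntityCache {
public:
  // Called by a node when an entity is added, e.g. through a shared_ptr
  // to the executor held by the node. Collection happens once, here.
  void on_entity_added(Entity e) { entities_.push_back(e); }

  // Per-wait snapshot: a copy, because the wait clears entries that were
  // not ready. Copying a vector is cheaper than walking all nodes again.
  std::vector<Entity> snapshot() const { return entities_; }

private:
  std::vector<Entity> entities_;
};
```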
