
Reduce costmap_2d_ros nodes #2489

Merged -- 2 commits merged into ros-navigation:main on Jun 2, 2022

Conversation

@gezp (Contributor) commented Aug 4, 2021

Signed-off-by: zhenpeng ge [email protected]


Basic Info

Ticket(s) this addresses: #816
Primary OS tested on: Ubuntu 20.04, ROS 2 Galactic
Robotic platform tested on: gazebo simulation of turtlebot3

Description of contribution in a few bullet points

As described in #816, rclcpp_node_ and client_node_ in class Costmap2DROS need to be removed; you can find more details (other nodes that need to be removed) in #816 (comment).

  • client_node_ isn't used, so it is simply removed.
  • rclcpp_node_ is used for TransformListener and MessageFilter, the same as in Reduce amcl_node nodes #2483, so shared_from_this() is used instead of rclcpp_node_.
  • rclcpp_node_ is also used by costmap plugins and filters (only ObstacleLayer).
    • the API of Layer::initialize() is changed: remove the unused client_node argument and replace rclcpp_node with callback_group
    • use a dedicated callback_group to create ROS subscribers in all costmap plugins and filters (a rough sketch follows this list).
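For reference, a minimal sketch (not the exact Nav2 diff) of how a plugin's subscriber can be bound to the dedicated callback group via rclcpp::SubscriptionOptions; the topic, QoS, and callback body here are illustrative assumptions:

    // Sketch only: bind a layer's subscription to the dedicated callback group
    // instead of a separate rclcpp_node_. Topic and callback are placeholders.
    #include "rclcpp/rclcpp.hpp"
    #include "rclcpp_lifecycle/lifecycle_node.hpp"
    #include "sensor_msgs/msg/laser_scan.hpp"

    void create_plugin_subscription(
      const rclcpp_lifecycle::LifecycleNode::SharedPtr & node,
      const rclcpp::CallbackGroup::SharedPtr & callback_group)
    {
      rclcpp::SubscriptionOptions sub_opt;
      sub_opt.callback_group = callback_group;  // callbacks serviced through this group

      auto sub = node->create_subscription<sensor_msgs::msg::LaserScan>(
        "scan", rclcpp::SensorDataQoS(),
        [](sensor_msgs::msg::LaserScan::ConstSharedPtr) {/* buffer data for updateBounds() */},
        sub_opt);
      (void)sub;  // a real layer would keep this as a member
    }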

Test

colcon test

colcon test --packages-select nav2_costmap_2d

# Summary: 206 tests, 2 errors, 2 failures, 0 skipped
# these errors and failures existed before this PR.

navigation demo test (turtlebot3 in gazebo)

  • success

Future work that may be required in bullet points

The way costmap plugins and filters work is unchanged: all costmap plugins and filters share a single dedicated callback_group for their subscribers (before this PR, the default callback group of rclcpp_node_ was used), but there may be a potential problem with this approach.

Different approaches could be considered to allow parallel subscribers and avoid potential deadlocks in costmap plugins and filters (a sketch of the first pattern follows this list):

  • a dedicated callback_group with a single-threaded executor
  • a dedicated callback_group with a multi-threaded executor
  • a set of dedicated callback_groups, each with its own single-threaded executor, one per costmap plugin and filter (isolated)
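For illustration, a rough sketch (assumed names, not Nav2 code) of the first pattern: a dedicated callback group serviced by its own single-threaded executor on a background thread; swapping in a MultiThreadedExecutor gives the second option:

    // Sketch only: service one dedicated callback group with its own executor/thread.
    #include <memory>
    #include <thread>
    #include "rclcpp/rclcpp.hpp"
    #include "rclcpp_lifecycle/lifecycle_node.hpp"

    void spin_dedicated_group(const rclcpp_lifecycle::LifecycleNode::SharedPtr & node)
    {
      // Create the group detached from the node's default executor.
      auto group = node->create_callback_group(
        rclcpp::CallbackGroupType::MutuallyExclusive,
        /* automatically_add_to_executor_with_node = */ false);

      auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
      executor->add_callback_group(group, node->get_node_base_interface());

      // Only this group's callbacks are spun on this background thread.
      std::thread spin_thread([executor]() {executor->spin();});
      spin_thread.detach();  // a real implementation would store and join the thread
    }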

For Maintainers:

  • Check that any new parameters added are updated in navigation.ros.org
  • Check that any significant change is added to the migration guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins are added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

@SteveMacenski (Member) commented Aug 4, 2021

This seems good, but are there no changes that need to be made to the lower level costmap layers / subscribers / executors? I almost can't believe this just 'works' with all of those layers creating subscribers to potentially N different sensor feeds, crunching numbers in parallel, with no problems.

Keep in mind when developing on costmap_2d that this is a vector of plugins and each is running in parallel with background threads to process data. Think of it like there are 100+ of these layers in use. That is obviously ridiculously large, but it should help you think about how to isolate their work so that it can support many layers. Having 5-10 layers running is totally not out of the question. I don't have hard data on this point, but in my experience, having ~7 running is pretty typical and usually half or more of those are processing high-bandwidth data like 30 fps depth images or laser scans.

This is in contrast to something like a planner, where we can also have N planners in the planner server, but each planner is sitting idle when not in use, so it's not a particular concern (surely it's possible, but that's a road to cross when it's worth crossing).

If/when you make changes to the costmap layers, make sure to update any on the lists, as these are the 'supported' / mature layers that get a lot of use.

@gezp (Contributor, Author) commented Aug 4, 2021

are there no changes that need to be made to the lower level costmap layers / subscribers / executors?

Yes, I don't create any new callback group, executor, or thread in the costmap.

I'm not sure whether these costmap layers and filters are independent of each other (i.e. one layer won't wait for another). At least in the turtlebot3 navigation demo there's no deadlock and we can run navigation successfully.

If we create many layers with high-bandwidth data, we may see delays (refer to #2456 (comment)) because we only have a single-threaded executor to handle these callbacks.

If we create a single-threaded executor for each layer, that's a lot of overhead with 100+ layers. How about using a multi-threaded executor for the costmap, since some layers may be idle while others are busy?

@SteveMacenski (Member) commented Aug 6, 2021

I think it might be good if you did a quick analysis of the costmap layers in the stack (+ STVL) to see how they used the input rclcpp / client nodes before, as they were given, and what new issues might arise with these changes. I think at least one of those nodes was exclusively reserved for the layers, and now the single node is also being shared with the costmapROS implementation, TF, and potentially more. Also, where is the spin being controlled if we wanted to potentially make each layer independent? Basically, let's do an analysis here of our options and what the status quo is today.

I am slightly surprised not to see any updates to the static layer - that also has a subscriber. Maybe it used the normal node, but I didn't look into it.

@wjwwood would you recommend having each costmap layer have a separate executor, or having a single executor shared among all of them? In the BT, we made each node have its own separate one and that seems like the cleanest way from a designer's perspective. But I'm uncertain about the amount of overhead that might contribute. Do you have a sense about this?

@gezp (Contributor, Author) commented Aug 6, 2021

I looked at the source code of class Layer and its derived classes. In class Layer, there are node_, rclcpp_node_, and client_node_, which could be used to create subscriptions/services/actions.

Costmap Layer:

  • StaticLayer
    • use node_ to create 2 subscriptions: map_sub_ (OccupancyGrid) and map_update_sub_ (OccupancyGridUpdate).
  • ObstacleLayer
    • use rclcpp_node_ to create multiple MessageFilter subscribers (LaserScan or PointCloud2); I'm not sure of the difference from a plain rclcpp subscription (see the sketch at the end of this comment).
  • VoxelLayer (inherits ObstacleLayer)
  • RangeSensorLayer
    • use node_ to create multiple subscriptions (sensor_msgs::msg::Range)
  • SpatioTemporalVoxelLayer
    • use rclcpp_node_ to create a service (spatio_temporal_voxel_layer::srv::SaveGrid)
    • use rclcpp_node_ to create multiple MessageFilter subscribers (LaserScan or PointCloud2)

Costmap Filter:

  • KeepoutFilter
    • use node_ to create 1 subscription (CostmapFilterInfo)
  • SpeedFilter
    • use node_ to create 1 subscription (CostmapFilterInfo)
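For context, a rough sketch of the tf2_ros::MessageFilter pattern the ObstacleLayer uses (mirroring the snippet quoted later in this thread): the message_filters::Subscriber receives the raw topic, and the MessageFilter only delivers messages once a transform to global_frame_ is available. The names tf_, global_frame_, transform_tolerance, and bufferScan() stand in for the layer's own members:

    // Sketch only, inside a hypothetical layer's onInitialize(); tf_, global_frame_,
    // transform_tolerance, and bufferScan() are assumed layer members.
    auto sub = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::LaserScan,
        rclcpp_lifecycle::LifecycleNode>>(node, topic, custom_qos_profile, sub_opt);

    auto filter = std::make_shared<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan>>(
      *sub, *tf_, global_frame_, 50,
      node->get_node_logging_interface(),
      node->get_node_clock_interface(),
      tf2::durationFromSec(transform_tolerance));

    // Unlike a plain rclcpp subscription, the callback fires only when TF can
    // transform the message into global_frame_ (up to 50 messages are buffered).
    filter->registerCallback(
      [this](const sensor_msgs::msg::LaserScan::ConstSharedPtr & msg) {bufferScan(msg);});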

@SteveMacenski (Member) commented Aug 6, 2021

Got it, that's exactly what I meant. I think we should pause for @wjwwood to give us his thoughts on the executors since that will influence the direction we should go. I'm leaning towards each layer having its own executor to manage its own subscriptions with this model -- but only if the overhead wouldn't be super high. If it is, maybe we have them all share a single callback group / executor like we do now and address it if it becomes a problem in the future.

There's been a sync with rolling, retriggering CI

@gezp (Contributor, Author) commented Aug 7, 2021

I think the case is a bit different from the BT: the executors of each BT node use spin_some() one by one, so we only need one dedicated thread to run them (tick()), but the layers in Costmap2DROS need to run in parallel. I have an idea about this:

  • share a single callback group / executor for all layers, and if some layers need a separate thread, create a dedicated callback group / executor with its own thread just for them, because not all layers need a separate thread to handle their ROS callbacks.

But I don't know which layers need a separate callback group / executor. Currently, maybe I could create a separate callback group / executor for the layers that used rclcpp_node_.

Anyway, I think we only need a separate callback group / executor with a dedicated thread when there are deadlocks or delays in our node (currently I couldn't find a deadlock in Costmap2DROS).

There's been a sync with rolling, retriggering CI

I need to wait for the ros:rolling Docker image, because the CI uses Docker to set up the environment.

@wjwwood (Contributor) commented Aug 7, 2021

@wjwwood would you recommend having each costmap layer have a separate executor, or having a single executor shared among all of them? In the BT, we made each node have its own separate one and that seems like the cleanest way from a designer's perspective. But I'm uncertain about the amount of overhead that might contribute. Do you have a sense about this?

I don't think the overhead of each executor should be too significant, or at least it shouldn't be exceptionally more than just the overhead of a thread. That being said, I have not tried to measure the overhead of the executor in terms of memory. There will most likely be some CPU overhead, but again it's hard for me to say whether it approaches a concerning amount or not.

Ideally, you wouldn't have more threads than cores on the CPU, since at the end of the day the threads share the CPU cores. Using a thread should be to block on I/O or to keep long-running tasks isolated. If you design your callbacks to be as quick as possible (a good idea), then a thread pool is likely the best option. The thread pool could be just a multi-threaded executor and N costmap layers, each with their own callback group. This approach works well if the callbacks in the costmap are decoupled, i.e. they don't access a shared (and therefore synchronized) resource. Things that access a shared resource should ideally be in the same mutually exclusive callback group, because otherwise they will just sit in their own executor thread waiting for the resource, which makes that waiting thread basically all overhead.

Sorry, I'm not sure if I'm explaining it well, but basically I'd say: more executors is not insanely expensive (probably), but if multiple callbacks are using the same shared resource they should absolutely be in the same callback group (and therefore the same executor) for maximum efficiency.
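A rough sketch of what that thread-pool arrangement could look like (illustrative names, not the actual Nav2 design): one MultiThreadedExecutor, sized to the CPU by default, servicing one mutually exclusive callback group per layer:

    // Sketch only: a MultiThreadedExecutor as the thread pool, with one mutually
    // exclusive callback group per "layer" so each layer's callbacks never run
    // concurrently with themselves, while different layers can run in parallel.
    #include <memory>
    #include <vector>
    #include "rclcpp/rclcpp.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      auto node = std::make_shared<rclcpp::Node>("costmap_like_node");

      std::vector<rclcpp::CallbackGroup::SharedPtr> layer_groups;
      for (int i = 0; i < 3; ++i) {
        layer_groups.push_back(
          node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive));
        // ...each hypothetical layer would create its subscriptions on its own group...
      }

      rclcpp::executors::MultiThreadedExecutor executor;  // pool sized to CPU cores by default
      executor.add_node(node);  // groups created with the default flag are serviced here
      executor.spin();
      rclcpp::shutdown();
      return 0;
    }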

@SteveMacenski (Member) commented:

I think you explained it well. I would be interested to know what the overhead of the executor is in terms of memory, but if the CPU overhead is just a thread, that element is a good trade-off. But if that information isn't readily available, I think we can proceed optimistically. It's never been my intent to focus too much on the embedded use-cases for Nav2, since compute resources are only getting faster and with more {cores, ram, etc}. The usual 'raspberry pi with a lidar' educational demo won't need 5+ costmap layers, so it should be served fine either way as the simplest case.

None of the callbacks should be accessing any shared resources. In costmap2D, they mostly just buffer the data for the update() loop to then trigger the actual updates of the internal costmaps / master costmap on a timer. I think the only exception to that is the static layer but it doesn't impact the master costmap, which is the only truly shared resource, until that update().

Since I like to know my options, I think from this discussion we have the following options:

    1. Use a single node and kind of expect that everything will work out OK with a lower number of costmap layers (PR now)
    2. Use a single node, but look at some of the layers we think are going to be high frequency and separate only those out as having a separate executor -- for example obstacle/voxel might have 30 fps pointclouds or laser scans, but static might have a single massive message come in every once in a while (@gezp's suggestion)
    3. Separate every layer as a separate executor (my initial gut instinct)
    4. Use a multithreaded executor set to roughly the number of layers (question: should it be exactly the number of layers, or a parameter? TBD) (@wjwwood's suggestion)

Generally I like "complete" solutions, things that we can decide on now and that bulletproof against any situation someone could throw at them, so that we don't just kick the can down the road (even at the expense of some performance). To me, that leaves options 3 and 4, but I'd be open to 2 given the small number of plugins we have in the costmap_2d package.

I would default to William's suggestion. That generally sounds like the best answer to me; I just stopped thinking about multithreaded executors when we decided against that route for other PRs in this project. But I think costmap_2d is a pretty unique case and that makes a lot of sense to me. That would also give examples of all the major ways of working with executors in Nav2: multithreaded, a bunch of independent executors, and a single server with a single node, which would make a nice reference for the future.

@wjwwood (Contributor) commented Aug 9, 2021

Use a multithreaded executor set to roughly the number of layers (question: should it be exactly the number of layers, or a parameter? TBD) (@wjwwood's suggestion)

Actually, I would use a multi-threaded executor and let it set the number of threads, which defaults to the number of cores on your CPU - not one thread per costmap layer. Even if you have five things that could be running at the same time (and therefore five threads) but only four cores, one of them will be waiting anyway. The only reason to have more threads than cores is if you have long-running tasks that run in parallel and sync with each other. In this case we have relatively short-lived tasks which are not syncing with each other, so using fewer threads in a thread pool is generally better.
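To make the thread-count behavior concrete, a small sketch (an assumption-laden illustration, not Nav2 code): leaving number_of_threads at its default of 0 sizes the MultiThreadedExecutor's pool from the machine's hardware concurrency, and an explicit count can still be passed if a parameter were ever exposed for it:

    // Sketch only: default-sized vs explicitly-sized MultiThreadedExecutor pools.
    #include "rclcpp/rclcpp.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);

      // number_of_threads defaults to 0, which rclcpp expands to the hardware concurrency.
      rclcpp::executors::MultiThreadedExecutor sized_to_cores;

      // An explicit pool size, e.g. if it were ever made a parameter.
      rclcpp::executors::MultiThreadedExecutor explicit_pool(
        rclcpp::ExecutorOptions(), 4 /* number_of_threads */);

      rclcpp::shutdown();
      return 0;
    }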

@SteveMacenski (Member) commented Aug 9, 2021

Got it, sounds good, thanks for the clarification. @gezp do you like this direction?

@gezp (Contributor, Author) commented Aug 10, 2021

Use a multithreaded executor set to roughly the number of layers (question: should it be exactly the number of layers, or a parameter? TBD) (@wjwwood's suggestion)

I like this option: developers of costmap layers needn't consider whether or not to use a separate thread; the multi-threaded executor schedules them automatically. A separate callback group for each layer makes sense because the layers don't access a shared resource in their callbacks. I will update the PR later.

@SteveMacenski (Member) commented:

Let us know when we should take a look and review!

@gezp (Contributor, Author) commented Aug 10, 2021

I have updated this PR, please review again.

@SteveMacenski (Member) commented:

I'm putting this on the back burner until we come to a conclusion about the AMCL PR, since they have some related elements and I don't want to split the conversation across multiple PRs.

mergify bot commented Oct 15, 2021

This pull request is in conflict. Could you fix it @gezp?

@ros-navigation deleted a comment from mergify bot Oct 15, 2021
@ros-navigation deleted a comment from mergify bot Nov 8, 2021
@SteveMacenski (Member) commented:

@gezp where are we on this? Can you ping the ticket / PR that is blocking this work from being completed? I really don't want this to continue sitting around when it could be resolved

@gezp force-pushed the reduce_costmap2d_ros_nodes branch from 5d87b5c to b590666 on May 27, 2022 11:32
@gezp (Contributor, Author) commented May 27, 2022

The description of this PR has been updated: I create a dedicated callback_group to replace rclcpp_node and simply remove client_node, which is unused, but the numerous-parallel-layers problem isn't solved in this PR; we could discuss that problem in another ticket.

@SteveMacenski approved these changes May 27, 2022
@SteveMacenski (Member) commented May 27, 2022

Now that all of them are being serviced by callback_group_, doesn't that change the behavior from before, where some were being run by node (the main node) and others by rclcpp_node_ (the spawned internal node)? Before, there was a bit of a split so it wasn't all on the same thread.

I think this is what you mention about the numerous parallel layers problem? Maybe each obstacle layer (and by extension voxel layer) should have its own callback group / executor? What do you think.

Though, rclcpp_node_ was used more exclusively on the obstacle layers pulling in larger / regular sensor data. Interestingly, you set:

      auto filter = std::make_shared<tf2_ros::MessageFilter<sensor_msgs::msg::PointCloud2>>(
        *sub, *tf_, global_frame_, 50,
        node->get_node_logging_interface(),
        node->get_node_clock_interface(),
        tf2::durationFromSec(transform_tolerance));

So the node (not the callback group) is being used for the filter, but the callback group is used for the subscriber:

      auto sub = std::make_shared<message_filters::Subscriber<sensor_msgs::msg::PointCloud2,
          rclcpp_lifecycle::LifecycleNode>>(node, topic, custom_qos_profile, sub_opt);

Alternatively, we could make each costmap layer have its own executor. That feels like it might be approaching excessive, but I'm not sure what else we could do to guarantee that data from Layer A isn't being blocked by use of Layers C-F. I suppose, though, that all of the layers (in open source) just buffer data in their callbacks, except for the static layer, so that shouldn't be blocked too badly.

Thinking about it, ROS 1 would have only had a single thread as well and I don't recall that being problematic (right @DLu?) with many subscribers / costmap layers. So sharing one executor among all of them might be fine and I'm overdesigning.

@gezp (Contributor, Author) commented May 28, 2022

Now that all of them are being serviced by callback_group_, doesn't that change the behavior from before, where some were being run by node (the main node) and others by rclcpp_node_ (the spawned internal node)? Before, there was a bit of a split so it wasn't all on the same thread.

Yes, the behavior from before is not changed; rclcpp_node_ was just replaced with callback_group_ (to create the subscribers).

I think this is what you mention about the numerous parallel layers problem? Maybe each obstacle layer (and by extension voxel layer) should have its own callback group / executor? What do you think.

There may be a potential deadlock problem when we use numerous layers which create numerous subscribers, but anyway, it works well now.

To solve this potential problem, we could use an isolated executor for each layer, or a multi-threaded executor (like the component container for multiple composed nodes), which could also improve parallel performance.

I think this is a new problem we should discuss in another ticket; it's not easy to solve.

So sharing one executor among all of them might be fine and I'm overdesigning.

Yes, I'm also not sure about this problem.

@SteveMacenski (Member) commented May 31, 2022

So if we say that we don't think having all of the layers on the same executor is a problem (which I think it's not, given how current layers are implemented and that ROS 1 had everything on a single thread), is there anything blocking this PR?

It sounds like we're going to kick adding individual callback groups / executors per layer down the road until there's an actual issue that needs to be resolved. I suspect it'll come up sometime in the future, but I don't think we need to proactively over-design for something that a user would have experienced before this PR and in ROS 1 anyway.

@gezp (Contributor, Author) commented Jun 1, 2022

is there anything blocking this PR?

Nothing, this PR is complete.

I suspect it'll come up sometime in the future, but I don't think we need to proactively over-design for something that a user would have experienced before this PR and in ROS 1 anyway.

I agree, we can ignore it for now and solve this problem when it occurs.

@SteveMacenski (Member) commented:

Please pull in main / rebase; there are failing tests that I want to see gone before merging.

Commits added:

  • …of inner node (Signed-off-by: zhenpeng ge <[email protected]>)
  • update comment (Signed-off-by: zhenpeng ge <[email protected]>)
@gezp force-pushed the reduce_costmap2d_ros_nodes branch from fe68af7 to 972c433 on June 2, 2022 01:18
@SteveMacenski (Member) commented Jun 2, 2022

The obstacle test in costmap_2d has failed the last 2 CI runs. I retriggered a 3rd time just to make sure, but there seems to be a problem.

Edit: Still failing

@gezp force-pushed the reduce_costmap2d_ros_nodes branch from eb3c794 to d7ab41c on June 2, 2022 05:15
@gezp (Contributor, Author) commented Jun 2, 2022

The obstacle test in costmap_2d has failed the last 2 CI runs

Sorry, this is my mistake. I reproduced this failure on my local computer over multiple test runs and found that the test TestNode.testDynParamsSetStatic failed.

Comparing the modified code with the original, I found I had added some unnecessary modifications. In fact, only ObstacleLayer, which used rclcpp_node_ before, needs the dedicated callback group; the other layers, which used the default callback group before, don't need to change.

After updating, I tested nav2_costmap_2d on my local computer multiple times and there were no failures.

@SteveMacenski (Member) commented:

No worries!

@SteveMacenski merged commit 28ce7e8 into ros-navigation:main Jun 2, 2022
@SteveMacenski (Member) commented Jun 2, 2022

Again, huge thanks for your help here, this is really fantastic. Your work has massively improved my own understanding of callback groups and executors, and it was a great opportunity to further improve the code quality of Nav2 toward best practices for all ROS 2 code. This is really nice to finally have in, and thanks very much for sticking with it and contributing to rclcpp and TF! You never know the places you go in open source 😉

@SteveMacenski (Member) commented:

I think the last bit here is to remove the option for creating an extra client node in the lifecycle node in nav2_utils!

@jwallace42 (Contributor) commented:

This PR seems to break drive heading, back up and assisted teleop. Specifically, it seems to break the CostmapTopicCollisionChecker.

I will look into this tomorrow and hopefully provide some more information! :)

@gezp (Contributor, Author) commented Jun 14, 2022

This PR seems to break drive heading, back up and assisted teleop. Specifically, it seems to break the CostmapTopicCollisionChecker.

Sorry, I tested this on Ubuntu 20.04 with some modifications and it worked well; then I updated my local environment and tested on Ubuntu 22.04 locally, and there were some weird problems. I'm not sure why yet, and I will spend some time to handle it. @jwallace42

redvinaa pushed a commit to EnjoyRobotics/navigation2 that referenced this pull request Jun 30, 2022
* use a dedicated callback group for costmap plugin and filter instead of inner node

Signed-off-by: zhenpeng ge <[email protected]>

update comment

Signed-off-by: zhenpeng ge <[email protected]>

* remove unnecessary modifications

Signed-off-by: zhenpeng ge <[email protected]>
@doisyg (Contributor) commented Jul 12, 2022

Nice work! Just came across this PR as I am in the process of switching to Humble.
As planned, some STVL adaptations are required after this PR; I tried to make them (without testing them yet) here: SteveMacenski/spatio_temporal_voxel_layer@aba99df
@gezp can you check the basic logic of my commit?

@marpeja commented Aug 31, 2022

@gezp
How do you make this change, given that callback_group isn't declared in the scope beforehand?

the API of Layer::initialize() is changed: remove the unused client_node argument and replace rclcpp_node with callback_group
