Multi-robot navigation (using multi_tb3_simulation_launch.py) does not work [Groot monitoring] #2386

Closed
tiffanyyk opened this issue Jun 2, 2021 · 12 comments · Fixed by #2417

Comments

@tiffanyyk

tiffanyyk commented Jun 2, 2021

Bug report

Required Info:

  • Operating System:
    • Docker + Ubuntu 20.04
  • ROS2 Version:
    • foxy binaries
  • Version or commit hash:
    • 0.4.7-1focal.20210423.031543
  • DDS implementation:
    • Fast DDS

Steps to reproduce issue

Run the command:

ros2 launch nav2_bringup multi_tb3_simulation_launch.py

In both of the rviz windows that open:

  • Click Startup
  • Select the 2D pose estimates
  • Select navigation goals

Expected behavior

  • Pressing Startup should successfully start the nav2 stack for both robots
  • Both robots navigate to their respective goals

Actual behavior

  • For whichever robot's nav2 stack is started second, the following error occurs when clicking Startup (lines 685 to 690 in the attached file):
[bt_navigator-12] [ERROR] [1622663296.721143000] []: Caught exception in callback for transition 10
[bt_navigator-12] [ERROR] [1622663296.721190600] []: Original error: Address already in use
[bt_navigator-12] [WARN] [1622663296.721239400] []: Error occurred while doing error handling.
[bt_navigator-12] [FATAL] [1622663296.721258700] [robot1.bt_navigator]: Lifecycle node bt_navigator does not have error state implemented
[lifecycle_manager-14] [ERROR] [1622663296.722826500] [robot1.lifecycle_manager_navigation]: Failed to change state for node: bt_navigator
[lifecycle_manager-14] [ERROR] [1622663296.722865600] [robot1.lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.
  • When selecting the goal for this robot, the following error occurs (line 703 in the attached file):
[rviz2-4] [ERROR] [1622663305.191578000] [robot1.rviz2]: Goal was rejected by server

Additional information

The terminal output is attached here:
log_multi_tb3_sim.txt

@SteveMacenski
Member

PRs would be appreciated.


[nav2_gazebo_spawner-2] Traceback (most recent call last):
[nav2_gazebo_spawner-2]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/__init__.py", line 216, in spin_until_future_complete
[nav2_gazebo_spawner-2]     executor.add_node(node)
[nav2_gazebo_spawner-2] AttributeError: 'float' object has no attribute 'add_node'
[nav2_gazebo_spawner-2] 
[nav2_gazebo_spawner-2] During handling of the above exception, another exception occurred:
[nav2_gazebo_spawner-2] 
[nav2_gazebo_spawner-2] Traceback (most recent call last):
[nav2_gazebo_spawner-2]   File "/opt/ros/foxy/lib/nav2_gazebo_spawner/nav2_gazebo_spawner", line 11, in <module>
[nav2_gazebo_spawner-2]     load_entry_point('nav2-gazebo-spawner==0.3.0', 'console_scripts', 'nav2_gazebo_spawner')()
[nav2_gazebo_spawner-2]   File "/opt/ros/foxy/lib/python3.8/site-packages/nav2_gazebo_spawner/nav2_gazebo_spawner.py", line 102, in main
[nav2_gazebo_spawner-2]     rclpy.spin_until_future_complete(node, future, args.timeout)
[nav2_gazebo_spawner-2]   File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/__init__.py", line 219, in spin_until_future_complete
[nav2_gazebo_spawner-2]     executor.remove_node(node)
[nav2_gazebo_spawner-2] AttributeError: 'float' object has no attribute 'remove_node'
[ERROR] [nav2_gazebo_spawner-2]: process has died [pid 8010, exit code 1, cmd '/opt/ros/foxy/lib/nav2_gazebo_spawner/nav2_gazebo_spawner --robot_name robot1 --robot_namespace robot1 --turtlebot_type waffle -x 0.0 -y 0.5 -z 0.01 --ros-args'].

Looking at your log file, it probably has nothing to do with the log snippet added above

@tiffanyyk
Author

@SteveMacenski This shouldn't be a gazebo problem for a number of reasons:

  • The world and robots appear in gazebo as expected
  • The problems with nav2 still occur when I replace gazebo with another robotics simulator
  • When using multi_tb3_simulation_launch.py with only one robot in this line, the errors do not occur.

@SteveMacenski
Member

SteveMacenski commented Jun 4, 2021

There's clearly a major error there -- I'd look into resolving that first. The lifecycle error is likely in response to not completing other lifecycle transitions or having errors itself as a direct result of this. It would probably not be worth looking into the lifecycle error until this was resolved since there is a high likelihood of correlation.

@simonchamorro

I fixed the nav2_gazebo_spawner error here. The timeout arg was being mixed with the executor arg. However, the behavior described by the original issue persists. Only one nav stack gets initialized correctly, the second lifecycle manager is unable to initialize the nav stack.

[bt_navigator-23] [ERROR] [1622831491.969550100] []: Original error: Address already in use
[bt_navigator-23] [WARN] [1622831491.969580800] []: Error occurred while doing error handling.
[bt_navigator-23] [FATAL] [1622831491.969592600] [robot2.bt_navigator]: Lifecycle node bt_navigator does not have error state implemented
[lifecycle_manager-25] [ERROR] [1622831491.978392500] [robot2.lifecycle_manager_navigation]: Failed to change state for node: bt_navigator
[lifecycle_manager-25] [ERROR] [1622831491.978540200] [robot2.lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup.

I will continue looking into this. @SteveMacenski Any insight on what might be causing the Original error: Address already in use ? Here is my log file.

log.txt
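
For anyone hitting the same spawner traceback, here is a minimal sketch of how a timeout can end up in the executor slot. It assumes the rclpy signature is `spin_until_future_complete(node, future, executor=None, timeout_sec=None)`; the function below is only a local stand-in with that parameter order, not the real rclpy call, and the strings passed as node and future are placeholders.

```python
def spin_until_future_complete(node, future, executor=None, timeout_sec=None):
    """Local stand-in mirroring the assumed rclpy parameter order (not the real API)."""
    return {"executor": executor, "timeout_sec": timeout_sec}


timeout = 10.0

# Buggy call: the third positional slot is `executor`, so the float lands there.
# In rclpy this is what leads to "'float' object has no attribute 'add_node'".
buggy = spin_until_future_complete("node", "future", timeout)

# Fixed call: pass the timeout by keyword so it cannot be mistaken for an executor.
fixed = spin_until_future_complete("node", "future", timeout_sec=timeout)

print(buggy)
print(fixed)
```

Passing the timeout by keyword keeps the call correct even if the positional order changes between releases.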

@SteveMacenski
Member

SteveMacenski commented Jun 4, 2021

Please submit a PR to update that in main, thanks for finding that! As you can see, I don't make use of the multi-robot features to catch that.

That error has some history in the ROS2 community if you google it, but I don't see a clear answer. I'd start by looking at the list of ros2 nodes and ros2 topics / srvs / actions to see if any are colliding (e.g. are there nodes or network objects not correctly namespaced under robot1 or robot2?). If so, then it's probably a collision, and that's DDS's way of telling you that you have 2 things with the same name/type/id on the network and it's unhappy. It's possible we accidentally made some changes, such as a leading / on topics or node initializations in the global namespace, that break multi-robot situations. They should be very easy fixes once you know what they are.
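
A rough sketch of that check, working on plain strings such as the lines printed by `ros2 node list`. The helper name `find_namespace_problems` and the sample names are hypothetical, and some nodes (the simulator, for example) may legitimately live in the global namespace, so treat the output as a list of candidates rather than confirmed bugs.

```python
from collections import Counter


def find_namespace_problems(node_names, namespaces):
    """Return (names outside every robot namespace, names that appear more than once)."""
    prefixes = tuple(f"/{ns}/" for ns in namespaces)
    # Anything not under /robotN/ is a candidate for an accidental global name.
    outside = [name for name in node_names if not name.startswith(prefixes)]
    # The same fully qualified name listed twice suggests two nodes are colliding.
    duplicates = sorted(name for name, count in Counter(node_names).items() if count > 1)
    return outside, duplicates


# Hypothetical sample, e.g. the output of `ros2 node list` split into lines.
sample = [
    "/robot1/bt_navigator",
    "/robot2/bt_navigator",
    "/waypoint_follower_rclcpp_node",
    "/robot1/amcl",
    "/robot1/amcl",
]
outside, duplicates = find_namespace_problems(sample, ["robot1", "robot2"])
print("outside robot namespaces:", outside)
print("duplicated names:", duplicates)
```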

@simonchamorro

The only thing I notice when inspecting topics is that robot2 has a /robot2/waypoint_follower and a /robot2/waypoint_follower_rclcpp_node, whereas robot1 only has a /robot2/waypoint_follower. I suspect there could be an issue in the on_configure function of the waypoint follower, but I don't fully understand the role of the generated rclcpp node. Any input on this?

ros2 node list output: nodes.txt

@SteveMacenski
Member

SteveMacenski commented Jun 4, 2021

I analyzed the list and have the same conclusion. Looking at another example of a client node in costmap:

  auto options = rclcpp::NodeOptions().arguments(
    {"--ros-args", "-r", std::string("__node:=") + get_name() + "_client", "--"});
  client_node_ = std::make_shared<rclcpp::Node>("_", options);

https://github.com/ros-planning/navigation2/blob/ac531a251610ca4e9b080da939e252a556425b64/nav2_costmap_2d/src/costmap_2d_ros.cpp

It looks like the WP follower is missing some of the fields needed for proper namespacing. Try updating the waypoint follower client node inputs to match the costmap_2d format; that will probably resolve it.

In particular the extra "", field in the call is suspect to me. That's probably a namespace parameter that's being set to empty string that would make it global.

@tiffanyyk
Author

I'm not sure the waypoint follower has anything to do with the Original error: Address already in use problem in bt_navigator. In navigation_launch.py, the waypoint_follower comes after bt_navigator. Additionally, when bt_navigator has an error, it shows Failed to bring up all requested nodes. Aborting bringup. (refer to the first snippet in the original post). I'm guessing this is why /robot2/waypoint_follower_rclcpp_node is not in the list of nodes for the second robot.

I've also tried switching the order of the nodes in the launch file so that waypoint_follower is configured before bt_navigator. It looks like waypoint_follower configures without problems for both robots and in ros2 node list I see /namespace/waypoint_follower and /namespace/waypoint_follower_rclcpp_node for both robot1 and robot2. However, the issue with bt_navigator persists.

Also, having a look at the code for the rclcpp node, it seems to be in the same format as the costmap code.

auto options = rclcpp::NodeOptions().arguments(
  {"--ros-args",
    "-r", std::string("__node:=") + get_name() + "_rclcpp_node",
    "--"});
// Support for handling the topic-based goal pose from rviz
client_node_ = std::make_shared<rclcpp::Node>("_", options);

Any ideas on how to proceed?

@AlexKaravaev

I think I've narrowed down the issue: when enable_groot_monitoring is set to false in the bt_navigator params, both robots navigate without the problem.

I haven't looked further yet, but I guess the error arises somewhere here
https://github.com/ros-planning/navigation2/blob/c6294d534b2f5e4179237aa8ac88e1339c3c62d4/nav2_behavior_tree/src/behavior_tree_engine.cpp#L87-L97

Maybe it's conflicting ports/addresses, I don't know for sure.
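
To illustrate the port-conflict guess, here is a small sketch that reproduces "Address already in use" with plain TCP sockets. It is only an analogy: it assumes the Groot monitoring publisher binds a fixed TCP port in a similar way, and it does not use ZeroMQ or any Nav2 code. The helper name `try_bind` is hypothetical.

```python
import errno
import socket


def try_bind(port):
    """Bind and listen on 127.0.0.1:port; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# First "robot": port 0 asks the OS for any free port.
first, _ = try_bind(0)
port = first.getsockname()[1]

# Second "robot" configured with the same port: the bind is refused.
second, err = try_bind(port)
print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)

first.close()
```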

@SteveMacenski
Member

SteveMacenski commented Jun 8, 2021

Ah I see, I think I understand the issue then. Groot monitoring only allows 1 zero-mq object per process (which was the root of another implicit issue in the stack). I don't see why it would have an issue with multiple processes, but it's possible that in the multi-robot case the ports are colliding with each other. You could try changing the port IDs to resolve it. However, this is one of a few different issues that live groot monitoring is causing us, so I'd also be open to a PR that simply removes this capability. I don't think many, if any, users rely on it, and it has caused us a number of issues.

Live BT monitoring was a "nice to have" but not if it conflicts with "need to have" features like the ability to spawn multiple robot instances on 1 machine.

@SteveMacenski SteveMacenski changed the title Multi-robot navigation (using multi_tb3_simulation_launch.py) does not work Multi-robot navigation (using multi_tb3_simulation_launch.py) does not work [Groot monitoring] Jun 9, 2021
@SteveMacenski
Member

@tiffanyyk @AlexKaravaev Does this PR resolve your issues? #2409

@SteveMacenski
Member

SteveMacenski commented Jun 18, 2021

#2417

I believe that the fix here is to set the groot parameters for each robot to be different. That is what lets you use groot in multi-robot situations. With it enabled by default, however, the instances will collide with each other.

As such, I have decided to turn groot off by default, so that it is only turned on when requested by a user. In that situation, they'd be consciously adding the enable_groot_monitoring param to their yaml files and would see the 2 port parameters as well. It would be clear to an engineer that 2 robots can't both use the same networking ports, so they should be changed in the 2 respective parameter files.

Closing this issue with the merging of #2417 because it implements your solution: we understand the problem (groot was on and using the same ports for 2 instances), and the solution is to either disable it if you don't want it or make sure your yaml file uses different ports for the robots so there is no collision.
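
A minimal sketch of giving each robot its own pair of ports. Only enable_groot_monitoring is named in this thread; the port parameter names `groot_zmq_publisher_port` and `groot_zmq_server_port` and the 1666 base value are my assumption of the 2 port parameters referred to above, so check them against the bt_navigator parameter documentation for your release. The helper `groot_params_for` is hypothetical.

```python
def groot_params_for(robot_index, base_port=1666, enable=True):
    """Build bt_navigator overrides with a port pair unique to this robot index."""
    publisher_port = base_port + 2 * robot_index  # two ports are reserved per robot
    return {
        "enable_groot_monitoring": enable,
        # Assumed parameter names, see the note above.
        "groot_zmq_publisher_port": publisher_port,
        "groot_zmq_server_port": publisher_port + 1,
    }


robots = ["robot1", "robot2"]
params = {name: groot_params_for(index) for index, name in enumerate(robots)}

for name, overrides in params.items():
    print(name, overrides)
```

The resulting values would go into each robot's own yaml parameter file under bt_navigator, or be left out entirely when monitoring stays disabled.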

Thank you for reporting this issue and I will make sure to document this on navigation.ros.org for future consumption!

Edit: docs PR ros-navigation/docs.nav2.org@8cbed5e
