-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use_sim_time with this->now() and callback inconsistency after #619 in Humble #635
Comments
Thanks for the bug report, @benjinne. I'm going to see if I can reproduce this. @MiguelCompany, have you seen anything like this? |
@Barry-Xu-2018 @iuhilnehc-ynos any idea? sound really critical issue. |
I can't reproduce this problem. # terminal 1 (add a test application that a publisher publishes '/clock' with `rosgraph_msgs.msg.Clock` starting from 1)
ros2 run demo_nodes_py timer
# terminal 2 (update the callback to call `this->now()`)
ros2 run demo_nodes_cpp talker --ros-args -p use_sim_time:=True
or
ros2 run demo_nodes_cpp talker ( use terminal 3 to call `ros2 param set /talker use_sim_time True` later )
# terminal 2 log
[INFO] [1664331704.865874420] [talker]: Publishing: 'Hello World: 1' time: 1664331704865865605
[INFO] [1664331705.865777557] [talker]: Publishing: 'Hello World: 2' time: 1664331705865774037
[INFO] [1664331706.865919796] [talker]: Publishing: 'Hello World: 3' time: 1664331706865913545
[INFO] [1664331707.865875875] [talker]: Publishing: 'Hello World: 4' time: 1664331707865871912
[INFO] [1664331708.865777110] [talker]: Publishing: 'Hello World: 5' time: 1664331708865773788
[INFO] [1664331709.865772521] [talker]: Publishing: 'Hello World: 6' time: 1664331709865769315
[INFO] [1664331710.866109847] [talker]: Publishing: 'Hello World: 7' time: 0
[INFO] [1664331711.865835457] [talker]: Publishing: 'Hello World: 8' time: 15000000000
[INFO] [1664331712.866043628] [talker]: Publishing: 'Hello World: 9' time: 16000000000
[INFO] [1664331713.865918251] [talker]: Publishing: 'Hello World: 10' time: 17000000000
...
[INFO] [1664331873.866060468] [talker]: Publishing: 'Hello World: 170' time: 177000000000 # check it more than 2 minutes
The backtrace correctly shows to get the time if using simulator time for the node. I have no idea what kind of situation could block calling the now(). now() backtrace
(gdb) bt
#0 rcl_get_ros_time (data=0x555555767760, current_time=0x7fffffffab68) at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rcl/rcl/src/rcl/time.c:71
ready to use `rcutils_atomic_load_uint64_t` get the `current` from `t->current_time`
#1 0x00007ffff6cabb9b in rcl_clock_get_now (clock=0x5555557676b0, time_point_value=0x7fffffffab68)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rcl/rcl/src/rcl/time.c:261
#2 0x00007ffff7bd32f7 in rclcpp::Clock::now (this=0x555555767690)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/clock.cpp:74
#3 0x00007ffff7c3fe9a in rclcpp::Node::now (this=0x555555602c40)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/node.cpp:544
#4 0x00007ffff4302dee in demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}::operator()() const (__closure=0x5555558a35c0)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/demos/demo_nodes_cpp/src/topics/talker.cpp:48
#5 0x00007ffff430dbb6 in rclcpp::GenericTimer<demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}, (void*)0>::execute_callback_delegate<{lambda()#1}, (demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1})0>() (this=0x5555558a3590)
at /home/chenlh/Projects/ROS2/ros2-master/install/include/rclcpp/rclcpp/timer.hpp:244
#6 0x00007ffff430d8be in rclcpp::GenericTimer<demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}, (void*)0>::execute_callback()
(this=0x5555558a3590) at /home/chenlh/Projects/ROS2/ros2-master/install/include/rclcpp/rclcpp/timer.hpp:230
#7 0x00007ffff7bf8bd7 in rclcpp::Executor::execute_timer (timer=warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<rclcpp::WallTimer<demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}, (void*)0>, std::allocator<void*>, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<rclcpp::WallTimer<demo_nodes_cpp::Talker::Talker(rclcpp::NodeOptions const&)::{lambda()#1}, (void*)0>, std::allocator<void*>, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr<class rclcpp::TimerBase> (use count 3, weak count 1) = {...})
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:650
#8 0x00007ffff7bf766e in rclcpp::Executor::execute_any_executable (this=0x7fffffffb750, any_exec=...)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:512
#9 0x00007ffff7c089ca in rclcpp::executors::SingleThreadedExecutor::spin (this=0x7fffffffb750)
at /home/chenlh/Projects/ROS2/ros2-master/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/single_threaded_executor.cpp:37
#10 0x0000555555566318 in main (argc=4, argv=0x7fffffffc548)
at /home/chenlh/Projects/ROS2/ros2-master/build/demo_nodes_cpp/rclcpp_components/node_main_talker.cpp:62
|
@audrow can you reproduce this issue? |
I haven't been able to reproduce it. I tried for a good while yesterday. |
Anyone have tips on how I can debug a callback to see why this->now() gets stuck and keeps repeating the same value after a few minutes? I'm able to debug it with gdb but I don't know what to look for. |
@benjinne can you provide the reproducible test package on this?
if you attach the gdb on that process or get the core file, how about checking the stack trace on that stuck thread? |
@fujitatomoya I don't have a reproducible test package, but I was able to get the stacktrace of the node that has an issue with this->now() returning with the same repeated value even though the /clock topic is updating. I think thread 12 is listening to the clock topic and enters deadlock somehow stack trace
|
Edit, here's a better stacktrace with all of ros2 built with debug flags stack trace
|
@benjinne I think this might have been solved by eProsima/Fast-DDS#3194 and its corresponding backports (eProsima/Fast-DDS#3195) for Humble. Could you try? |
I'm having a hard time replicating the issue. Sometimes it takes over 10 minutes for it to happen and I have to leave my computer run for a while. I can give this a try and let it run for an hour and see what happens |
@MiguelCompany is there an easy way to check which fast-dds version is being used for a binary install of ros2? |
I believe those fastdds changes helped. Everything is running more smoothly and no crashes after running for an hour. I'd say it's fixed. |
Given the latest comments, I'm going to say that this is a duplicate of #650 , and close it as such. |
Bug report
Required Info:
Steps to reproduce issue
After #619 and #633 merged into humble, my simulation became very unreliable. It worked well before that PR, so I'm wondering if anyone else is having issues with it.
I don't have a good example yet, but with
use_sim_time
I have some callbacks that eventually stop updating about a minute after launching. Other times, in my callbackthis->now()
freezes and stops updating even though the /clock topic is always updating. For example I have a callback from one subscription which transforms a message into another format then grabsthis->now()
and publishes it into a new topic. Since the time stops updating on the output of that callback, it breaks my simulation.I've tried using various callback groups and the multi-threaded executor, but nothing seems to help.
Expected behavior
this->now() with use_sim_time should always update and never freeze
The text was updated successfully, but these errors were encountered: