Callback not reached, using callback group and Fast-RTPS #1611
I can reproduce this issue with the provided samples. As far as I checked, on the service server's subscription, Fast-DDS receives the event and notifies the wait_set via a condition variable (so Fast-DDS seems to be okay).
The weird thing, though, is that the upper layers also seem to be doing the correct thing, as it works in CycloneDDS. So I'm a bit confused as to where the issue is. @MiguelCompany any thoughts on what might be happening here?
Hi all,
The main reason for this issue seems to be the … We expected that … but it didn't do the work as we expected. The guard_condition created from … What do you think about this?
I have added experimental source code as below, and then the behavior of …

```cpp
callback_group_ = this->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive, false);

// <add a timer with this callback_group_ before running the `spin`>
// it just makes sure rmw_wait can be woken up to run the next cycle.
timer_ = this->create_wall_timer(
  std::chrono::seconds(1),
  []() {},
  callback_group_);

//callback_group_ = this->create_callback_group(rclcpp::CallbackGroupType::Reentrant, false);

callback_group_executor_thread_ = std::thread([this]() {
    callback_group_executor_.add_callback_group(callback_group_, this->get_node_base_interface());
    callback_group_executor_.spin();
  });
```
I'll check whether there are some other events that can be used.
After diving into the source code, I think I made a mistake at #1611 (comment).
(Updated: use specific commit id instead of …)
@iuhilnehc-ynos Thanks for the thorough checks! I guess changing this line to a …
Thanks for your reply. Actually, I have tried that before, but it didn't work.
Use …
The rmw_wait simple logic is as follows:
…
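For context, here is a rough, purely illustrative sketch of the attach/wait/detach pattern being discussed; the type and function names are invented for this example and are not the actual rmw_fastrtps code.

```cpp
// Simplified, illustrative sketch of a guard condition and a wait call.
// Invented names; not the real rmw_fastrtps implementation.
#include <condition_variable>
#include <mutex>
#include <vector>

struct SketchGuardCondition
{
  // Wait set this condition is currently attached to (null when detached).
  std::mutex * wait_mutex{nullptr};
  std::condition_variable * wait_cv{nullptr};
  bool triggered{false};
};

struct SketchWaitSet
{
  std::mutex mutex;
  std::condition_variable cv;
};

// Triggering side: only wakes the wait set the condition is currently attached to.
void sketch_trigger(SketchGuardCondition & gc)
{
  if (gc.wait_mutex != nullptr) {
    std::lock_guard<std::mutex> lock(*gc.wait_mutex);
    gc.triggered = true;
    gc.wait_cv->notify_one();
  } else {
    gc.triggered = true;  // nobody is attached; the flag may simply be missed
  }
}

// Waiting side: attach all conditions, block until one of them is triggered,
// then detach everything again (a real implementation would also honor a timeout).
void sketch_wait(std::vector<SketchGuardCondition *> & conditions, SketchWaitSet & wait_set)
{
  for (auto * gc : conditions) {   // attach: tell each condition where to notify
    gc->wait_mutex = &wait_set.mutex;
    gc->wait_cv = &wait_set.cv;
  }

  std::unique_lock<std::mutex> lock(wait_set.mutex);
  wait_set.cv.wait(
    lock,
    [&]() {
      for (auto * gc : conditions) {
        if (gc->triggered) {return true;}
      }
      return false;
    });

  for (auto * gc : conditions) {   // detach: the condition forgets this wait set,
    gc->wait_mutex = nullptr;      // so another executor that still expects to be
    gc->wait_cv = nullptr;         // notified through this condition never wakes up
  }
}
```

The detach step at the end is what the rest of this thread keeps coming back to: once a shared condition has been detached by one wait set, triggering it no longer notifies anyone else.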
I am considering if we could update GuardCondition as …
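Just to make the idea concrete, one conceivable shape for such an update (this is only a guess sketched for illustration, not the change that was actually proposed) would be a guard condition that remembers every wait set it is attached to and notifies all of them:

```cpp
// Purely illustrative sketch (invented names, not a real rclcpp/rmw change):
// a guard condition that can be attached to several wait sets at once.
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <vector>

class MultiWaitGuardCondition
{
public:
  void attach(std::mutex * wait_mutex, std::condition_variable * wait_cv)
  {
    std::lock_guard<std::mutex> lock(internal_mutex_);
    attached_.push_back({wait_mutex, wait_cv});
  }

  void detach(std::condition_variable * wait_cv)
  {
    std::lock_guard<std::mutex> lock(internal_mutex_);
    attached_.erase(
      std::remove_if(
        attached_.begin(), attached_.end(),
        [wait_cv](const Target & t) {return t.cv == wait_cv;}),
      attached_.end());
  }

  void trigger()
  {
    std::vector<Target> targets;
    {
      std::lock_guard<std::mutex> lock(internal_mutex_);
      triggered_ = true;
      targets = attached_;  // copy so we don't hold internal_mutex_ while notifying
    }
    // Wake every attached wait set instead of only the last one attached.
    for (auto & target : targets) {
      std::lock_guard<std::mutex> wait_lock(*target.mutex);
      target.cv->notify_one();
    }
  }

  bool has_triggered() const
  {
    std::lock_guard<std::mutex> lock(internal_mutex_);
    return triggered_;
  }

private:
  struct Target
  {
    std::mutex * mutex;
    std::condition_variable * cv;
  };

  mutable std::mutex internal_mutex_;
  std::vector<Target> attached_;
  bool triggered_{false};
};
```

A design like this still leaves open the question raised below about how the triggered flag is consumed once more than one wait set can observe it.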
Right, this is taken care of by the main thread; callback_group_executor_thread_ will be asleep and never be triggered when the main thread takes the service request. But as far as I see, CycloneDDS wakes up other threads at this time, so the subscription event can be taken by callback_group_executor_thread_.
Let me make it clearer: the reason the condition variable can't be triggered in callback_group_executor_thread_ is that detaching the GuardCondition will set its mutex and condition_variable to `nullptr`.
The solution proposed in this comment may work. What bothers me is that getHasTriggered() will only return …
It would be great to have a new regression test to expose this problem.
I'll add a test case for …
@wjwwood @MiguelCompany @fujitatomoya Could you help review these two PRs (#1640, ros2/rmw_fastrtps#527)?
Is this allowed by design? Judging from the RMW implementations, it isn't a supported use case.
Actually, I thought about this before.
I think it could become a supported case, but it should be considered a new feature. As with any new feature, the implications of adding it should be vetted and qualified, designs should be created/updated, and the community should come to an agreement on whether to support it or not. At this point, I read this issue like "this unsupported use case happens to work with one RMW implementation" more than "we should fix other implementations to support it". Hopefully someone else can chime in to confirm whether attaching a node to multiple executors is currently a supported use case or not.
Using a single guard condition with two wait sets is not supported, but using two (or more) executors with …
I amended what I said.
@wjwwood thank you for looking into my question.
Could an alternative solution to this issue be rclcpp creating multiple graph guard conditions for each node, one per executor associated with the node? This way RMW implementations would not have to support conditions attached to multiple wait sets.
That's definitely not supported. If that's happening then it's a bug in …
If I understand it correctly, the solution outlined by @iuhilnehc-ynos earlier in this thread seems to be going in that direction, hence why I asked (see also ros2/rmw_connextdds#51 which implements it for …).
Ah, well then I misunderstood the fix. It is documented (though all the …).
So the bug is likely in … The (relatively) new …
Ah, it is clearly documented... I was thinking that this use case is okay, so I was expecting no constraints. @iuhilnehc-ynos
Although this works, I guess that is an enhancement, so probably we would want to create another issue for this?
I am a bit inclined to do this, since this is not designed or recommended, so at least it should detect this operation for now.
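For what it's worth, the kind of detection mentioned above could be as simple as refusing to attach a condition that is already attached to a different wait set. The following is only an invented sketch of that check, not the real rmw code:

```cpp
// Illustrative only: reject attaching a guard condition that is already
// attached to another wait set, instead of silently stealing it.
#include <condition_variable>
#include <stdexcept>

struct SketchGuardCondition
{
  std::condition_variable * attached_cv{nullptr};
};

struct SketchWaitSet
{
  std::condition_variable cv;
};

void attach(SketchGuardCondition & gc, SketchWaitSet & ws)
{
  if (gc.attached_cv != nullptr && gc.attached_cv != &ws.cv) {
    // Fail loudly so the unsupported usage is detected immediately.
    throw std::runtime_error("guard condition is already attached to another wait set");
  }
  gc.attached_cv = &ws.cv;
}
```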
I don't think this is a good solution, because it will require changes to the rmw API and it will be complicated because the creation and destruction of the graph guard conditions will need to be synchronized with the middleware which will need to trigger them from a different thread (most likely). Instead, I think we should solve this by changing how this works in rclcpp. I'm not convinced we need the node's graph guard condition anyways, if I understand the issue correctly.
To add a new PR with a limitation (e.g. cannot add a notify guard condition to more than one executor) for … If somebody could continue to help him, I would appreciate that.
I'd like to confirm (or try to summarize) the current situation and my understanding; any other comments from anyone are welcome!
Me neither. If I am not mistaken, this sample code should be working.
I think the root cause is … I was a bit confused before, but the above is my understanding now.
Yes, I think that's ideal.
Switching to … To move forward on this issue we should ensure:
…
We can do the first thing manually (make sure executors don't automatically use the node's sole graph guard condition) or by using the … For the second point, we can do that separately from the first, and I think we can do it by just avoiding the node's graph guard condition (if that is the root cause of the issue). Again, I think this should be possible conceptually, but I haven't looked into the details yet.
I agree about this. From #1611 (comment) and #1611 (comment), I am sorry that I can't find out how to notify the thread to run the next cycle if a thread enters rmw_wait with only the 3 guard conditions (the shutdown_guard_condition_ and interrupt_guard_condition_ of the Executor, and the notify_guard_condition of NodeBase). The following is my latest thought:
…
What do you think?
Friendly ping @wjwwood. Could you give me some guidance or share a bit of advice on #1611 (comment)?
I was thinking, perhaps naively, that instead of a node having a notify guard condition, we could have a "notify" guard condition for each callback group. I think the interrupt guard condition in the executor should be left alone? It is used for the …
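A rough sketch of that idea, just to make it concrete (illustrative only; the class and method names are invented here and this is not the actual rclcpp API, though it does reuse the existing rclcpp::GuardCondition type and assumes an initialized context):

```cpp
// Illustrative sketch: each callback group owns its own "notify" guard
// condition, instead of every executor sharing the node's single one.
#include <rclcpp/guard_condition.hpp>
#include <rclcpp/rclcpp.hpp>

class CallbackGroupSketch
{
public:
  // The executor spinning this group adds this guard condition to its wait
  // set, so triggering it wakes only that executor.
  rclcpp::GuardCondition & get_notify_guard_condition()
  {
    return notify_guard_condition_;
  }

  // Called when an entity (timer, subscription, client, ...) is added to the
  // group, so the executor waiting on it re-collects entities next cycle.
  void trigger_notify_guard_condition()
  {
    notify_guard_condition_.trigger();
  }

private:
  rclcpp::GuardCondition notify_guard_condition_;
};
```

In that shape, each executor only waits on the guard conditions of the callback groups it actually spins, and nothing has to be shared between wait sets.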
Thank you.
👍 OK, I'll do it this way.
Agreed.
One more question: do you think the "notify_guard_condition" in the Node is still necessary?
I think that sounds right. You might run into some undesirable circular references, but we can iterate if that's the case. I'd say deprecate the notify guard condition of the node, with the intention to remove it eventually.
I confirmed that this has been addressed by #1640; I will go ahead and close this.
Bug report
Required Info:
Steps to reproduce issue
Hello,
We are currently using CycloneDDS as RMW implementation, but we would need to use Fast-RTPS.
Using Fast-RTPS changes our ROS 2 application behavior. I found out it is related to the callback groups, which we use a lot.
I tried to isolate the issue with 3 nodes that are quite simple:
Here's a small description of each:
- The publisher node publishes messages on `/topic_bool`.
- The client node subscribes to `/topic_bool`. When a new message is received on this topic, the client calls the service `/trigger_srv` and waits 10 seconds for a response. The service client is declared with a callback group (which is added to a dedicated spinning executor).
- The server node provides `/trigger_srv` and subscribes to `/topic_bool` when a new service request is received. When a new message is received on this topic, the server sends the service response. The subscription is declared with a callback group (which is added to a dedicated spinning executor). A minimal sketch of this callback-group setup is shown after these steps.

Then in 3 different shells I force Fast-RTPS as RMW implementation:
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
And I launch each node in a shell.
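For reference, here is a minimal sketch of the callback-group setup described above. The node and class names are hypothetical and the message type on `/topic_bool` is assumed to be `std_msgs/msg/Bool`; only `/topic_bool` and the general structure come from the report itself.

```cpp
// Sketch: a subscription placed in its own callback group, which is spun by a
// dedicated executor in a separate thread, while the rest of the node is spun
// by the main executor. Names are illustrative, not the original sample code.
#include <memory>
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/bool.hpp>

class ServerLikeNode : public rclcpp::Node
{
public:
  ServerLikeNode()
  : Node("server_like_node")
  {
    // Callback group that is NOT automatically picked up by the node's default executor.
    callback_group_ = this->create_callback_group(
      rclcpp::CallbackGroupType::MutuallyExclusive, false);

    rclcpp::SubscriptionOptions options;
    options.callback_group = callback_group_;
    subscription_ = this->create_subscription<std_msgs::msg::Bool>(
      "/topic_bool", 10,
      [this](std_msgs::msg::Bool::SharedPtr) {
        RCLCPP_INFO(this->get_logger(), "callback reached");
      },
      options);

    // Spin the callback group with its own executor in a dedicated thread.
    executor_thread_ = std::thread(
      [this]() {
        executor_.add_callback_group(callback_group_, this->get_node_base_interface());
        executor_.spin();
      });
  }

  ~ServerLikeNode() override
  {
    executor_.cancel();
    if (executor_thread_.joinable()) {executor_thread_.join();}
  }

private:
  rclcpp::CallbackGroup::SharedPtr callback_group_;
  rclcpp::Subscription<std_msgs::msg::Bool>::SharedPtr subscription_;
  rclcpp::executors::SingleThreadedExecutor executor_;
  std::thread executor_thread_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<ServerLikeNode>());  // main executor for everything else
  rclcpp::shutdown();
  return 0;
}
```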
Expected behavior
To get the expected behavior, you can launch the nodes using CycloneDDS.
Here's what should happen:
- The publisher publishes a message on `/topic_bool`.
- The client receives it and calls `/trigger_srv`.
- The server receives the request and subscribes to `/topic_bool`.
- On the next message on `/topic_bool`, the server's subscription callback runs and it sends the response, which the client receives.
Actual behavior
Here's what happens when using Fast-RTPS:
- The publisher publishes a message on `/topic_bool`.
- The client receives it and calls `/trigger_srv`.
- The server receives the request and subscribes to `/topic_bool`.
- New messages keep arriving on `/topic_bool`, but the server's subscription callback is never reached.
- The client never receives a response and its 10-second wait times out, even though messages continue on `/topic_bool`.
Additional information
I tried to reproduce this issue under different conditions (including different `ROS_DOMAIN_ID`). Between each test, I made sure to do `ros2 daemon stop`.
Under all these conditions, and all combinations of them, the issue can be reproduced.
Thanks for helping!