SharedFuture from async_send_request never becomes valid #2039
Comments
Using the Intra-Process services and clients from #1847, it's possible to hot-fix the issue for single-process applications (i.e. by enabling IPC on all client/server pairs). |
It hangs at ...
node = rclcpp::Node::make_shared("client_node", opt);
m_nodes.push_back(node);
//m_node_executor->add_node(node);
m_client = node->create_client<std_srvs::srv::SetBool>("example_service");
...
do {
RCLCPP_INFO(m_nodes.front()->get_logger(), "Sending request: %d" , counter);
auto result = m_client->async_send_request(request);
RCLCPP_INFO(m_nodes.front()->get_logger(), "Waiting for response: %d" , counter);
if (rclcpp::spin_until_future_complete(m_nodes.back(), result) ==
rclcpp::FutureReturnCode::SUCCESS)
{
auto answer = result.get();
RCLCPP_INFO(m_nodes.front()->get_logger(), "Got response: %s", answer->message.c_str());
}
... |
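While chasing a hang like the one in the snippet above, it can help to bound the wait so a stuck request shows up as a timeout instead of blocking forever. The following is only a sketch using the standard library: it assumes the response future behaves like a `std::shared_future`, and `wait_bounded`, `request_with_reply` and `request_without_reply_times_out` are hypothetical helper names, not rclcpp API.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

enum class WaitResult { Ready, TimedOut };

// Hypothetical helper: waits for the future, but never longer than `timeout`.
template<typename T>
WaitResult wait_bounded(
  const std::shared_future<T> & future, std::chrono::milliseconds timeout)
{
  if (future.wait_for(timeout) == std::future_status::ready) {
    return WaitResult::Ready;
  }
  return WaitResult::TimedOut;
}

// Simulates a service that answers from another thread.
std::string request_with_reply(const std::string & reply)
{
  std::promise<std::string> promise;
  std::shared_future<std::string> future = promise.get_future().share();
  std::thread server([&promise, reply] {promise.set_value(reply);});
  WaitResult result = wait_bounded(future, std::chrono::seconds(5));
  server.join();
  if (result == WaitResult::Ready) {
    return future.get();
  }
  return "<timeout>";
}

// Simulates a lost response: the promise is never fulfilled.
bool request_without_reply_times_out()
{
  std::promise<std::string> promise;
  std::shared_future<std::string> future = promise.get_future().share();
  return wait_bounded(future, std::chrono::milliseconds(50)) ==
         WaitResult::TimedOut;
}
```

In a test loop, logging the request counter whenever the bounded wait times out would show which iteration got stuck.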
The case makes sense to me. The reason why If I think that rclcpp/rclcpp/include/rclcpp/client.hpp Line 800 in 7c67851
|
@iuhilnehc-ynos |
@llapx #2044 looks good to me. As @iuhilnehc-ynos mentioned, this could happen in theory. Just in case, can you confirm, using gdb or additional logging messages, that #2039 (comment) is really the cause of this issue? I think we want to know the root cause before fixing the problem. |
Confirmed, and it works well (the test case passed, no hang happened).
Not yet, I will have a try. |
I retested it on the humble branch, and it works well (on both the latest rolling and humble branches): ...
100: [INFO] [1667963086.523220110] [server_node]: Received service client request... Sending response: 999999
100: [INFO] [1667963086.523367484] [server_node]: Got response: 999999
100: [ OK ] TestRegularService.test_regular_service (285302 ms)
100: [----------] 1 test from TestRegularService (285302 ms total)
100:
100: [----------] Global test environment tear-down
100: [==========] 1 test from 1 test suite ran. (285302 ms total)
100: [ PASSED ] 1 test.
100: -- run_test.py: return code 0
100: -- run_test.py: inject classname prefix into gtest result file '/work/rclcpp_issues_2039/build/rclcpp/test_results/rclcpp/test_regular_service.gtest.xml'
100: -- run_test.py: verify result file '/work/rclcpp_issues_2039/build/rclcpp/test_results/rclcpp/test_regular_service.gtest.xml'
1/1 Test #100: test_regular_service ............. Passed 285.49 sec
The following tests passed:
test_regular_service
100% tests passed, 0 tests failed out of 1
Label Time Summary:
gtest = 285.49 sec*proc (1 test)
Total Test time (real) = 286.08 sec
Finished <<< rclcpp [8min 4s]
Summary: 1 package finished [8min 4s]
My test env: ros:humble on docker |
Hi @llapx, |
I used default env of latest ros:humble docker (with some extra packages installed for compiling rclcpp):
|
@jefferyyjhsu, thanks for confirming that there is still an issue with rmw_fastrtps. I ran the test yesterday, and I can reproduce the issue using rmw_fastrtps on both humble and rolling (I didn't use docker to test). From the #2039 (comment), I think that the issue for I am afraid that another bug exists in the https://github.com/eProsima/Fast-DDS/blob/cafd896e0e786e1af49f8f953e0843cc10780d29/src/cpp/fastdds/core/condition/WaitSetImpl.cpp#L159 doesn't hold https://github.com/eProsima/Fast-DDS/blob/cafd896e0e786e1af49f8f953e0843cc10780d29/src/cpp/fastdds/core/condition/WaitSetImpl.cpp#L131 might lose a condition notification when condition_variable is waiting for the A case such as https://github.com/eProsima/Fast-DDS/blob/cafd896e0e786e1af49f8f953e0843cc10780d29/src/cpp/fastdds/core/condition/WaitSetImpl.cpp#L123 with ret_val(false), and then there is a notification triggered at https://github.com/eProsima/Fast-DDS/blob/cafd896e0e786e1af49f8f953e0843cc10780d29/src/cpp/fastdds/core/condition/StatusConditionImpl.cpp#L90. A possible solution:
diff --git a/src/cpp/fastdds/core/condition/WaitSetImpl.cpp b/src/cpp/fastdds/core/condition/WaitSetImpl.cpp
index 73390e35d..ecd74b735 100644
--- a/src/cpp/fastdds/core/condition/WaitSetImpl.cpp
+++ b/src/cpp/fastdds/core/condition/WaitSetImpl.cpp
@@ -68,7 +68,7 @@ ReturnCode_t WaitSetImpl::attach_condition(
// Should wake_up when adding a new triggered condition
if (is_waiting_ && condition.get_trigger_value())
{
- wake_up();
+ cond_.notify_one();
}
}
}
@@ -156,6 +156,7 @@ ReturnCode_t WaitSetImpl::get_conditions(
void WaitSetImpl::wake_up()
{
+ std::lock_guard<std::mutex> guard(mutex_);
cond_.notify_one();
}
could you help double-check if the issue still happened after applying the above patch for @MiguelCompany @fujitatomoya @llapx Do you have any suggestions? |
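The race described in this comment can be modelled with a small standard-library sketch. This is a simplified model, not Fast-DDS code: `MiniWaitSet` and `run_handshakes` are hypothetical names, and the atomic flag stands in for a trigger value that is changed outside the wait set's mutex. The point it illustrates is the one the patch relies on: if the notifier takes the same mutex the waiter holds while checking its predicate, the notification cannot land in the gap between the predicate check and the actual blocking.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a wait set; not the Fast-DDS class.
class MiniWaitSet
{
public:
  // Blocks until the trigger is observed or the timeout expires.
  // Returns true if the trigger was observed.
  bool wait(std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    bool observed = cond_.wait_for(
      lock, timeout, [this] {return triggered_.load();});
    if (observed) {
      triggered_.store(false);
    }
    return observed;
  }

  // The trigger is set outside the mutex (as a condition's trigger value
  // would be), but the notification is sent while holding the mutex, so it
  // cannot slip in between the waiter's predicate check and its blocking.
  void wake_up()
  {
    triggered_.store(true);
    std::lock_guard<std::mutex> guard(mutex_);
    cond_.notify_one();
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::atomic<bool> triggered_{false};
};

// Runs `rounds` wait/notify handshakes and returns how many were observed.
int run_handshakes(int rounds)
{
  MiniWaitSet wait_set;
  int observed = 0;
  for (int i = 0; i < rounds; ++i) {
    std::thread notifier([&wait_set] {wait_set.wake_up();});
    if (wait_set.wait(std::chrono::seconds(5))) {
      ++observed;
    }
    notifier.join();
  }
  return observed;
}
```

Removing the `lock_guard` from `wake_up()` in this model reopens the window in which a notification can be lost, although it may take many iterations to show up, which matches how rarely the original test hangs.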
@iuhilnehc-ynos I think you may be right. The fix seems an easy one, but adding a regression test might be difficult. I'll try to think of something ... |
Thanks @iuhilnehc-ynos, I tested the proposed fix and the example now passes reliably. |
@iuhilnehc-ynos #2039 (comment) is deep, but i think the proposed change makes sense and actually fixes the problem reported by this issue. |
@iuhilnehc-ynos can you make PR for Fast-DDS? |
Based on #2039 (comment), I think @MiguelCompany will create a PR with a regression test for it by himself. |
@iuhilnehc-ynos Please open the PR, and we'll add the regression test on it |
Just a note: this seems to be the same issue reported here osrf/docker_images#628 |
I think we can close this since,
are merged now. Please reopen this issue or open another one if anything is missing. |
Bug report
Required Info:
Steps to reproduce issue
Build and run the unit test below
Expected behavior
The unit test is expected to send an async request, and a valid result should later be retrieved with SharedFuture::get(). The unit test should repeat this process 1000000 times without issues.
Actual behavior
The unit test hangs randomly regardless of the underlying DDS.
Example snippet: