
[rmw_connextdds] New RMW discovery options #108

Merged · sloretz merged 27 commits into ros2:rolling on Apr 8, 2023

Conversation

@mxgrey (Contributor) commented Mar 10, 2023

This PR updates rmw_connextdds to be compatible with the upcoming discovery options feature for RMW. (Note that the final agreed-upon requirements matrix is the one in this comment.)

This PR depends on these upstream PRs:

With this PR, the requirements matrix for the new discovery options is almost fully satisfied. There is just one aspect that is not working correctly, which I would appreciate some help with from Connext DDS experts: It seems that Connext DDS participants are not accepting unknown initial peers even though accept_unknown_peers is being set to true. As a result, the Same Host matrix is 100% passing while the Other Host matrix looks like this:

  • 🆗: Required behavior is passing
  • 🔴: Required behavior is failing
Different Hosts (rows: Node A setting, columns: Node B setting):

| Node A setting ↓ \ Node B setting → | No static peer: Off | No static peer: Localhost | No static peer: Subnet | With static peer: Off | With static peer: Localhost | With static peer: Subnet |
| --- | --- | --- | --- | --- | --- | --- |
| No static peer: Off | 🆗 | 🆗 | 🆗 | 🆗 | 🆗 | 🆗 |
| No static peer: Localhost | 🆗 | 🆗 | 🆗 | 🆗 | 🔴 | 🔴 |
| No static peer: Subnet | 🆗 | 🆗 | 🆗 | 🆗 | 🔴 | 🆗 |
| With static peer: Off | 🆗 | 🆗 | 🆗 | 🆗 | 🆗 | 🆗 |
| With static peer: Localhost | 🆗 | 🔴 | 🔴 | 🆗 | 🆗 | 🆗 |
| With static peer: Subnet | 🆗 | 🔴 | 🆗 | 🆗 | 🆗 | 🆗 |

The gist of the current problem is:

  • when two endpoints are on different hosts
  • and one or both endpoints are in LOCALHOST mode
  • and neither endpoint is in OFF mode
  • and only one endpoint declares the other as a static peer

-> then the required behavior is for them to discover each other and connect, but they currently do not. Every other scenario is working correctly.

From my reading of the Connext DDS documentation, I believe the current implementation is supposed to work since accept_unknown_peers is set to true, but perhaps there is a detail that I'm overlooking.
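For reference, this is roughly how that flag maps onto the Connext C API (a minimal editorial sketch based on my reading of the API, not code from this PR; the helper name is made up):

```cpp
#include "ndds/ndds_c.h"

// Sketch: with accept_unknown_peers enabled, the participant should respond to
// discovery traffic from peers that are not in its initial_peers list, provided
// their participant announcements actually reach it. The caller is expected to
// initialize dp_qos (e.g. with DDS_DomainParticipantQos_INITIALIZER).
static DDS_ReturnCode_t configure_accept_unknown_peers(DDS_DomainParticipantQos * dp_qos)
{
  DDS_ReturnCode_t rc = DDS_DomainParticipantFactory_get_default_participant_qos(
    DDS_DomainParticipantFactory_get_instance(), dp_qos);
  if (rc != DDS_RETCODE_OK) {
    return rc;
  }
  dp_qos->discovery.accept_unknown_peers = DDS_BOOLEAN_TRUE;
  return DDS_RETCODE_OK;
}
```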

One observation from watching packets with Wireshark is that when there's a LOCALHOST + static peer <=> LOCALHOST + no static peer connection, the endpoint that has a static peer is sending DATA(p) participant information packets, while the endpoint without a static peer never sends DATA(p) packets (even though it does send various ACKNACK and HEARTBEAT packets). I'm happy to provide the Wireshark data if that would be helpful.

@asorbini (Collaborator) left a review:

Thank you for this contribution, and apologies for the late reply. Things look good, except for a few minor changes (mostly "coding style" and missing error checking).

I'll review and suggest a solution to the problems with the failing cases of the matrix chart next.

@asorbini (Collaborator) commented:

> The gist of the current problem is:
>
> * when two endpoints are on different hosts
> * and one or both endpoints are in `LOCALHOST` mode
> * and neither endpoint is in `OFF` mode
> * and only one endpoint declares the other as a static peer
>
> -> then the required behavior is for them to discover each other and connect, but they currently do not. Every other scenario is working correctly.
>
> From my reading of the Connext DDS documentation, I believe the current implementation is supposed to work since accept_unknown_peers is set to true, but perhaps there is a detail that I'm overlooking.

I've been reviewing the requirements, your implementation, and thinking about the problem. At first I thought I had spotted an issue, and started thinking about alternative implementations, but after going down that rabbit hole for a little bit, I realized I was wrong.

Your solution is the correct way to prevent the default multicast discovery from occurring while still allowing unicast communication to occur, and I have verified it with a simple hello world for Connext 6.0.1.

discovery_options_tests.tar.gz (I have only modified ShapeType_publisher.c and ShapeType_subscriber.c to customize dds.transport.UDPv4.builtin.parent.allow_multicast_interfaces_list).

I tried running the publisher on one host and the subscriber on a different one connected via LAN: by default, the processes don't discover each other, but if one declares the other's IP via NDDS_DISCOVERY_PEERS, then communication occurs as expected.

I will continue my testing by building the branch and observing the rmw's behavior.

@sloretz changed the title from "New RMW discovery options" to "[rmw_connextdds] New RMW discovery options" on Mar 29, 2023
@asorbini (Collaborator) commented:

@mxgrey @clalancette I have identified the issue and implemented a solution here (the commit should apply directly on top of your branch).

Basically, the problem stems from the interaction between the NDDS_DISCOVERY_PEERS variable and Connext's multicast receive address, explained in this section of the Connext manual. Quoting the comment I added in the code:

> Connext looks at the variable NDDS_DISCOVERY_PEERS to determine whether it should add a multicast locator to the set of locators used for discovery. If this variable is empty, or if it contains at least one multicast address, a multicast locator is used for discovery (the default 239.255.0.1 group when the variable is empty, or the first multicast locator found in the list when one or more multicast addresses are present).
>
> Because of this, we must make sure that NDDS_DISCOVERY_PEERS contains some value to prevent this default behavior; otherwise the participant will announce the default multicast group, which in turn will cause ROS_STATIC_PEERS not to work correctly.

My proposed solution is to detect whether the variable is empty when the LOCALHOST discovery policy is used, and set it to some value (127.0.0.1 seems like a safe value) so that Connext will not automatically add the multicast address. The variable must be set before the DomainParticipantFactory is initialized, which forced me to move that initialization from rmw_api_connextdds_init to rmw_context_t::initialize_node.
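A minimal sketch of that approach, assuming the rcutils environment helpers the discussion below also mentions (the helper name is hypothetical, and this is not the PR's actual code):

```cpp
#include "rcutils/env.h"

// Hypothetical helper: make sure NDDS_DISCOVERY_PEERS is non-empty before the
// DomainParticipantFactory is created, so Connext does not fall back to
// announcing the default 239.255.0.1 multicast discovery locator.
static bool ensure_discovery_peers_not_empty()
{
  const char * peers = nullptr;
  const char * lookup_error = rcutils_get_env("NDDS_DISCOVERY_PEERS", &peers);
  if (lookup_error != nullptr) {
    return false;  // failed to read the environment
  }
  if (peers == nullptr || peers[0] == '\0') {
    // 127.0.0.1 is a safe placeholder: it suppresses the implicit multicast
    // locator without making any remote host discoverable by itself.
    return rcutils_set_env("NDDS_DISCOVERY_PEERS", "127.0.0.1");
  }
  return true;  // the user already customized the peers; leave them alone
}
```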

Let me know if this fixes the failing test cases in the matrix, and I'll keep an eye out for updates on this PR to have it merged.

@wjwwood (Member) commented Mar 30, 2023

I cherry-picked your commit @asorbini, I'm still working on some of my own changes, then I'll ping you again. Thanks!

@wjwwood (Member) commented Mar 30, 2023

Ok, I think I'm finished addressing feedback, @asorbini can you have another look when you have time?

@wjwwood requested a review from @asorbini on March 30, 2023 at 20:53
@asorbini (Collaborator) left a review:

Looks good, I think it can be merged with a green run of CI.

I only had a question about whether we should accept invalid range enums, which I'd like to address, and two other minor things with DDS_String_alloc (which could stay as is, but it would be great to clean up).

@asorbini (Collaborator) commented:

@wjwwood I realized that I was under the wrong impression that only NDDS_DISCOVERY_PEERS affects whether a multicast receive address is announced, and that setting initial_peers alone would not achieve the same effect.

After a bit more thinking, I believe they are equivalent, and I'd like to try reimplementing the feature with that strategy.

The reason for this is that it would be a cleaner implementation that could potentially allow two DomainParticipants to be created with different discovery options. This is not a problem at the moment, because the options are process-/ROS 2 context-/DDS factory-wide, but the change would likely make the implementation more future-proof.

It would also remove the new dependency on rcutils_set_env, which isn't much of a problem per se (since we already depend on rcutils_get_env), but "fewer dependencies"/"the same number of dependencies" is always a better scenario than "one more dependency" in my book.

The change would require moving the participant factory initialization back to rmw_connextdds_init, and checking whether ctx->initial_peers is empty after asserting the static peers; if so, it would be filled with "127.0.0.1" (when the policy is LOCALHOST).

I can prepare the code and link a commit to cherry-pick, if that's ok with you. I should have it ready by the end of the day, and we can test/merge the PR tomorrow.

I'll start preparing it, while I wait for your input.

@wjwwood (Member) commented Mar 31, 2023

> After a bit more thinking, I believe they are equivalent, and I'd like to try reimplementing the feature with that strategy.

That is fine with me.

> It would also remove the new dependency on rcutils_set_env, which isn't much of a problem per se (since we already depend on rcutils_get_env), but "fewer dependencies"/"the same number of dependencies" is always a better scenario than "one more dependency" in my book.

Also fine with me.

> I can prepare the code and link a commit to cherry-pick, if that's ok with you. I should have it ready by the end of the day, and we can test/merge the PR tomorrow.

Sounds good to me, thank you for pushing on it so quickly! I'll wait to re-test the matrix of use cases until you have this.

@asorbini (Collaborator) commented Mar 31, 2023

@wjwwood After further testing, it turns out I was right the first time around, and we must set NDDS_DISCOVERY_PEERS for the multicast receive address not to be used. Setting only initial_peers is not enough.

So I left the implementation as is, but I took the opportunity to consolidate all the changes into rmw_context.cpp so that they can be shared by rmw_connextddsmicro too (and we can think about deprecating RMW_CONNEXT_INITIAL_PEERS in favor of ROS_STATIC_PEERS in the near future).

I committed the changes in this commit.

While at it, I also addressed those two comments I had about DDS_String_alloc already taking into account the nul terminator, and actually got rid of one of those two calls by remembering that DDS_String_dup exists (and so we don't need to use DDS_String_alloc+memcpy).
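To illustrate the two string-API points above (an editorial example, not the PR's code):

```cpp
#include <cstring>

#include "ndds/ndds_c.h"

// DDS_String_alloc(n) already reserves n + 1 bytes, i.e. it accounts for the
// NUL terminator, so no explicit "+ 1" is needed in the allocation.
static char * copy_peer_verbose(const char * src)
{
  char * dst = DDS_String_alloc(static_cast<DDS_UnsignedLong>(std::strlen(src)));
  if (dst != nullptr) {
    std::memcpy(dst, src, std::strlen(src) + 1);
  }
  return dst;
}

// DDS_String_dup() replaces the alloc + memcpy pattern with a single call.
static char * copy_peer_simple(const char * src)
{
  return DDS_String_dup(src);
}
```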

I also ran the linter/uncrustify and fixed a couple of issues. Should be good for testing.

@asorbini (Collaborator) commented:

I was just told that it might be possible to prevent the multicast receive behavior by resetting DomainParticipantQos::discovery::multicast_receive_addresses directly.

It might be a better idea, for reasons similar to those I explained earlier (it works at the participant level instead of the factory, doesn't touch environment variables, etc.).

I apologize for the back and forth, but I'm going to take a quick look into implementing this. While the current implementation works, it prevents a unit test from initializing two contexts with two different policies, which might be a problem.
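For illustration, a sketch of what that QoS change could look like in the Connext C API (my own sketch under that reading of the API, not necessarily the code in the commits linked below):

```cpp
#include "ndds/ndds_c.h"

// Clearing discovery.multicast_receive_addresses means the participant
// announces no multicast receive locator, so remote participants can only
// reach it through the unicast peers it explicitly lists.
static bool disable_multicast_receive(DDS_DomainParticipantQos * dp_qos)
{
  return DDS_StringSeq_set_length(
    &dp_qos->discovery.multicast_receive_addresses, 0) == DDS_BOOLEAN_TRUE;
}
```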

@asorbini (Collaborator) commented:

I applied and tested the change, and it does work as expected.

@wjwwood can you apply these two commits on top of your branch and run CI/test matrix?

  • b156654 (the one I created yesterday, which refactors the logic to a separate function)
  • f5df932 (the one from this morning, which resets multicast_receive_addresses)

The commits are the two most recent ones from this branch.

@wjwwood (Member) commented Mar 31, 2023

I cherry-picked those and updated it to not accept NOT_SET explicitly. I also added a doc note about this, and we're bringing fastrtps and cyclone up to that too: ros2/rmw@2dbd6ef

@asorbini (Collaborator) left a review:

Just a minor touch-up; otherwise I think we're looking good.

@wjwwood (Member) commented Mar 31, 2023

I'm currently debugging an issue with the PR in its current state. It's failing to create a node when using `RMW_IMPLEMENTATION=rmw_connextdds ROS_AUTOMATIC_DISCOVERY_RANGE=LOCALHOST ROS_STATIC_PEERS=127.0.0.1`.

@asorbini (Collaborator) commented:

> I'm currently debugging an issue with the PR in its current state. It's failing to create a node when using `RMW_IMPLEMENTATION=rmw_connextdds ROS_AUTOMATIC_DISCOVERY_RANGE=LOCALHOST ROS_STATIC_PEERS=127.0.0.1`.

I'm not able to reproduce on my side. I'm rebuilding the tree after pulling all repositories to see if that makes a difference (looks like I was missing a few of the most recent commits).

@wjwwood (Member) commented Mar 31, 2023

We figured out the issue and fixed it in 4bafd73. Thanks for your help @asorbini!

@asorbini (Collaborator) left a review:

Good to go with green CI

"[email protected]://127.0.0.1",
"[email protected]://",
};
const auto rc2 = rmw_connextdds_extend_initial_peer_list(
Collaborator:

These peers should be added to the list only if it has not already been customized by the user via the environment variable and/or XML QoS; otherwise they might shadow custom values the user configured.

Collaborator:

We could also leave it as is, but we should document the behavior in the README so users are aware that these peers are always added when the LOCALHOST policy is used.

I would prefer the "smarter" approach because, so far, the RMW does not have any "hard-coded" QoS setting that cannot somehow be disabled and controlled by users via XML QoS. I'd like these peers not to disrupt that trend, either by having the RMW not assert them if the peers have already been customized, or by letting the user explicitly disable them if they need to (e.g. through a new env var... but we have many already, and I'd rather not add another one).

Contributor:

That's a good point. The current strategy for letting users control these settings is to set the range to RMW_AUTOMATIC_DISCOVERY_RANGE_SYSTEM_DEFAULT. It looks like line 242 of this PR should perhaps be `else if (ctx->discovery_options->automatic_discovery_range != ..._SYSTEM_DEFAULT)` so that we don't try to extend the peer list in that case.
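A hedged sketch of that guard (the struct and enum names come from the new RMW discovery options API referenced above; the helper itself is hypothetical, not the PR's code):

```cpp
#include "rmw/discovery_options.h"

// Only touch the initial peer list when the user selected an explicit
// automatic discovery range; SYSTEM_DEFAULT leaves Connext's own
// configuration (NDDS_DISCOVERY_PEERS, XML QoS) fully in control.
static bool should_extend_initial_peers(const rmw_discovery_options_t * options)
{
  return options->automatic_discovery_range !=
         RMW_AUTOMATIC_DISCOVERY_RANGE_SYSTEM_DEFAULT;
}
```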

Contributor:

Skipped modifying the initial peers list when the automatic discovery range is SYSTEM_DEFAULT, in c501a76.

@sloretz merged commit 022cc46 into ros2:rolling on Apr 8, 2023