[rmw_connextdds] New RMW discovery options #108
Conversation
Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Signed-off-by: Michael X. Grey <[email protected]>
Thank you for this contribution, and apologies for the late reply. Things look good, except for a few minor changes (mostly "coding style" and missing error checking).
I'll review and suggest a solution to the problems with the failing cases of the matrix chart next.
I've been reviewing the requirements and your implementation, and thinking about the problem. At first I thought I had spotted an issue and started considering alternative implementations, but after going down that rabbit hole for a bit, I realized I was wrong. Your solution is the correct way to prevent the default multicast discovery from occurring while still allowing unicast communication, and I have verified it with a simple hello world for Connext 6.0.1 (discovery_options_tests.tar.gz). I tried running the publisher on one host and the subscriber on a different one connected via LAN: by default, the processes don't discover each other, but if one declares the other's IP as an initial peer, they do. I will continue my testing by building the branch and observing the rmw's behavior.
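For illustration, here is a minimal sketch of the kind of setup being verified, not the PR's actual code: restrict a Connext participant to unicast-only discovery by listing explicit peers. The helper name and peer addresses are placeholders; the QoS members and string-sequence calls are from the Connext C API.

```cpp
// Hypothetical sketch (not the PR's code): configure a DomainParticipantQos
// for unicast-only discovery by replacing the default initial peers with
// explicit unicast locators, so no multicast announcement is attempted.
#include "ndds/ndds_c.h"

static bool configure_unicast_only_peers(
  struct DDS_DomainParticipantQos * const qos,
  const char * const remote_host /* e.g. "192.168.1.12" (placeholder) */)
{
  const char * const peers[] = {"127.0.0.1", remote_host};
  const DDS_Long peers_len =
    static_cast<DDS_Long>(sizeof(peers) / sizeof(peers[0]));

  if (!DDS_StringSeq_ensure_length(
      &qos->discovery.initial_peers, peers_len, peers_len))
  {
    return false;
  }
  for (DDS_Long i = 0; i < peers_len; i++) {
    char ** const ref =
      DDS_StringSeq_get_reference(&qos->discovery.initial_peers, i);
    if (*ref != NULL) {
      DDS_String_free(*ref);  // avoid leaking a pre-existing peer entry
    }
    *ref = DDS_String_dup(peers[i]);
  }
  return true;
}
```

With only unicast peers listed, the default multicast discovery address is never contacted, which matches the behavior observed in the hello-world test.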
@mxgrey @clalancette I have identified the issue and implemented a solution here (the commit should apply directly on top of your branch). Basically, the problem stems from the interaction between the environment variable used to configure the initial discovery peers and the new discovery options.
My proposed solution is to detect whether the variable is already set, and only override it when it is empty. Let me know if this fixes the failing test cases in the matrix, and I'll keep an eye out for updates on this PR to have it merged.
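A rough sketch of the check being described (the variable name NDDS_DISCOVERY_PEERS is my assumption about which variable is involved, and the helper is hypothetical):

```cpp
// Hypothetical sketch: only let the rmw override the discovery peers
// environment variable when the user has not already set it, so explicit
// user configuration is never shadowed.
#include <cstdlib>
#include <cstring>

static bool discovery_peers_env_is_empty()
{
  // Assumed variable name: NDDS_DISCOVERY_PEERS (Connext's peers override).
  const char * const peers = std::getenv("NDDS_DISCOVERY_PEERS");
  return peers == nullptr || std::strlen(peers) == 0;
}
```

In the rmw itself this check would presumably go through the usual environment-variable helpers rather than std::getenv directly.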
…ERY_RANGE is LOCALHOST Signed-off-by: Andrea Sorbini <[email protected]>
I cherry-picked your commit, @asorbini. I'm still working on some of my own changes, then I'll ping you again. Thanks!
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
OK, I think I'm finished addressing feedback. @asorbini can you have another look when you have time?
Looks good, I think it can be merged with a green run of CI.
I only had a question about whether we should accept invalid range enums, which I'd like to address, and two other minor things with DDS_String_alloc (which could stay as is, but it would be great to clean up).
@wjwwood I realized that I was under a wrong impression about which settings could achieve this. After a bit more thinking, I believe the two approaches are equivalent, and I'd like to try reimplementing the feature with the other strategy. The reason is that it would be a cleaner implementation that could potentially allow two DomainParticipants to be created with different discovery options. This is not a problem at the moment, because the options are process-/ROS 2 context-/DDS factory-wide, but it would likely make the implementation more future-proof in case we change things later. It would also remove the newly introduced dependency. The change would require moving the participant factory initialization back to where it was before.
I can prepare the code and link a commit to cherry-pick, if that's OK with you. I should have it ready by the end of the day, and we can test/merge the PR tomorrow. I'll start preparing it while I wait for your input.
That is fine with me.
Also fine with me.
Sounds good to me, thank you for pushing on it so quickly! I'll wait to re-test the matrix of use cases until you have this.
@wjwwood After further testing, it turns out I was right the first time around, and we must set the environment variable after all. So I left the implementation as is, but I took the opportunity to consolidate all the changes and committed them. While at it, I also addressed those two comments I had about DDS_String_alloc. I also ran the linter/uncrustify and fixed a couple of issues. Should be good for testing.
I was just told that it might be possible to prevent the multicast receive behavior by resetting the participant's multicast receive addresses. It might be a better idea, for reasons similar to the ones I explained earlier (it works at the participant level instead of the factory, it doesn't mess with env vars, etc.). I apologize for the back and forth, but I'm going to take a quick look into implementing this. While the current implementation works, it has the problem that it prevents a unit test from initializing two contexts with two different policies.
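If I understand the idea, it would amount to something like the following sketch (assuming the standard multicast_receive_addresses member of the discovery QoS policy; this is not the actual commits referenced below):

```cpp
// Hypothetical sketch: disable builtin multicast reception for a single
// participant by emptying multicast_receive_addresses in its QoS, leaving
// the factory and environment variables untouched.
#include "ndds/ndds_c.h"

static bool disable_multicast_reception(
  struct DDS_DomainParticipantQos * const qos)
{
  // An empty sequence means this participant does not listen on any multicast
  // address for discovery announcements; unicast discovery keeps working.
  return DDS_StringSeq_set_length(
    &qos->discovery.multicast_receive_addresses, 0) == DDS_BOOLEAN_TRUE;
}
```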
I applied and tested the change, and it does work as expected. @wjwwood can you apply these two commits on top of your branch and run CI/test matrix?
The commits are the two most recent ones from this branch.
Signed-off-by: Andrea Sorbini <[email protected]>
…EERS Signed-off-by: Andrea Sorbini <[email protected]>
Signed-off-by: William Woodall <[email protected]>
I cherry-picked those and updated it to not accept invalid range enum values.
Signed-off-by: William Woodall <[email protected]>
Just a minor touch up, otherwise I think we're looking good
I'm currently debugging an issue with the PR in its current state: it's failing to create a node when using one of the discovery settings.
I'm not able to reproduce on my side. I'm rebuilding the tree after pulling all repositories to see if that makes a difference (looks like I was missing a few of the most recent commits).
Signed-off-by: William Woodall <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Good to go with green CI
Signed-off-by: Shane Loretz <[email protected]>
Signed-off-by: Shane Loretz <[email protected]>
"[email protected]://127.0.0.1", | ||
"[email protected]://", | ||
}; | ||
const auto rc2 = rmw_connextdds_extend_initial_peer_list( |
These peers should be added to the list only if it has not already been customized by the user via the environment variable and/or XML QoS; otherwise they might shadow custom values the user configured.
We could also leave it as is, but we should document the behavior in the README so users are aware that these peers are always added when the LOCALHOST policy is used.
I would prefer the "smarter" approach because, so far, the rmw does not have any "hard-coded" QoS setting that cannot somehow be disabled and controlled via XML QoS by users. I'd like these peers not to break that pattern, either by having the rmw not assert them when the peers have already been customized, or by letting the user explicitly disable them if they need to (i.e. through a new env var... but we have many already, and I'd rather not add another one).
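As an illustration of the "smarter" variant, a sketch of the kind of check it implies; in practice deciding whether the list counts as customized would also need to account for the env var and XML QoS, which is assumed away here, and the helper name is hypothetical:

```cpp
// Hypothetical sketch: treat a non-empty initial_peers list as "already
// customized by the user" and skip adding the rmw's localhost peers then.
#include "ndds/ndds_c.h"

static bool should_add_localhost_peers(
  const struct DDS_DomainParticipantQos * const qos)
{
  // If the user (env var / XML QoS) already populated initial_peers, adding
  // more entries here could shadow or dilute their configuration.
  return DDS_StringSeq_get_length(&qos->discovery.initial_peers) == 0;
}
```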
That's a good point. The current strategy for enabling users to control these settings is to set the range to RMW_AUTOMATIC_DISCOVERY_RANGE_SYSTEM_DEFAULT. It looks like maybe line 242 of this PR should be else if (ctx->discovery_options->automatic_discovery_range != ..._SYSTEM_DEFAULT) so that we don't try to extend the peer list in that case.
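For illustration, the suggested guard could be expressed as a small predicate like this sketch (assuming the upstream rmw header rmw/discovery_options.h and enum type rmw_automatic_discovery_range_t; the helper name is hypothetical, not the PR's code):

```cpp
// Hypothetical sketch of the suggested check: only touch the initial peer
// list when the user selected an explicit automatic discovery range, so
// SYSTEM_DEFAULT leaves Connext's own defaults (env var / XML QoS) alone.
#include "rmw/discovery_options.h"

static bool should_extend_initial_peers(
  const rmw_automatic_discovery_range_t range)
{
  return range != RMW_AUTOMATIC_DISCOVERY_RANGE_SYSTEM_DEFAULT;
}
```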
Skipped modifying the initial peers list when the automatic discovery range is SYSTEM_DEFAULT in c501a76.
Signed-off-by: Shane Loretz <[email protected]>
Signed-off-by: Shane Loretz <[email protected]>
Signed-off-by: Shane Loretz <[email protected]>
This PR updates rmw_connextdds to be compatible with the upcoming discovery options feature for RMW. (Note that the final agreed-upon requirements matrix is the one in this comment.)

This PR depends on these upstream PRs:

With this PR, the requirements matrix for the new discovery options is almost fully satisfied. There is just one aspect that is not working correctly, which I would appreciate some help with from Connext DDS experts: it seems that Connext DDS participants are not accepting unknown initial peers even though accept_unknown_peers is being set to true. As a result, the Same Host matrix is 100% passing while the Other Host matrix looks like this:

The gist of the current problem is:
- one endpoint is using the LOCALHOST mode
- the other endpoint is using the OFF mode
-> then the required behavior is for them to discover each other and connect, but they currently do not. Every other scenario is working correctly.
From my reading of the Connext DDS documentation, I believe the current implementation is supposed to work since accept_unknown_peers is set to true, but perhaps there is a detail that I'm overlooking.

One observation from watching packets with Wireshark is that when there is a LOCALHOST + static peer <=> LOCALHOST + no static peer connection, the endpoint that has a static peer is sending DATA(p) participant information packets, while the endpoint without a static peer never sends DATA(p) packets (even though it does send various ACKNACK and HEARTBEAT packets). I'm happy to provide the Wireshark data if that would be helpful.
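For context, a minimal sketch of the kind of QoS described above, not the PR's actual code; the helper name and the remote address are placeholders, while the discovery QoS members come from the Connext C API:

```cpp
// Hypothetical sketch: append one static remote peer to initial_peers and
// enable accept_unknown_peers, i.e. the configuration the failing
// "LOCALHOST + static peer <=> LOCALHOST + no static peer" case relies on.
#include "ndds/ndds_c.h"

static bool add_static_peer_and_accept_unknown(
  struct DDS_DomainParticipantQos * const qos,
  const char * const remote_peer /* e.g. "192.168.1.12" (placeholder) */)
{
  const DDS_Long len =
    DDS_StringSeq_get_length(&qos->discovery.initial_peers);
  if (!DDS_StringSeq_ensure_length(
      &qos->discovery.initial_peers, len + 1, len + 1))
  {
    return false;
  }
  *DDS_StringSeq_get_reference(&qos->discovery.initial_peers, len) =
    DDS_String_dup(remote_peer);

  // Allow matching participants that are not listed in our initial peers,
  // which is what the side without a static peer depends on.
  qos->discovery.accept_unknown_peers = DDS_BOOLEAN_TRUE;
  return true;
}
```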