Optimize QoS to improve responsiveness of reliable endpoints #26
This PR resolves some test failures in |
LGTM!
It would be great if there were a way to check whether the user set these QoS settings in the QoS profile file, and to apply these defaults only if they were not set.
In that way, RMW_CONNEXT_ENV_DISABLE_RELIABILITY_OPTIMIZATIONS
wouldn't be needed.
We have been using the environment variable trick for other things though, so applying it here sounds fine.
Unfortunately there is no way to determine where the value of a certain field of QoS returned by
Yeah, the list is growing, but at the moment it's the "cleanest" strategy I can think of. |
I have reviewed these changes with @GerardoPardo, and we both agreed that it would be better to align these settings with the built-in profile The reason is that this profile has been reviewed and vetted, and it has been successfully deployed in customer scenarios as the de facto new "default settings" for the reliability protocol. The main differences with the current settings are that
The last point brings me to a couple of issues that I have identified in my current proposal:
With all of this being said, I have been re-running I have experimented with changing different QoS settings, but only very aggressive reliability settings (which I wouldn't want to set as the global default) get the test to pass. I still haven't been able to nail down the exact reason why the test is failing in the first place. AFAICT the test has a timeout of 3 seconds, which should be plenty of time for samples to be repaired, even with "slower" HB periods of 100 or 200 ms (certainly not Connext's default of 3s though...). The test also fails on a check which, again AFAICT, only depends on a So, at this point, I'd like to put this PR on hold, until I can identify the exact reason why the test is failing, and an exact configuration that makes it pass consistently. In the meantime, maybe we can disable the test for |
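As a rough check of the timing argument above, the sketch below counts how many heartbeat periods fit inside the 3-second test timeout for the periods mentioned (100 ms, 200 ms, and Connext's default of 3 s). The function name `heartbeats_within` is hypothetical, and the calculation is a simplification that ignores network latency and the offset of the first heartbeat.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: number of complete heartbeat periods that fit in a timeout.
// Simplification: assumes heartbeats are sent at exact multiples of `period`.
// Precondition: `period` must be greater than zero.
inline std::int64_t heartbeats_within(
  std::chrono::milliseconds timeout,
  std::chrono::milliseconds period)
{
  // Dividing two durations yields a plain integer count (truncated).
  return timeout / period;
}
```

Under these assumptions a 3-second timeout leaves room for 30 heartbeats at 100 ms and 15 at 200 ms, but only a single one at the 3-second default, which is consistent with the comment's expectation that the faster periods should leave plenty of time for repairs.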
Turns out the failures in I'm still leaving this PR open for now, because I think it would be good to change the reliability protocol settings regardless, since the default ones are pretty bad. |
This needs a rebase and CI to get into Rolling. |
Signed-off-by: Andrea Sorbini <[email protected]>
bac4df4
to
92cd8d7
…ReliabilityProtocol.Common Signed-off-by: Andrea Sorbini <[email protected]>
92cd8d7
to
1d9cad3
OK, I'm going to go ahead and merge this one in, since CI looks green and this should improve our test times a bit. |
This PR introduces some "on by default" QoS optimizations for the RTPS reliability protocol parameters used by reliable DataWriters and DataReaders.
By default, Connext uses a "heartbeat period" of 3 seconds, which makes the reliability protocol slow to react to missed samples.
With these changes, rmw_connextdds will reduce the heartbeat period to 100ms, with a minimum of 20ms. This should be in line with the default settings of other DDS vendors. The optimizations can be disabled with the environment variable RMW_CONNEXT_DISABLE_RELIABILITY_OPTIMIZATIONS, for example if a user would prefer to customize these parameters via XML (which is ultimately always the best solution for a complex system).
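The opt-out described above could be sketched roughly as follows. This is not the code from this PR: the function names and constants are hypothetical, and the rule for which values count as "disabled" (anything other than unset, empty, or "0") is an assumption made only for illustration. The 3-second and 100 ms values come from the description above.

```cpp
#include <chrono>
#include <cstdlib>
#include <cstring>

// Values taken from the PR description; the constant names are hypothetical.
constexpr std::chrono::milliseconds kConnextDefaultHeartbeatPeriod{3000};
constexpr std::chrono::milliseconds kOptimizedHeartbeatPeriod{100};

// Assumption: the optimizations are disabled when the variable is set to
// any value other than the empty string or "0".
inline bool optimizations_disabled(const char * env_value)
{
  return env_value != nullptr &&
         env_value[0] != '\0' &&
         std::strcmp(env_value, "0") != 0;
}

// Pick the heartbeat period to use for a given value of the variable.
inline std::chrono::milliseconds select_heartbeat_period(const char * env_value)
{
  return optimizations_disabled(env_value) ?
         kConnextDefaultHeartbeatPeriod : kOptimizedHeartbeatPeriod;
}

// Convenience wrapper that reads the variable from the process environment.
inline std::chrono::milliseconds heartbeat_period_from_environment()
{
  return select_heartbeat_period(
    std::getenv("RMW_CONNEXT_DISABLE_RELIABILITY_OPTIMIZATIONS"));
}
```

Keeping the decision in a function that takes the raw string, separate from the `std::getenv` call, makes the behavior easy to exercise without modifying the process environment.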