Solitary messages published from ROS 1 publishers that do not latch will sometimes not arrive at the ROS 2 subscriber #130
Comments
From the description I doubt that the single message would be reliably received by any ROS 1 subscriber / node either. There is a non-zero time between creating a publisher and the connections to all interested subscribers being established. During that interval "early" published messages are expected to be lost. That is by design in an asynchronous publish / subscribe system (without any kind of caching, which latching does provide). If you can confirm that this is the source of your problem, I don't think there is anything in the bridge to improve this behavior.
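To make the race described above concrete, here is a toy model in plain Python. It is not ROS code: `ToyPublisher`, `connect`, and `run` are names invented for this sketch, and the only behavior it models is "a message published before the connection exists is dropped unless the publisher latches".

```python
"""Toy model of the publish/connect race (illustrative only, not ROS code)."""


class ToyPublisher:
    def __init__(self, latch=False):
        self.latch = latch
        self.subscribers = []    # delivery callbacks of connected subscribers
        self.last_message = None

    def publish(self, message):
        # Only subscribers that are already connected see the message.
        self.last_message = message
        for deliver in self.subscribers:
            deliver(message)

    def connect(self, deliver):
        # Called once a subscriber's connection is finally established.
        self.subscribers.append(deliver)
        if self.latch and self.last_message is not None:
            deliver(self.last_message)  # latching replays the last message


def run(latch):
    received = []
    pub = ToyPublisher(latch=latch)
    pub.publish("hello")           # published before anyone is connected
    pub.connect(received.append)   # the connection completes afterwards
    return received


if __name__ == "__main__":
    print("no latch:", run(latch=False))
    print("latch:   ", run(latch=True))
```

Without latching the early message never reaches the late subscriber; with latching it is replayed at connection time.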
Just to avoid confusion: this is orthogonal to the hard hang issue, which we're still investigating. This was just something we discovered during testing. The scripts we're using to test (the ones without the latching publishers) are acceptance testing scripts from the University of Edinburgh that have worked fine in a pure ROS 1 environment in the past. They use them to shake out their robot every time they have a maintenance visit from NASA, and NASA has used them a bit in the past as well. You can find them here: https://github.com/ipab-slmc/valkyrie_testing_edi. Our API is moving from custom comms with an optional custom ROS 1 translator to DDS + ROS 2 compliant conventions with an optional ROS 1 bridge, so we're updating their scripts to use as an acceptance test for the ROS 1 bridge layer. On the ROS 2 side of things we're using Reliable QoS configurations for Fast-RTPS. It just seems like a regression that we have to make the publisher scripts latch when using the bridge when we didn't have to before. But I understand that the bridge is fundamentally different from pure ROS 1, so it might be more useful to document this more formally in the bridge usage instructions instead of "fixing" an "issue" that might not be an issue at all, just a side effect of the technology. It could also be specific to Fast-RTPS; we can't change our DDS implementation currently, but we're investigating that.
Since the master needs to be polled for information, the delay between your publisher being created and the bridge actually subscribing to the topic will likely be longer than in the case where you already have a subscriber running. I would guess that the increased delay is causing your message loss. Either update your code so that it does not rely on the connections being established very quickly, or start the bridge so that it explicitly bridges the topic in question without relying on the information polled from the master.
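One way to stop relying on the connection time being short is to wait on the ROS 1 side until the publisher reports a connection before sending the single message. The sketch below is an assumption about how such a script could look, not something taken from the test scripts discussed here. The helper name `wait_for_connections`, the node name, and the topic name are made up; `get_num_connections()` is the existing `rospy.Publisher` method. Note that this only confirms the ROS 1 connection to the bridge; it says nothing about whether the ROS 2 side has already matched its subscriber.

```python
import time


def wait_for_connections(publisher, minimum=1, timeout=5.0, poll_interval=0.05):
    """Block until `publisher` reports at least `minimum` connections.

    `publisher` is anything with a get_num_connections() method, such as a
    rospy.Publisher. Returns True on success, False once `timeout` seconds
    have passed without enough connections.
    """
    deadline = time.monotonic() + timeout
    while publisher.get_num_connections() < minimum:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def main():
    # Imported here so the helper above can be used without ROS 1 installed.
    import rospy
    from std_msgs.msg import String

    rospy.init_node("one_shot_talker")                       # hypothetical name
    pub = rospy.Publisher("chatter", String, queue_size=1)   # hypothetical topic
    if wait_for_connections(pub):
        pub.publish(String(data="hello"))
    else:
        rospy.logwarn("no subscriber connected; message not sent")

# Call main() from a sourced ROS 1 environment to run the node.
```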
Maybe a better way to phrase it is that I acknowledge using the bridge to connect a ROS 1 publisher to a ROS 2 subscriber is a very different beast than publishing from a ROS 1 node to a ROS 2 subscriber with no middleman. I guess where I'm coming from is that the way the docs are written seems to imply that the flags for things like
Using I don't think that code which creates a publisher and immediately publishes one message is a common use case, simply because ROS doesn't guarantee that this works all the time, even without the bridge being involved. Please feel free to update the docs to add a paragraph (or more) about this scenario to guide future readers.
I don't have a good intuition for what exactly the overhead is here: is it computational or memory? We haven't noticed any issues when running the bridge with those flags, but we have relatively beefy systems. I'd love to make a PR against the README, but I'm far from an expert on how the bridge internals work, so I want to make sure I have a firm grasp on it and that we're using it correctly.
If your ROS graph has many topics / services that you are not interested in bridging, all of them will still be subscribed to and their messages will be sent to the bridge unnecessarily. Depending on the size of your system and the size and frequency of the messages, that can pose a significant overhead; just consider a camera node advertising raw as well as compressed topics. Maybe without the bridge most of them are not even being used, so the node doesn't perform any computation for them. With the bridge using
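As a rough back-of-the-envelope illustration of that overhead, the following sketch sums message size times publish rate over the topics a bridge would subscribe to. Every topic name, size, and rate here is a made-up assumption for illustration, not a measurement from any real system, and the function name is hypothetical.

```python
# All numbers below are invented for illustration; they are not measurements.
TOPICS = {
    # name: (approximate message size in bytes, publish rate in Hz)
    "/camera/image_raw": (640 * 480 * 3, 30),
    "/camera/image_raw/compressed": (50_000, 30),
    "/joint_states": (1_000, 100),
    "/command": (200, 1),
}


def bridged_bytes_per_second(topics, bridged_names):
    """Sum size * rate over the topics the bridge subscribes to."""
    return sum(
        size * rate
        for name, (size, rate) in topics.items()
        if name in bridged_names
    )


all_topics = bridged_bytes_per_second(TOPICS, set(TOPICS))
only_needed = bridged_bytes_per_second(TOPICS, {"/command"})

print(f"bridging everything: {all_topics} bytes/s")
print(f"bridging one topic:  {only_needed} bytes/s")
```

In this invented example the raw camera stream dominates the total, which is the kind of traffic that would otherwise never be produced if nothing subscribed to it.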
That makes sense. Thanks for discussing it with me. I might try to make a pass at the READMEs later this week after I play around with it a bit more.
Bug report
Required Info:
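Operating System: Ubuntu 16.04
Installation type: Binaries
Version or commit hash: Ardent
DDS implementation: Fast-RTPS
Client library (if applicable): N/A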
Steps to reproduce issue
Create a ROS 2 subscriber and start it, then start a ros1_bridge dynamic_bridge WITHOUT --bridge-all-topics.
Create a ROS 1 node with a publisher targeting the ROS 2 subscriber. The publisher does not latch and publishes only a single message (it does not stream messages), and the node exits after publishing its message. Run this node.
Expected behavior
The ROS 2 subscriber should receive the message via the bridge.
Actual behavior
Sometimes, the bridge will accept the message but the subscriber will never receive it. It's not reliably reproducible, but it DOES reliably STOP happening if you use --bridge-all-topics. It is also reliably fixed by using a latching publisher.
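For reference, a latching one-shot publisher on the ROS 1 side could look roughly like the sketch below. This is an assumption about the shape of such a script, not the actual test code. The function name, node name, topic name, and hold time are all hypothetical; `latch=True` is the existing `rospy.Publisher` option. The `rospy` module is passed in as `ros` only so that the sketch can be imported on a machine without ROS 1. A latched message is re-sent to late subscribers only while the publishing node is still alive, so the sketch keeps the node up for a short, arbitrary period instead of exiting immediately.

```python
def publish_latched_once(ros, msg_class, topic, payload, hold_seconds=5.0):
    """Publish one latched message, then keep the node alive for a while.

    `ros` is expected to be the rospy module.
    """
    pub = ros.Publisher(topic, msg_class, queue_size=1, latch=True)
    pub.publish(msg_class(data=payload))
    # Late subscribers (such as the bridge) only get the latched message
    # while this publisher still exists, so do not exit right away.
    ros.sleep(hold_seconds)
    return pub


def main():
    import rospy
    from std_msgs.msg import String

    rospy.init_node("latched_one_shot")   # hypothetical node name
    publish_latched_once(rospy, String, "chatter", "hello")

# Call main() from a sourced ROS 1 environment to run the node.
```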