-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Obstacle Layer, Voxel Layer, and Costmap Topic Collision Checker not working in Humble/Rolling due to Fast-DDS Regression #3014
Comments
the problem is that Obstacle Layer not working (the scan data isn't subscribed by Obstacle Layer), becasue it seems that we can't create subscriber in another callback group at runtime since rollling updated in ubuntu22.04, but i don't know why. and i make a simple PR to sovle this problem by removing |
I think that makes sense in the interim, but can you file a ticket describing this issue to the message filters repo? Do you see the same behavior if you create a minimal example or just something we see in Nav2's use of it? The unit test in message filters unsubscribing then later subscribing seems to work https://github.com/ros2/message_filters/blob/master/test/test_subscriber.cpp#L105-L126 so I agree it seems related to callback groups. I went through the git blame for message filters and don't see any changes in that repo that seem problematic (or really even substantial) in the last year. |
It seems like it might be related to this? It might be using the wrong constructor and even though we pass the options, if the 2nd one is called then its not even sent. These are called in the constructor https://github.com/ros2/message_filters/blob/master/include/message_filters/subscriber.h#L186-L202. That's new relative in the last year, but may have never been released on 20.04 for us to see. I think (?) that's the only issue at hand. Let me know if you agree |
I determined root cause of the costmap collision checker issue. Since the obstacle layer isn't getting its callbacks as @gezp reported, the local costmap is empty so it shows always collision free. We use the local costmap in the behavior server, not the global costmap, which does not have the static map set by default in When a static map is set or the global costmap/footprint/frame are used, things work fine! So once we solve the issue described in ros2/rmw_fastrtps#613 / short term patch in #3018 then this should all be good |
I found same issue. (obstacle_layer doesn't subscribe sensor callback) |
could we pin the issue on the top? |
ros2/rmw_fastrtps#619 fixes ros2/rmw_fastrtps#613 on Fast-DDS @Aposhian I believe you were the Fast-DDS user that was blocked especially by this. Can you verify this fixes the issue for you so we can close this out? |
Just FYI, ros2/rmw_fastrtps#619 has been merged in rolling. |
Sweet. It should be pretty easy for me to test then when the binaries are released! Until we have confirmation though, leaving this open 😀 I’d normally grab some shadow-fixed binaries but my dev time is very short right now due to different internal priorities. Most the time I can spend right now is used making sure I’m not the blocker in PRs/issue tickets. ~November things get back to normal. |
I switch to rmw_cyclonedds and it works! Will doing so has any side effect? |
I would read concepts. The key line from the documentation is So depending on your specific use case there might be some side effects. |
Confirming that Humble binaries as of 18-Oct still have this issue. Switching to Cyclone seems to fix. Just commenting here for others that might be trying to track down issue |
I still cannot believe that close to 5 months after the last ROS2 LTS release, its default DDS implementation is not yet fully functional ! |
@SteveMacenski, confirming @dcconner claim. Issue is still, there it's not possible run proper navigation stack in humble as of 21-Oct |
For what it is worth, you are the one being unprofessional. You are insulting a lot of people who are doing a lot of work on the ROS 2 codebase, and I'm not going to stand for it. If you continue to do this, I'm going to lock this conversation. If you want to stick to technical topics, we can continue.
OK, great, that's exactly the kind of feedback that we need. What we need now is a detailed explanation of your system, and what is not working for you. Even better would be a minimal example that we could use to reproduce the issue so that we can eventually incorporate it into the system tests and make sure it doesn't happen going forward. Once we get a better idea of what is going on, we can work with eProsima to get a better feeling of where we can improve this. |
My apologies if whatever I said was disrespectful, definitely not my intention at all. Infact I was trying to assert that the ros2 concept is great and we love the idea and worked super hard to incorporate into our fleet. If there is any way that we can contribute to bringing out these issues ahead of time through pre-release testing, please let me what mailing list or discourse topic to follow. I will try to send a reproducible example as soon as possible |
Lets keep this conversation on the actual issue at hand here folks, this isn't a place for people working on other project ecosystems to come in and given their 2 cents. This is a specific issue regarding a specific project (nav2). Release requests for updated binaries and branches for fixes to solve this problem are fair game - but generally railing on people/projects are not. That's not how you get folks to help you out and work collaboratively with you on this problem. |
Hi everyone, Following up on what @SteveMacenski said, I'd like to try to move this conversation forward on a more constructing manner. As we have show over and over, we are definitely willing to contribute to solve any issue rmw_fastrtps may be manifesting, but to do that we need to understand the actual problems. Mind that we (eProsima) do middleware, not only used in ROS 2 but in many critical systems, so claiming that Fast DDS, or any other implementation for that matter, is broken maybe an over statement (even though for sure there are many ways things to improve. Mind that we do not do navigation nor simulation and these tools are not our strongest suit, so we'd appreciate help on specifically describing what the problems are so we can reproduce on out end. It seems that so far we have not been able to actually understand what issues you are describing here, as the testing we having been doing seemed successful. I do not think GitHub is the best platform to organize the effort of actually fixing these things so I'd like to create a small group of those of you who are willing to help on getting this solved and move the conversation to the Nav2 slack workspace so we can cease angrily commenting here and rather work towards a fix. On a side note, I think it is important, although maybe redundant, to point out that the ROS 2 core is a coordinated effort of a lot of parties, and that there are processes to be followed for contributing, releasing and finally getting things onto the binary repositories; this does not just happen overnight. Furthermore, I'd like to say that all of us (not only eProsima, but also OR and other parties involved) are widely available so reach out to us and we may be able to give you a hand with your concerns more effectively than going on a rampage about all the possible improvements that ROS 2 needs, which are many and not only limited to Nav2. On that thought, it is worth remembering that ROS 2 is a FOSS project, and as such it needs inputs from the community to keep going and grow. As @clalancette pointed out, I think that the Nav2 community specifically could bring a lot of value to the table in the form of integration and system tests, as you guys really leverage most of the "advanced" ROS 2 features, which as this issue surfaces require far more testing. Please write to me here, send me and email at [email protected], or preferably contact me on the Nav2 slack so we can organize a small chat early next week and move from there. |
@aditya2592 I am curious as to the performance differences you saw in "current CycloneDDS" vs what you are getting with FastDDS. We switched to CycloneDDS with Galactic and moved forward with it on Humble. I am curious as to what limitations/latency issues you ran into and what was specifically effected so we can keep an eye out for it as well. When you were testing CycloneDDS were you using the newest iceoryx or an older one? Also did you test configuring the payload size, multicast, specific network interfaces, etc? https://github.com/eclipse-cyclonedds/cyclonedds#run-time-configuration |
@bperseghetti This is Feb 2022, so our iceoryx was probably the old one. I certainly looked at iceoryx, but at the time, the repo said ROS services are not supported and that was big limitation for us since we were trying to shift to a service drive architecture [looks like we missed it by a month]. The issues we saw were all in line with the TSC report here. This report confirmed our observations from production and then it made a lot more sense to keep pushing on Fast DDS since it seemed a lot more efficient. Feel free to contact me through my Github profile for further discussion. |
@aditya2592 you mentioned ROS services not being supported with Iceoryx, I suspect you're thinking of https://github.com/ros2/rmw_iceoryx#Limitations, which is a non-DDS RMW using Iceoryx exclusively. However, the Cyclone DDS RMW can also optionally be configured to use Iceoryx for shared-memory on the same host, while still using regular UDP/TCP DDS for inter-host communication. https://github.com/eclipse-cyclonedds/cyclonedds/blob/master/docs/manual/shared_memory.rst That said, I haven't had much luck with it so far. Enabling Iceoryx with Cyclone broke Nav2 immediately, apparently Iceoryx is compiled only supporting a limited number of "Ports" for its communication and Nav2 blows past that. With a stripped back setup I was able to get the robot teleoping with no traffic for once on the loopback interface :) Here's a Cyclone DDS config file to get it working, note you'll also have to manually run the
|
Related: issues when trying to enable iceoryx shared mem with CycloneDDS on a basic Nav2 stack (repro included): eclipse-cyclonedds/cyclonedds#1326 |
Deleted totally off topic question |
I tried CycloneDDS and it worked. Thanks for your discussion. They are really helpful. Just a quick question, is it possible that nav2 can be used in Fast DDS in the future? |
Presumably so, but that’s not in my control to change |
I had some time to localize the problem and make run-time verification of the bugfix. What was found:
How I've tested the fix:
(*) TB3 simulation passes, testcase from rclcpp Issue1611 was not checked
Some thoughts:
TLDR |
@AlexeyMerzlyakov Thanks for such a thourough testing!
ros2/rclcpp#2033 was the backport for ros2/rclcpp#1640, and was merged on ros2/rclcpp@33cbd76 months ago, so I would say it should also work on Humble. There have been patch releases since then, so it should be easy to test directly from binaries. |
Great to see this progressing !
As this appears to still be a problem on Humble (the LTS version that most people use in production), I believe it should stays open |
Really great analysis @AlexeyMerzlyakov!
@MiguelCompany says this was backported already, should it not work in Humble too? |
I may have missed that, but @AlexeyMerzlyakov tests were done on Rolling, hence my comment. |
Awesome! I think that means we can close this out and unpin it! Thanks @AlexeyMerzlyakov for your very careful analysis of the issue and bringing this topic to a close! |
Does switching DDS to Cyclone fix it? |
I have tried it, and no switching to cyclone does not fix it. I think this is another problem. And apart from the nav2_simple_commander demos, generally, if the localization is running with known map, local costmap shows up, if we are running without known map, then local costmap does not show up. By 'generally` I mean using the scripts under /opt/ros/humble/share/nav2_bringup |
After some time, issues start even with cyclone dds. I need to restart it several times to get it working,
|
More info #2489 (comment)
Seems to have broken sine #2489 for 22.04 / Humble updates.
CC @jwallace42 @gezp
The text was updated successfully, but these errors were encountered: