abort() in test_subscription__rmw_cyclonedds_cpp #279
Comments
The original
dds_take that is expecting an array of pointers to messages. That gets one into trouble.
Secondly, I do think the messages themselves ought to be initialised, because I think the deserialiser assumes it is storing into a valid sample. (See
Totally fragile. I should've replaced that read/take interface when I had the chance of freely doing so when I discovered it, shortly after it was open-sourced. Now changing those interfaces needs some effort to make the transition as easy as possible for users (other than ROS 2). It currently takes an array of pointers to samples (it is emulating a
There are two lines of reasoning:
The abort originally entered the code as a shortcut (no time to waste on unused things), but I think (2) is actually the wiser route.
Within the boundaries of the current The only proper solution is to fix the interface of Cyclone. That'll either be a backwards incompatible change (the desire to change this interface is one of the reasons I have been holding the major version at 0) or it'll be a second interface sitting next to it (with the problem of finding a good name). So that route simply takes more time. My suggestion would be to take the ugly fix in the RMW layer, then fix Cyclone's interface and finally switch to using the fixed interface in the RMW layer.
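To make the idea of a fixed interface a little more concrete, here is a minimal sketch of a read request that states who owns the sample storage rather than leaving it to be inferred from the contents of the first pointer. Everything in it (SampleStorage, ReadRequest, validate_read_request) is a hypothetical name invented for illustration; it is not Cyclone's API and not the design that was eventually chosen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: the caller states explicitly where samples should be stored.
enum class SampleStorage { CallerProvided, LoanFromReader };

// Hypothetical request type: pointer array, its capacity, and the intent.
struct ReadRequest
{
  void ** buf;
  std::size_t capacity;
  SampleStorage storage;
};

enum class ReadCheck { Ok, BadArgument };

// Rejects a request whose pointers do not match the declared intent, so a
// stray null pointer becomes an error return instead of a silent mode switch.
ReadCheck validate_read_request(const ReadRequest & req)
{
  if (req.buf == nullptr || req.capacity == 0) {
    return ReadCheck::BadArgument;
  }
  if (req.storage == SampleStorage::CallerProvided) {
    for (std::size_t i = 0; i < req.capacity; ++i) {
      if (req.buf[i] == nullptr) {
        return ReadCheck::BadArgument;
      }
    }
  }
  return ReadCheck::Ok;
}
```

With an explicit mode, an array that accidentally contains a null pointer is reported as a bad argument; it can no longer be mistaken for a request to allocate samples.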
OK, so if I'm understanding properly, you are suggesting that we actually implement Additionally, I think we probably want to augment the test in https://github.com/ros2/rmw_implementation/blob/master/test_rmw_implementation/test/test_subscription.cpp#L526 . In particular, I think we should add a case that exercises this exact problem (when the pointer is NULL), and a case where the pointer is properly set up (which is probably the original intent of this test). @eboasson I'm going to work on the second one (the test) tomorrow. If you have time to work on the first one (implementation of
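As a rough illustration of the "properly set up" case, the sketch below builds a pointer array in which every slot points at a valid, default-constructed message, plus a small check that can tell the two test cases apart. HypotheticalMessage, InitializedSequence, make_initialized_sequence and all_slots_set are made-up names for this example only; the actual test uses the rmw message sequence types and may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical message type standing in for a real ROS 2 message.
struct HypotheticalMessage
{
  int value = 0;
};

// Owns the messages and exposes one pointer per message, mirroring the
// "array of pointers to samples" shape discussed in this thread.
struct InitializedSequence
{
  std::vector<HypotheticalMessage> messages;
  std::vector<void *> data;
};

// Creates `size` valid messages and points every slot at one of them.
InitializedSequence make_initialized_sequence(std::size_t size)
{
  InitializedSequence seq;
  seq.messages.resize(size);
  seq.data.reserve(size);
  for (auto & message : seq.messages) {
    seq.data.push_back(&message);
  }
  return seq;
}

// Returns true only if no slot is null.
bool all_slots_set(void * const * data, std::size_t size)
{
  for (std::size_t i = 0; i < size; ++i) {
    if (data[i] == nullptr) {
      return false;
    }
  }
  return true;
}
```

A test for the set-up case would pass a buffer for which all_slots_set returns true, while the NULL-pointer case would deliberately pass one for which it returns false.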
Eventually yes. But this:
was actually meant as a reference to:
If I am not mistaken, the type confusion I mentioned (made oh so easy by the weak type system of C) is the real bug. A functionally correct (but ugly) fix is easy and I'll try to do it today. The most uncertain part of the timing is that I'd like to reliably reproduce it locally to verify my reading of the code and the fix.
ros2/rmw_implementation#175 should fix the test. I didn't add the
So I think that between the fix to rmw_cyclonedds and the fix to test_rmw_implementation, we can consider this issue fixed. I'm going to review the rmw_cyclonedds PR shortly.
Closing this as we fixed it years ago.
We're seeing occasional failures in test_subscription__rmw_cyclonedds_cpp::take_sequence: https://ci.ros2.org/view/nightly/job/nightly_linux-aarch64_repeated/1491/testReport/junit/(root)/projectroot/test_subscription__rmw_cyclonedds_cpp/
I finally managed to capture a stack trace, and it looks like this:
Digging through the trace, we can see that the real code starts in frame 9, inside the test, calling rmw_take_sequence(). Via a series of other calls, it ends up inside of cyclonedds itself, in dds_read_impl. What happens in there is that if buf[0] == NULL, we end up attempting to call ddsi_sertype_realloc_samples. However, rmw_cyclonedds_cpp sets the realloc callbacks to abort() here: https://github.com/ros2/rmw_cyclonedds/blob/master/rmw_cyclonedds_cpp/src/serdata.cpp#L447

Backing up a bit, we can see why this is failing. The test calls rmw_message_sequence_init to initialize the message. The size parameter is 1, so we allocate one void * pointer in there. However, we do not initialize it. This is then the void **buf that eventually gets passed into dds_read_impl. In most cases, the sequence of bytes stored in that pointer is not zero, so we never end up trying to call ddsi_sertype_realloc_samples (and hence we don't abort()). However, if that pointer just happens to be NULL, then we will attempt to call ddsi_sertype_realloc_samples and abort.

This leads to a couple of questions:
- Is the take_sequence test correct? That is, should we be initializing the returned data pointers before passing them into rmw_take_sequence?
- Checking buf[0] == NULL inside of dds_read_impl seems somewhat fragile. Should that be using a buffer size or something?

@eboasson looking for any feedback from you on what the correct path forward here is.
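The failure mode described above can be modelled in a few lines. This is a simplified sketch, not Cyclone's code: MockSertype, mock_read_impl and takes_realloc_path are hypothetical names, and the mock callback only counts calls where the real one in rmw_cyclonedds_cpp would abort().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the sertype realloc callback. The real callback
// in rmw_cyclonedds_cpp calls abort(); this one only counts calls so the
// sketch is safe to run.
struct MockSertype
{
  int realloc_calls = 0;
};

// Simplified model of the decision described above: a null first entry is
// read as "no samples supplied, allocate them via the sertype".
// Returns true when the realloc path was taken.
bool mock_read_impl(MockSertype & sertype, void ** buf, std::size_t max_samples)
{
  if (buf == nullptr || max_samples == 0) {
    return false;
  }
  if (buf[0] == nullptr) {
    ++sertype.realloc_calls;  // the real callback would abort() here
    return true;
  }
  return false;
}

// Builds a one-slot buffer as the test does, with `slot_contents` standing in
// for whatever bytes the uninitialised slot happens to hold.
bool takes_realloc_path(void * slot_contents)
{
  MockSertype sertype;
  void * buf[1] = {slot_contents};
  return mock_read_impl(sertype, buf, 1);
}
```

In this model, takes_realloc_path returns false for any non-null slot value, including garbage that does not point at a valid message, and true only when the slot happens to be null. That matches the intermittent nature of the failure: the outcome depends entirely on what the uninitialised memory contains.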