Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr , dynamic_cast Fail #696

Open
Chris-166 opened this issue Jun 13, 2023 · 17 comments

Comments

@Chris-166

Bug report

11149 11149 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x70 in tid 11149
06-01 19:19:03.665 11194 11194 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
06-01 19:19:03.665 11194 11194 F DEBUG : Revision: '0'
06-01 19:19:03.665 11194 11194 F DEBUG : ABI: 'arm64'
06-01 19:19:03.665 11194 11194 F DEBUG : Timestamp: 2023-06-01 19:19:03+0800
06-01 19:19:03.665 11194 11194 F DEBUG : uid: 10148
06-01 19:19:03.665 11194 11194 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x70
06-01 19:19:03.665 11194 11194 F DEBUG : Cause: null pointer dereference
06-01 19:19:03.665 11194 11194 F DEBUG : x0 000000705a8a42e8 x1 0000000000000068 x2 0000007fd0efc0c0 x3 00000070fa842f48
06-01 19:19:03.665 11194 11194 F DEBUG : x4 0000007fd0efc0a0 x5 0000007fd0efc0a9 x6 726574746168632f x7 726574746168632f
06-01 19:19:03.665 11194 11194 F DEBUG : x8 0000000000000000 x9 0000000000000000 x10 0000000000000001 x11 0000000000000000
06-01 19:19:03.665 11194 11194 F DEBUG : x12 7fffffffffffffff x13 fffffffffc000000 x14 0000000000000060 x15 0000000000400000
06-01 19:19:03.665 11194 11194 F DEBUG : x16 00000070666a6c08 x17 000000706638afc4 x18 00000070fb1fc000 x19 000000705a8a4300
06-01 19:19:03.665 11194 11194 F DEBUG : x20 0000007fd0efbe58 x21 000000705a8a3c00 x22 0000000000000068 x23 00000070fa842f48
06-01 19:19:03.665 11194 11194 F DEBUG : x24 0000007fd0efc0c0 x25 00000070faa9b020 x26 0000000000000001 x27 000000706af5f2c8
06-01 19:19:03.665 11194 11194 F DEBUG : x28 00000070fa780fe0 x29 0000007fd0efbdb0

---> [initial analysis]

// src/eProsima/Fast-DDS/src/cpp/fastdds/publisher/PublisherImpl.cpp
06-07 17:59:33.769  9481  9481 F DEBUG   :       #00 pc 000000000037bd90  /system/lib64/libfastrtps.so (eprosima::fastdds::dds::PublisherImpl::create_datawriter(eprosima::fastdds::dds::Topic*, eprosima::fastdds::dds::DataWriterQos const&, eprosima::fastdds::dds::DataWriterListener*, eprosima::fastdds::dds::StatusMask const&)+72) (BuildId: c3a0357250be636d9f8bc7567eee9ef117a559bd)

// src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/publisher.cpp
06-07 17:59:33.769  9481  9481 F DEBUG   :       #01 pc 000000000005780c  /system/lib64/librmw_fastrtps_cpp.so (rmw_fastrtps_cpp::create_publisher(CustomParticipantInfo const*, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_s const*, rmw_publisher_options_s const*, bool, bool)+4640) (BuildId: f8b6c59566cd01cfdeaf9606c9b0b7fba029844a)

// src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_publisher.cpp
06-07 17:59:33.769  9481  9481 F DEBUG   :       #02 pc 000000000007fb1c  /system/lib64/librmw_fastrtps_cpp.so (rmw_create_publisher+568) (BuildId: f8b6c59566cd01cfdeaf9606c9b0b7fba029844a)

// src/ros2/rmw_implementation/rmw_implementation/src/functions.cpp
06-07 17:59:33.769  9481  9481 F DEBUG   :       #03 pc 000000000000a174  /system/lib64/librmw_implementation.so (rmw_create_publisher+132) (BuildId: d9a02bcc7c31dffb595d460d46d870e2260cfc19)

// src/ros2/rcl/rcl/src/rcl/publisher.c
06-07 17:59:33.770  9481  9481 F DEBUG   :       #04 pc 0000000000036174  /system/lib64/librcl.so (rcl_publisher_init+2328) (BuildId: c87cc37825a97de3aa60a6311dd5f638bcaffa10)

// org_ros2_rcljava_node_NodeImpl.cpp
06-07 17:59:33.770  9481  9481 F DEBUG   :       #05 pc 0000000000006dc8  /system/lib64/liborg_ros2_rcljava_node__node_impl__jni.so (Java_org_ros2_rcljava_node_NodeImpl_nativeCreatePublisherHandle+296) (BuildId: d2d9fe467c1c5c8af99b768d5dbfabe188227a14)

Resolving the backtrace with addr2line shows that the NE occurs in the following code:

https://github.com/eProsima/Fast-DDS/blob/2.8.x/src/cpp/fastdds/publisher/PublisherImpl.cpp
DataWriter* PublisherImpl::create_datawriter(
        Topic* topic,
        const DataWriterQos& qos,
        DataWriterListener* listener,
        const StatusMask& mask)
{
    logInfo(PUBLISHER, "CREATING WRITER IN TOPIC: " << topic->get_name()); // topic is null
    //Look for the correct type registration
    TypeSupport type_support = participant_->find_type(topic->get_type_name());
    ...
}

The topic pointer is null **because the following code fails to obtain it**:

bool
cast_or_create_topic(
  eprosima::fastdds::dds::DomainParticipant * participant,
  eprosima::fastdds::dds::TopicDescription * desc,
  const std::string & topic_name,
  const std::string & type_name,
  const eprosima::fastdds::dds::TopicQos & topic_qos,
  bool is_writer_topic,
  TopicHolder * topic_holder)
{
  ...

  if (is_writer_topic) {
    topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc); // **dynamic_cast fail**
    assert(nullptr != topic_holder->topic);
  }

  ...
}
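
Note that `assert` expands to nothing when `NDEBUG` is defined, so in a release build a failed cast is not caught here and the null pointer can travel on to `create_datawriter`. Whether the libraries in this report were built that way is an assumption. The sketch below shows an explicit check that survives release builds. `Description`, `WriterTopic`, `FilteredDescription`, `Holder` and `cast_topic_checked` are hypothetical stand-ins, not Fast DDS or rmw_fastrtps names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for TopicDescription and Topic (not the Fast DDS classes).
struct Description {
  virtual ~Description() = default;
};
struct WriterTopic : Description {
  std::string name = "rt/chatter";
};
struct FilteredDescription : Description {};

// Hypothetical stand-in for TopicHolder.
struct Holder {
  WriterTopic * topic = nullptr;
};

// Reports failure through the return value instead of relying on assert(),
// which is compiled out when NDEBUG is defined.
bool cast_topic_checked(Description * desc, Holder * holder)
{
  if (desc == nullptr || holder == nullptr) {
    return false;
  }
  holder->topic = dynamic_cast<WriterTopic *>(desc);
  return holder->topic != nullptr;
}
```

A caller can then return an error to the upper layer instead of dereferencing a null topic. This only turns the crash into a reported failure; it does not explain why the cast fails.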

**The log statement below prints the type of "desc" as "N8eprosima7fastdds3dds5TopicE":**
`LOGE("utils.cpp#cast_or_create_topic, ready to dynamic_cast , desc type is %s", typeid(*desc).name());`


1. Operating System
   (1) Compilation environment: 22.04.1-Ubuntu
   (2) Runtime environment: Android 10

2. Version or commit hash
commit ade3e9bb00c9e0cbf98f642cfc828aab81cece08

3.  Fast-DDS: https://github.com/eProsima/Fast-DDS/tree/2.8.x  
(commit 3b7e618de63c7de00715de570ad041fc6efed94f)

4. Steps to reproduce issue
(1) Compile the rmw_fastrtps source code into librmw_fastrtps_shared_cpp.so and push it to the Android platform (system/lib64);
(2) Develop an APK based on the [demo](https://github.com/YasuChiba/ros2-android-test-app/blob/main/app/src/main/java/com/example/ros2_android_test_app/MainActivity.java). Unlike the demo, the shared libraries that the APK depends on are located under system/lib64 rather than inside the APK project;
(3) **If the subscriber is created first and then the publisher, the above NE appears; if the publisher is created first and then the subscriber, the APK runs stably.**

5. Expected behavior: No exception.

6. Actual behavior: NE crash

@fujitatomoya
Collaborator

@Chris-166 Could you provide a reproducible colcon project that triggers this problem? Getting hold of an Android system is hard for us, so that would be really appreciated.

@Chris-166
Author

@fujitatomoya So far we have not reproduced this issue in a colcon project.

We want to change the "dynamic_cast" in the following code to "static_cast",
because by adding logs we found that in both the normal and the abnormal case the "desc" object is of type "N8eprosima7fastdds3dds5TopicE".

topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc); // **dynamic_cast fail**

After modification:
topic_holder->topic = static_cast<eprosima::fastdds::dds::Topic *>(desc);

Are there any risks with this modification? Please help evaluate it. Thanks.

@Chris-166
Author

@fujitatomoya Do you have any suggestions for debugging the dynamic_cast failure?
Could it be caused by differences in the C++ standard library used by the NDK?

@iuhilnehc-ynos

This comment was marked as off-topic.

@Chris-166
Author

@iuhilnehc-ynos Null pointer protection only prevents the program from crashing; the program's functionality would still be affected.
I want to change "dynamic_cast" to "static_cast" for the following reasons:

  1. By adding logs, we found that in both the normal and the abnormal case the "desc" object is of type "N8eprosima7fastdds3dds5TopicE";
  2. After changing to static_cast, the program runs normally.
    But I don't know what side effects this modification might introduce.

"NOTE: Without enough information, it's hard to know why the dynamic_cast failed."
-> Yes, I agree. What information can I provide if further analysis is required? Thanks.

@iuhilnehc-ynos
Contributor

iuhilnehc-ynos commented Jun 14, 2023

I want to change "dynamic_cast" to "static_cast"

Currently, it's OK.
However, it seems dangerous, because a static_cast from TopicDescription* to Topic* cannot guarantee that the pointer really refers to a Topic.
For example, in the future PublisherImpl::create_datawriter might call a method overridden from DomainEntity, or a new method that belongs only to Topic. If it does, and desc is not a Topic but an instance of some new class derived from TopicDescription, it could crash again.
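
To illustrate the difference described above, here is a minimal sketch with hypothetical classes (`Description`, `WriterTopic` and `OtherDescription` are stand-ins, not the Fast DDS types). `dynamic_cast` inspects the runtime type and yields `nullptr` on a mismatch, while `static_cast` compiles for any `Description*` and performs no check at all.

```cpp
#include <cassert>

// Hypothetical hierarchy standing in for TopicDescription and its subclasses.
struct Description {
  virtual ~Description() = default;
};
struct WriterTopic : Description {
  int qos_depth = 10;
};
struct OtherDescription : Description {};

// Checked downcast: returns nullptr when d does not point to a WriterTopic.
WriterTopic * checked(Description * d)
{
  return dynamic_cast<WriterTopic *>(d);
}

// Unchecked downcast: no runtime test is performed. Calling this with a
// pointer to an object that is not a WriterTopic is undefined behavior.
WriterTopic * unchecked(Description * d)
{
  return static_cast<WriterTopic *>(d);
}
```

With a real `WriterTopic` both functions return the same pointer. With an `OtherDescription`, `checked` returns `nullptr`, whereas `unchecked` must not be called at all. That is the risk a switch to `static_cast` would accept if a second subclass ever reaches this code path.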

we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;

It seems they're the correct type.

what information can I provide if further analysis is required

I am not sure if you build some libraries with -fno-rtti.

@iuhilnehc-ynos
Contributor

we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;

Oh, I see you used the typeid to print the type, which means the RTTI is not disabled.
Sorry, I don't know why the dynamic_cast failed in such a way.

@fujitatomoya
Collaborator

topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc);

This should be no problem; desc can be downcast to eprosima::fastdds::dds::Topic.

we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;

Doesn't this only mean that desc is still accessible at this moment? Could the desc object become null afterwards?

@Chris-166
Author

this only means that it still can access to desc at this moment? but desc object could be null after?

Exception Cases:
(Add Logs)

      LOGE("utils.cpp#cast_or_create_topic, ready to dynamic_cast , desc type is %s", typeid(*desc).name());
      topic_holder->topic = dynamic_cast<eprosima::fastdds::dds::Topic *>(desc);
      LOGE("utils.cpp#cast_or_create_topic, after dynamic_cast , desc type is %s", typeid(*desc).name());
      assert(nullptr != topic_holder->topic);

2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, desc name = rt/chatter, type = std_msgs::msg::dds_::String_
2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, ready to dynamic_cast , desc type is N8eprosima7fastdds3dds5TopicE
2023-06-01 19:01:15.222 7052-7052 E/rmw_fastrtps_shared_cpp: utils.cpp#cast_or_create_topic, after dynamic_cast , desc type is N8eprosima7fastdds3dds5TopicE
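
A matching mangled name does not necessarily mean that both sides use the same `std::type_info` object. If the type information for a class ends up duplicated in several shared libraries, a runtime that compares `type_info` objects by address could reject the cast even though the names are identical. Whether that is what happens here is an assumption, not something confirmed in this thread. The sketch below, using hypothetical stand-in classes, logs both comparisons next to the cast result so the two cases can be told apart.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Hypothetical stand-ins for TopicDescription and Topic.
struct Description {
  virtual ~Description() = default;
};
struct WriterTopic : Description {};

struct TypeCheck {
  bool same_name;    // mangled names are equal as strings
  bool same_object;  // both expressions refer to the same std::type_info object
  bool cast_ok;      // dynamic_cast succeeded
};

// desc must be non-null: typeid(*desc) throws std::bad_typeid for a null pointer.
TypeCheck inspect(Description * desc)
{
  const std::type_info & dynamic_type = typeid(*desc);
  const std::type_info & target_type = typeid(WriterTopic);
  return TypeCheck{
    std::strcmp(dynamic_type.name(), target_type.name()) == 0,
    &dynamic_type == &target_type,
    dynamic_cast<WriterTopic *>(desc) != nullptr};
}
```

If, in the failing case, `same_name` were true while `same_object` and `cast_ok` were false, that would point toward duplicated type information rather than a wrong object type.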

@iuhilnehc-ynos
Contributor

I'd like to share two links with you.

https://developer.android.com/ndk/guides/common-problems#rttiexceptions_not_working_across_library_boundaries
android/ndk#533 (comment)

Maybe you need to move the definition of TopicDescription::~TopicDescription into a new file, src/cpp/fastdds/topic/TopicDescription.cpp, and add that file to src/cpp/CMakeLists.txt.

I am not sure, I didn't test it.
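
As a rough illustration of that suggestion: in the Itanium C++ ABI, a class whose first non-inline, non-pure virtual function is defined in exactly one translation unit has its vtable and type information emitted only there. The sketch below shows the pattern in a single file with hypothetical classes (`Description` and `WriterTopic` are stand-ins, not the Fast DDS sources), with comments marking where the header and source file would be split.

```cpp
#include <cassert>
#include <string>

// --- would live in the header ---
class Description {
public:
  virtual ~Description();  // declared only, so it is not inline
  virtual const char * kind() const { return "description"; }
};

class WriterTopic : public Description {
public:
  ~WriterTopic() override;  // declared only, so it is not inline
  const char * kind() const override { return "topic"; }
};

// --- would live in exactly one .cpp file of the library ---
Description::~Description() = default;
WriterTopic::~WriterTopic() = default;
```

The out-of-line destructors do nothing extra at runtime; their only purpose is to anchor each class's type information in one object file. Whether this resolves the failure reported in this issue is untested.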

@Chris-166
Author

Thank you for your reply!

However, I did not understand the description of this problem. Could you please provide a patch first so that I can directly verify your hypothesis?

@Chris-166
Author

I want to change "dynamic_cast" to "static_cast"

Currently, it's OK. but it seems dangerous because static_cast for a pointer from TopicDescription* into Topic* can't promise it's a pointer of Topic. e.g., there might be a method that is overridden from DomainEntity or a new method belonging to Topic in the future called inside PublisherImpl::create_datawriter, if so and the type of desc is not Topic but a new class derived from TopicDescription, it could cause crash again.

we found by adding logs: in normal and abnormal cases, the "desc" object is ""N8eprosima7fastdds3dds5TopicE"" type;

It seems they're the correct type.

what information can I provide if further analysis is required

I am not sure if you build some libraries with -fno-rtti.

Note: The complete build script is as follows:

export PYTHON3_EXEC="$( which python3 )"
export PYTHON3_LIBRARY="$( ${PYTHON3_EXEC} -c 'import os.path; from distutils import sysconfig; print(os.path.realpath(os.path.join(sysconfig.get_config_var("LIBPL"), sysconfig.get_config_var("LDLIBRARY"))))' )"
export PYTHON3_INCLUDE_DIR="$( ${PYTHON3_EXEC} -c 'from distutils import sysconfig; print(sysconfig.get_config_var("INCLUDEPY"))' )"
export ANDROID_ABI=arm64-v8a
export ANDROID_TARGET=29
export ANDROID_NATIVE_API_LEVEL=android-29
export ANDROID_TOOLCHAIN_NAME=aarch64-linux-android-clang
 
colcon build \
      --packages-ignore cyclonedds rcl_logging_log4cxx rcl_logging_spdlog rosidl_generator_py rclandroid ros2_talker_android ros2_listener_android \
      --cmake-args \
      -DENABLE_LTTNG=OFF \
      -DTRACETOOLS_DISABLED=ON \
      -DCMAKE_VERBOSE_MAKEFILE=ON \
      -DPYTHON_EXECUTABLE=${PYTHON3_EXEC} \
      -DPYTHON_LIBRARY=${PYTHON3_LIBRARY} \
      -DPYTHON_INCLUDE_DIR=${PYTHON3_INCLUDE_DIR} \
      -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
      -DANDROID=ON \
      -DANDROID_FUNCTION_LEVEL_LINKING=OFF \
      -DANDROID_NATIVE_API_LEVEL=${ANDROID_TARGET} \
      -DANDROID_TOOLCHAIN_NAME=${ANDROID_TOOLCHAIN_NAME} \
      -DANDROID_STL=c++_shared \
      -DANDROID_ABI=${ANDROID_ABI} \
      -DANDROID_NDK=${ANDROID_NDK} \
      -DTHIRDPARTY=ON \
      -DCOMPILE_EXAMPLES=OFF \
      -DCMAKE_FIND_ROOT_PATH="${PWD}/install" \
      -DBUILD_TESTING=OFF \
      -DRCL_LOGGING_IMPLEMENTATION=rcl_logging_noop \
      -DTHIRDPARTY_android-ifaddrs=FORCE 

@iuhilnehc-ynos
Contributor

But I did not understand the description of this exception. Could you please provide me with the patch first to directly verify your doubts?

I guess you used the 2.8.x branch of Fast-DDS, as mentioned in #696 (comment) (https://github.com/eProsima/Fast-DDS/blob/2.8.x/src/cpp/fastdds/publisher/PublisherImpl.cpp). The patch is based on the latest commit of branch 2.8.x. Please check whether it fixes the dynamic_cast failure.

@iuhilnehc-ynos
Contributor

Note:The complete compilation script is as follows

Thank you.
It's out of my scope, so I am not going to build it on my local machine.

@Chris-166
Author

But I did not understand the description of this exception. Could you please provide me with the patch first to directly verify your doubts?

I guess you used the branch 2.8.x of Fast-DDS, which is mentioned in #696 (comment) (https://github.com/eProsima/Fast-DDS/blob/2.8.x/src/cpp/fastdds/publisher/PublisherImpl.cpp), the patch is based on the latest commit of branch 2.8.x. Please help to check whether it can fix the dynamic is failing issue or not.

With this patch applied, the dynamic_cast failure can still be reproduced.

@fujitatomoya
Collaborator

@Chris-166 We do not use Android as one of our platforms, and it is not officially supported by ROS 2. (https://docs.ros.org/en/rolling/Releases/Release-Rolling-Ridley.html)

You can keep this issue open, I guess, but we will not be able to help you with this anytime soon.

@fujitatomoya
Collaborator

@Chris-166 Friendly ping. Otherwise I would like to close this issue, since Android is not a supported platform.
