SIGSEGV in libc++_shared while _Unwind_Backtrace #816

Closed
pendyalasyam opened this issue Oct 11, 2018 · 17 comments

@pendyalasyam

I have an application that internally uses three .so files. From Java, I call a function in lib1.so via JNI, which in turn calls a function in lib2.so. lib2.so then calls into lib3.so, whose job is to print the backtrace. This setup works fine with libgnustl_shared.so, but after switching to libc++_shared.so the application crashes. After hours of analysis, I observed that _Unwind_Backtrace successfully invokes the provided callback once for each function call in lib2.so and lib3.so combined. It then crashes with SIGSEGV when it goes on to invoke the callback for the functions in lib1.so.

To be clear, this is my call flow:

Java -> lib1.so (func1 -> func2) -> lib2.so (func3 -> func4 -> func5) -> lib3.so (func6 -> func7 -> _Unwind_Backtrace)

_Unwind_Backtrace successfully invokes the provided callback 5 times (2 functions in lib3.so and 3 functions in lib2.so) and then crashes with the following error:

10-11 04:17:28.956 16882 16904 F libc : Fatal signal 11 (SIGSEGV), code 1, fault addr 0x1f in tid 16904
10-11 04:17:29.043 16905 16905 F DEBUG : #00 pc 0007d84e /data/app/com.myapp/lib/arm/libc++_shared.so
10-11 04:17:29.044 16905 16905 F DEBUG : #1 pc 0007d67b /data/app/com.myapp/lib/arm/libc++_shared.so
10-11 04:17:29.044 16905 16905 F DEBUG : #2 pc 00079bf5 /data/app/com.myapp/lib/arm/libc++_shared.so
10-11 04:17:29.044 16905 16905 F DEBUG : #3 pc 00079b97 /data/app/com.myapp/lib/arm/libc++_shared.so (__gxx_personality_v0+270)

This happens very consistently, so I do not think it is memory corruption.

Even after hours of thinking, I cannot work out what is going wrong. If anybody has any idea, kindly help.


Environment Details

  • NDK Version: 16.1.1
  • Build system: ndk-build + cmake + standalone toolchain
  • Host OS: Windows
  • Compiler: clang
  • ABI: arm
  • STL: c++_shared
  • NDK API level: 23
  • Device API level: 23
@DanAlbert
Member

The main thing to look for is whether or not the unwinder was linked properly in each of your libraries (third-party dependencies too; check every .so in your application). This can be diagnosed easily with readelf. For Windows, there's a readelf in each of the NDK's binutils directories, e.g. $NDK/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64/bin/arm-linux-androideabi-readelf.

$ readelf -sW lib1.so | grep Unwind

If the library was built correctly, you'll see something like what was reported in #785. Every unwind symbol (with the exception of __gnu_Unwind_Find_exidx) is either LOCAL or HIDDEN (or both). If you see any that are not like this, aside from the aforementioned __gnu_Unwind_Find_exidx (this one is expected, as it actually comes from libc), the library was not built correctly. The offending symbols are usually GLOBAL DEFAULT, but WEAK DEFAULT or any other form that is not some variation of private is a problem.
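
The visibility rule above can be checked mechanically. What follows is a minimal sketch, assuming the usual `readelf -sW` column order (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the function name find_exported_unwind_symbols is made up for illustration and is not part of the NDK.

```shell
# Reads `readelf -sW` output on stdin and prints the name of every unwind
# symbol that is neither LOCAL nor HIDDEN. __gnu_Unwind_Find_exidx is skipped
# because it is expected to come from libc.
# Assumed columns: $5 = Bind, $6 = Vis, $8 = Name.
find_exported_unwind_symbols() {
    awk '$8 ~ /Unwind/ &&
         $8 != "__gnu_Unwind_Find_exidx" &&
         $5 != "LOCAL" && $6 != "HIDDEN" { print $8 }'
}
```

It could be used as `readelf -sW lib1.so | find_exported_unwind_symbols`, run once per .so in the application. Empty output would suggest the library follows the rule; any printed name is a symbol worth investigating.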

https://android.googlesource.com/platform/ndk/+/master/docs/BuildSystemMaintainers.md#Unwinding details the general requirements for linking the STL correctly. It's somewhat ahead of time (libgcc is not a linker script until r19, which is not even in beta yet), but the concepts are correct. CMake builds your libraries correctly as long as their dependencies are not also broken. For a standalone toolchain you need to manually set up the -Wl,--exclude-libs arguments. ndk-build should never exhibit this problem for any library that it builds (it could still come from a third-party library, but the ndk-build library will not be the source of the crash).
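
For the standalone toolchain case, the flags might be wired up roughly as follows. This is only a sketch: the helper name link_shared, the UNWIND_LDFLAGS variable, and the default compiler name are assumptions for illustration, and a real build will need its own sources, include paths, and libraries.

```shell
# Linker flags that keep unwind symbols from being re-exported.
# The libunwind.a exclusion only matters for 32-bit ARM.
UNWIND_LDFLAGS="-Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libunwind.a"

# Hypothetical helper: links one shared library with the flags above.
# Usage: link_shared <output.so> <objects and other linker arguments...>
# CXX falls back to an assumed standalone-toolchain compiler name.
link_shared() {
    out="$1"; shift
    # UNWIND_LDFLAGS is intentionally unquoted so it splits into two arguments.
    "${CXX:-arm-linux-androideabi-clang++}" -shared -fPIC -o "$out" "$@" $UNWIND_LDFLAGS
}
```

A call such as `link_shared lib1.so lib1.o` would then apply the same exclusions to every library built this way.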

If none of those things help, we'll need a test case to dig into this (one that repros with r18). Our tests don't show any issues in this area and there's not enough information to know what went wrong. I strongly suspect that the information in #379 will turn something up though.

@pendyalasyam
Author

@DanAlbert, could you please help me understand why the unwinder has to be linked properly in each of the libraries? If the unwinder is not linked into a library, won't unwinding still work for function calls from that library?

@DanAlbert
Member

I don't understand your question. How's it going to unwind without an unwinder?

@pendyalasyam
Author

"I don't understand your question. How's it going to unwind without an unwinder?"
@DanAlbert, sorry. I actually have no idea how the unwinder works. I just thought that only the lib that uses the unwind code has to be linked with libunwind.a. But if you say every lib has to be linked, then I need to spend some time understanding how it works.

However, I have run readelf on my libs, and the output follows.

####################readelf from lib1.so##################################
11: 0001b774 36 FUNC GLOBAL DEFAULT 13 _Unwind_Resume
332: 00000000 0 FUNC WEAK DEFAULT UND __gnu_Unwind_Find_exidx
333: 0001b680 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP_D
334: 0001b670 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP
335: 0001b690 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP_D_16_to_31
336: 0001b6a0 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_WMMXD
337: 0001b728 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_WMMXC
341: 0001abac 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetCFA
342: 0001abb4 164 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_RaiseException
343: 0001ac58 28 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_ForcedUnwind
344: 0001ac74 116 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Resume
345: 0001ace8 32 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Resume_or_Rethrow
346: 0001ad08 4 FUNC GLOBAL DEFAULT 13 _Unwind_Complete
347: 0001ad0c 24 FUNC GLOBAL DEFAULT 13 _Unwind_DeleteException
348: 0001ad24 92 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Get
349: 0001ada8 92 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Set
350: 0001ae30 200 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Backtrace
355: 0001b2fc 864 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Pop
356: 0001b688 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP_D
357: 0001b678 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP
358: 0001b698 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP_D_16_to_31
359: 0001b6e4 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_WMMXD
360: 0001b73c 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_WMMXC
362: 0001b750 36 FUNC GLOBAL DEFAULT 13 ___Unwind_RaiseException
363: 0001b750 36 FUNC GLOBAL DEFAULT 13 _Unwind_RaiseException
364: 0001b774 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Resume
365: 0001b798 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Resume_or_Rethrow
366: 0001b798 36 FUNC GLOBAL DEFAULT 13 _Unwind_Resume_or_Rethrow
367: 0001b7bc 36 FUNC GLOBAL DEFAULT 13 ___Unwind_ForcedUnwind
368: 0001b7bc 36 FUNC GLOBAL DEFAULT 13 _Unwind_ForcedUnwind
369: 0001b7e0 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Backtrace
370: 0001b7e0 36 FUNC GLOBAL DEFAULT 13 _Unwind_Backtrace
372: 0001bc68 16 FUNC GLOBAL DEFAULT 13 _Unwind_GetRegionStart
373: 0001bc78 28 FUNC GLOBAL DEFAULT 13 _Unwind_GetLanguageSpecificData
374: 0001bc94 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetDataRelBase
375: 0001bc9c 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetTextRelBase

####################readelf from lib2.so##################################
23: 001e0c48 36 FUNC GLOBAL DEFAULT 13 _Unwind_Resume
5998: 00000000 0 FUNC WEAK DEFAULT UND __gnu_Unwind_Find_exidx
5999: 001e0b54 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP_D
6000: 001e0b44 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP
6001: 001e0b64 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_VFP_D_16_to_31
6002: 001e0b74 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_WMMXD
6003: 001e0bfc 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Restore_WMMXC
6005: 001e0080 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetCFA
6006: 001e0088 164 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_RaiseException
6007: 001e012c 28 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_ForcedUnwind
6008: 001e0148 116 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Resume
6009: 001e01bc 32 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Resume_or_Rethrow
6010: 001e01dc 4 FUNC GLOBAL DEFAULT 13 _Unwind_Complete
6011: 001e01e0 24 FUNC GLOBAL DEFAULT 13 _Unwind_DeleteException
6012: 001e01f8 92 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Get
6013: 001e027c 92 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Set
6014: 001e0304 200 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Backtrace
6018: 001e07d0 864 FUNC GLOBAL DEFAULT 13 _Unwind_VRS_Pop
6019: 001e0b5c 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP_D
6020: 001e0b4c 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP
6021: 001e0b6c 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_VFP_D_16_to_31
6022: 001e0bb8 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_WMMXD
6023: 001e0c10 0 FUNC GLOBAL DEFAULT 13 __gnu_Unwind_Save_WMMXC
6025: 001e0c24 36 FUNC GLOBAL DEFAULT 13 ___Unwind_RaiseException
6026: 001e0c24 36 FUNC GLOBAL DEFAULT 13 _Unwind_RaiseException
6027: 001e0c48 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Resume
6028: 001e0c6c 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Resume_or_Rethrow
6029: 001e0c6c 36 FUNC GLOBAL DEFAULT 13 _Unwind_Resume_or_Rethrow
6030: 001e0c90 36 FUNC GLOBAL DEFAULT 13 ___Unwind_ForcedUnwind
6031: 001e0c90 36 FUNC GLOBAL DEFAULT 13 _Unwind_ForcedUnwind
6032: 001e0cb4 36 FUNC GLOBAL DEFAULT 13 ___Unwind_Backtrace
6033: 001e0cb4 36 FUNC GLOBAL DEFAULT 13 _Unwind_Backtrace
6035: 001e113c 16 FUNC GLOBAL DEFAULT 13 _Unwind_GetRegionStart
6036: 001e114c 28 FUNC GLOBAL DEFAULT 13 _Unwind_GetLanguageSpecificData
6037: 001e1168 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetDataRelBase
6038: 001e1170 8 FUNC GLOBAL DEFAULT 13 _Unwind_GetTextRelBase

####################readelf from lib3.so##################################
10: 0019fd68 36 FUNC GLOBAL DEFAULT 13 _Unwind_Resume
1215: 00000000 0 FUNC WEAK DEFAULT UND __gnu_Unwind_Find_exidx

readelf and grep for unwind on lib1.so and lib3.so give the same output, so I am assuming there is no problem with linking the unwinder.
I also checked which C++ flags are used to compile the .cpp files of lib1 and lib3. Both use -fexceptions -funwind-tables -frtti.

You said "WEAK DEFAULT or any other form that is not some variation of private is a problem." But in lib3.so, __gnu_Unwind_Find_exidx is WEAK DEFAULT, and unwinding works for lib3.so.

@DanAlbert
Member

"So I am assuming there is no problem with linking unwinder."

Other way around. Those are all built incorrectly.

@pendyalasyam
Author

The output is the same even when I use gnustl, and with gnustl the same setup works fine.
It only crashes when I use libc++_shared.

However, I will take your suggestion: I will make those symbols LOCAL HIDDEN and try running again.

If you have time, could you please tell me why these symbols have to be LOCAL HIDDEN? What is the dependency? Why can't they be GLOBAL DEFAULT?

@bhupeshpant19jan

@DanAlbert, could you please provide some readouts for the Local and Global references you are talking about? I think we are out of sync on this issue.
Thanks!

@DanAlbert
Member

"The output is same even when I used gnustl and with gnustl the same setup is working fine."

gnustl is different. What I said above remains true for libc++.

"Could you please provide some readouts for Local and Global references you are talking about?"

readelf on libc++ itself shows them:

$ readelf -sW android-ndk-r19-canary/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libc++_shared.so | grep _Unwind 
  2420: 00000000     0 FUNC    GLOBAL DEFAULT  UND __gnu_Unwind_Find_exidx
 31253: 0007125d    24 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP21_Unwind_Control_Block
 31254: 00071605    36 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L27dependent_exception_cleanupE19_Unwind_Reason_CodeP21_Unwind_Control_Block
 32159: 00071de9    56 FUNC    LOCAL  DEFAULT   12 _ZL13_Unwind_GetGRP15_Unwind_Contexti
 32160: 00072191    52 FUNC    LOCAL  DEFAULT   12 _ZL13_Unwind_SetGRP15_Unwind_Contextij
 32163: 00071ac5   804 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L11scan_eh_tabERNS_12_GLOBAL__N_112scan_resultsE14_Unwind_ActionbP21_Unwind_Control_BlockP15_Unwind_Context
 32164: 00071e3d    72 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L13set_registersEP21_Unwind_Control_BlockP15_Unwind_ContextRKNS_12_GLOBAL__N_112scan_resultsE
 32165: 00071e21    28 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L14call_terminateEbP21_Unwind_Control_Block
 32166: 00071ab1    20 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L15continue_unwindEP21_Unwind_Control_BlockP15_Unwind_Context
 32168: 000720e5   108 FUNC    LOCAL  DEFAULT   12 _ZN10__cxxabiv1L24exception_spec_can_catchExPKhhPKNS_16__shim_type_infoEPvP21_Unwind_Control_Block
 34186: 00074005   280 FUNC    LOCAL  DEFAULT   12 _ZL13unwind_phase2P13unw_context_tP12unw_cursor_tP21_Unwind_Control_Blockb
 34187: 00073f25    48 FUNC    LOCAL  DEFAULT   12 _ZN12_GLOBAL__N_114unwindOneFrameEjP21_Unwind_Control_BlockP15_Unwind_Context
 36512: 0007411d     2 FUNC    LOCAL  HIDDEN    12 _Unwind_Complete
 36513: 000741f1    12 FUNC    LOCAL  HIDDEN    12 _Unwind_DeleteException
 36514: 00074189    52 FUNC    LOCAL  HIDDEN    12 _Unwind_GetLanguageSpecificData
 36515: 000741bd    52 FUNC    LOCAL  HIDDEN    12 _Unwind_GetRegionStart
 36516: 00073f65   160 FUNC    LOCAL  HIDDEN    12 _Unwind_RaiseException
 36517: 00074121   104 FUNC    LOCAL  HIDDEN    12 _Unwind_Resume
 36518: 00073c75   172 FUNC    LOCAL  HIDDEN    12 _Unwind_VRS_Get
 36519: 00073a09   620 FUNC    LOCAL  HIDDEN    12 _Unwind_VRS_Interpret
 36520: 00073dcd   336 FUNC    LOCAL  HIDDEN    12 _Unwind_VRS_Pop
 36521: 00073d21   172 FUNC    LOCAL  HIDDEN    12 _Unwind_VRS_Set
 39058: 00000000     0 FUNC    GLOBAL DEFAULT  UND __gnu_Unwind_Find_exidx

@pendyalasyam
Author

@DanAlbert, I am still struggling to understand and resolve this problem.

Could you please help me understand the following?

You are saying that the _Unwind* symbols have to be present in all three libs, and that all of them have to be "LOCAL HIDDEN" except those you pointed out.

As mentioned earlier:

  1. A lib1 function calls a lib2 function.
  2. A lib2 function calls a lib3 function.
  3. lib3 is the one that uses the _Unwind_Backtrace function. If we don't link lib3 with libunwind.a, linking will fail, so we have to link lib3 with libunwind.a.
    Since we are linking libunwind.a into lib3, it is expected that lib3 will contain _Unwind symbols. But why do lib2 and lib1 have to have _Unwind symbols even when they don't contain any specific code related to unwinding?

Please help me understand.

@DanAlbert
Member

DanAlbert commented Oct 31, 2018

"If you have time, could you please tell me why these symbols has to be LOCAL HIDDEN. What is the dependency? why cant they be GLOBAL DEFAULT?"

To ensure that the unwinder invoked from any given callsite is the same unwinder throughout the unwind. An unwinder is linked into every binary. Different unwinders have different implementations but the same API. When not hidden, it's possible for the loader to resolve some of the symbols from one unwinder and some from the other. The two implementations then call into each other during the same unwind. This is bad.

"But why lib2 and lib1 have to have _Unwind symbols even when they don't contain any specific code related to unwind?"

If they didn't do any unwinding they wouldn't have references to unwind symbols. You can disassemble the binaries to see where that's coming from if you want.

@pendyalasyam
Author

Interestingly, the same code base works on the Android x86 emulator, but crashes on actual physical phones, i.e. ARM Android phones.

@DanAlbert
Member

Yes, ARM and x86 use different unwinders on Android.

@pendyalasyam
Author

pendyalasyam commented Mar 14, 2019

Finally, we got it working. Until now, we had not linked our libs with cxx-stl\llvm-libc++\libs\armeabi-v7a\libunwind.a. Once we started linking it, everything started working.

@DanAlbert,
my assumption is that when we do not link with libunwind.a, the application may be using the unwinder provided by libgcc, and that is why it crashes. Is my assumption correct?

Also, could you please help me understand why there is no libunwind.a under cxx-stl\llvm-libc++\libs\x86?

@DanAlbert
Member

The doc I linked above explains it.

@enh
Contributor

enh commented Mar 14, 2019

specifically https://android.googlesource.com/platform/ndk/+/master/docs/BuildSystemMaintainers.md#Unwinding which says "To avoid this problem, libraries should always be built with -Wl,--exclude-library,libgcc.a and -Wl,--exclude-library,libunwind.a (the latter is only necessary for 32-bit ARM) to ensure that unwind symbols are not re-exported from shared libraries."

@falken42

@enh Shouldn't that actually be -Wl,--exclude-libs and not -Wl,--exclude-library?

@DanAlbert
Member

@falken42 yes, good catch. The other places it was used in that doc were correct, at least: https://android-review.googlesource.com/c/platform/ndk/+/929717
