Extremely high Segfault crash rate on Android #468

Closed
2 of 3 tasks
tobi512 opened this issue Jan 26, 2021 · 17 comments

@tobi512

tobi512 commented Jan 26, 2021

Description

Hi there,
we have been seeing an extremely high rate of Segfault crashes for roughly one week (70k+ events right now). We're currently on v0.4.4 (probably pulled in transitively through the normal Sentry SDK, right?).
The latest version, 0.4.5, has an interesting changelog entry: "Fixed a potential segfault when doing concurrent scope modification." Could anyone tell us whether this fix would apply to our Segfault crashes as well?

We'll definitely update as soon as possible. Is the new version already used somewhere? We're using both @sentry/react-native and sentry-android in our project. I assume you will soon release a new version of sentry-android that includes the updated native SDK, correct?

When does the problem happen

  • During build
  • During run-time
  • When capturing a hard crash (?)

Environment

All kinds of Android devices. Crashes like this have been happening to us for quite a while, but with the latest release of our app they escalated massively and are eating up our Sentry quota at the moment.

Steps To Reproduce

Log output

Couple example crash outputs (raw):

OS Version: Android 8.0.0 (AGS2-W09 8.0.0.317(OCEC431))
Report Version: 104

Exception Type: Unknown (SIGSEGV)

Application Specific Information:
Segfault

Thread 0 Crashed:
0   libhwui.so                      0x7efef09860        <unknown> + 545443059808
1   libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
2   libhwui.so                      0x7efef09930        <unknown> + 545443060016
3   libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
4   libhwui.so                      0x7efef09930        <unknown> + 545443060016
5   libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
6   libhwui.so                      0x7efef09930        <unknown> + 545443060016
7   libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
8   libhwui.so                      0x7efef09930        <unknown> + 545443060016
9   libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
10  libhwui.so                      0x7efef09930        <unknown> + 545443060016
11  libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
12  libhwui.so                      0x7efef09930        <unknown> + 545443060016
13  libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
14  libhwui.so                      0x7efef09930        <unknown> + 545443060016
15  libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
16  libhwui.so                      0x7efef09930        <unknown> + 545443060016
17  libhwui.so                      0x7efeee18e4        <unknown> + 545442896100
18  libhwui.so                      0x7efef09930        <unknown> + 545443060016
19  libhwui.so                      0x7efef09680        android::uirenderer::RenderNode::prepareTree
20  libandroid_runtime.so           0x7eff3d5d94        <unknown> + 545448091028
21  libhwui.so                      0x7efeec3b44        <unknown> + 545442773828
22  libhwui.so                      0x7efeec784c        <unknown> + 545442789452
23  libhwui.so                      0x7efeec7664        <unknown> + 545442788964
24  libhwui.so                      0x7efeecdf20        android::uirenderer::renderthread::RenderThread::threadLoop
25  libutils.so                     0x7f010a06f8        android::Thread::_threadLoop
26  libandroid_runtime.so           0x7eff38c3a0        android::AndroidRuntime::javaThreadShell
27  libc.so                         0x7f003f91bc        <unknown> + 545465012668
28  libc.so                         0x7f003b0ee8        <unknown> + 545464717032
29  <unknown>                       0x0                 <unknown>



EOF
OS Version: Android 10 (QPYS30.52-22-8-7)
Report Version: 104

Exception Type: Unknown (SIGSEGV)

Application Specific Information:
Segfault

Thread 0 Crashed:
0   libandroid_runtime.so           0xb0dd0420          <unknown> + 2967274528
1   libandroid_runtime.so           0xb0dd037d          <unknown> + 2967274365
2   boot-core-libart.oat            0x70f303bb          <unknown> + 1894974395



EOF
OS Version: Android 10 (EML-L29 10.0.0.171(C432E5R1P3))
Report Version: 104

Exception Type: Unknown (SIGSEGV)

Application Specific Information:
Segfault

Thread 0 Crashed:
0   libhermes.so                    0x780419ab00        <unknown> + 515464866560
1   libhermes.so                    0x780410d908        <unknown> + 515464288520
2   libhermes.so                    0x78040d55fc        <unknown> + 515464058364
3   libhermes.so                    0x78040d681c        <unknown> + 515464063004
4   libhermes.so                    0x78040d5940        <unknown> + 515464059200
5   libhermes.so                    0x78040c5b10        <unknown> + 515463994128
6   libhermes.so                    0x78040d4a0c        <unknown> + 515464055308
7   libhermes.so                    0x78040d7e38        <unknown> + 515464068664
8   libhermes.so                    0x78040d5940        <unknown> + 515464059200
9   libhermes.so                    0x78040c5b10        <unknown> + 515463994128
10  libhermes.so                    0x78040d4a0c        <unknown> + 515464055308
11  libhermes.so                    0x78040d7e38        <unknown> + 515464068664
12  libhermes.so                    0x78040d5940        <unknown> + 515464059200
13  libhermes.so                    0x78040c4e2c        <unknown> + 515463990828
14  libhermes.so                    0x7804146b64        <unknown> + 515464522596
15  libhermes.so                    0x78040c5fb0        <unknown> + 515463995312
16  libhermes.so                    0x78040d49f4        <unknown> + 515464055284
17  libhermes.so                    0x78040d7e38        <unknown> + 515464068664
18  libhermes.so                    0x78040d5940        <unknown> + 515464059200
19  libhermes.so                    0x78040c5b10        <unknown> + 515463994128
20  libhermes.so                    0x78040b610c        facebook::hermes::HermesRuntimeImpl::call
21  libhermes-executor-release.so   0x7811a7d338        facebook::jsi::Function::call<T>
22  libhermes-executor-release.so   0x7811a7d194        <unknown> + 515692286356
23  libhermes-executor-release.so   0x7811a775e8        std::__ndk1::__invoke_void_return_wrapper<T>::__call<T>
24  libhermes-executor-release.so   0x7811a7a038        facebook::react::JSIExecutor::callFunction
25  libreactnativejni.so            0x77f3e61d5c        <unknown> + 515193052508
26  libreactnativejni.so            0x77f3e63354        <unknown> + 515193058132
27  libreactnativejni.so            0x77f3e29d4c        <unknown> + 515192823116
28  libreactnativejni.so            0x77f3e1aa74        facebook::jni::detail::MethodWrapper<T>::dispatch
29  libreactnativejni.so            0x77f3e1a9f0        facebook::jni::detail::FunctionWrapper<T>::call
30  base.odex                       0x782e7d8780        <unknown> + 516176054144



EOF

If you need more information, just let me know...
Thanks in advance!

@jan-auer
Member

@tobi512 All of these stack traces occur in libraries like Hermes, or even in system libraries. What makes you conclude that these are being triggered by the Sentry SDK?

Additionally, these crashes should be reported to your Sentry project. Provided you have uploaded symbols, you'll be able to see fully symbolicated crash reports.

@marandaneto
Contributor

@tobi512 what's the version of your @sentry/react-native dependency? Are you manually integrating any Sentry SDK other than @sentry/react-native? Just trying to understand your setup.

@tobi512
Author

tobi512 commented Jan 26, 2021

@jan-auer What I forgot to say is that we updated sentry-android from v3.1.3 to v3.2.0 and @sentry/react-native from v2.0.2 to v2.1.0 in the same release that shows the exploding Segfault crashes, so this is probably related to some change in the implementation.
Additionally, we don't see the high number of crashes in our Play Console, which normally correlates pretty well.

@Swatinem
Member

@marandaneto when did we add scope sync?

@tobi512 Indeed, the recently discovered scope sync bug can lead to crashes when you do a lot of scope/breadcrumb changes concurrently. I don’t think that Android will hit that frequently.

Also yes, the stacks look a lot like foreign libraries. And they are being reported by sentry native.

@tobi512
Author

tobi512 commented Jan 26, 2021

@marandaneto @sentry/react-native is on v2.1.0 right now (we upgraded from v2.0.2). We only integrate the two SDKs (@sentry/react-native for RN and sentry-android for Android) manually, nothing else.

@marandaneto
Contributor

@tobi512 the thing is, @sentry/react-native ships sentry-android underneath as a transitive dependency, and the versions should be compatible. I'm not saying this is the root of the issue, but it's something to look at: you should not upgrade sentry-android on its own; its version should be managed by @sentry/react-native.

@Swatinem scope sync has been available since 3.0.0, but I bet it's not that, as they were already using Sentry Android 3.1.x, and I'm not even sure RN offers an option to enable it. It's opt-in (disabled by default). @jennmueng is it even possible? Scope sync from RN to Java to Native?

There are no stack traces from Sentry itself, so what makes you think it's a Sentry issue, @tobi512? Have you upgraded the RN engine itself, or any lib other than Sentry, in your new release?

@tobi512
Author

tobi512 commented Jan 26, 2021

Thanks everybody for the quick and helpful feedback, always a pleasure to work with you!

@marandaneto We know about the possible problems when using both SDKs, but it has been running pretty well for us for quite a while now.

There was a similar issue a couple of months ago with a high ANR rate that was caused by a changed implementation in the Sentry SDK (getsentry/sentry-android#406), so I saw the same pattern here (especially the spike in events after an update of the SDK). I might be wrong; it was just the first thing that came to mind when discussing the problem with my colleagues.

We did not change any major libraries as far as I know (e.g. RN/Hermes, is it even possible to change system libs like ART? I guess not, right?) with the new release, which is another indication for the issue not being on our side...

[Screenshot, 2021-01-26 at 16:01:33]

@marandaneto
Contributor

thanks for the quick answer @tobi512

System libs cannot be upgraded unless users upgrade the OS version, which is unlikely to be the case given the number of affected users.

True about the ANR thing, but I believe any severe bug would cause an increase in events; unfortunately, that alone is hard to work with.

If you did bump the RN/Hermes engine, it might be interesting looking at their repo and reported bugs.

Are you uploading the debug symbols of your native libs?
https://docs.sentry.io/platforms/android/proguard/#gradle-configuration

Remember to set uploadNativeSymbols=true. This would help with the symbolication of the <unknown> frames; without that, I'm not sure how we can help.

I'd also consider keeping the Sentry RN and Android SDK versions aligned. Version combinations you bump yourself are not tested by us, so if you decide to do it, be sure to test it yourself.

@jennmueng
Member

@marandaneto Yes it actually is possible to sync scope from RN -> Java -> Native if they set enableNdkScopeSync = true in the RN options since 2.0.0. However I agree that it's probably not that as they've been on 2.0.2 on RN alongside Android 3.1.3.

@Swatinem
Member

Just a hunch here:

It is in theory possible to protect memory pages and have accesses trigger a segfault on purpose, for instrumentation purposes.
Something like what is described at the very bottom of this page: https://blog.libtorrent.org/2013/12/memory-cache-optimizations/

Not sure if hermes started to do something like that, on purpose?

Anyhow, these signal handlers can be made to work like a linked list that you can push things to (popping is more difficult). Sentry installs its signal handler, and forwards signals to the next one, which usually is the system that finally crashes the program. But maybe those signals are forwarded to some mechanism in hermes that expects them and just goes about its business as usual.

Like I said, this is just pure speculation on my part.
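
To make the forwarding idea concrete, here is a minimal sketch in Python rather than native code. It assumes SIGUSR1 as a harmless stand-in for SIGSEGV (a real segfault cannot be handled safely from Python), and `engine_handler` and `install_reporter` are hypothetical names, not Sentry or Hermes APIs. It only illustrates how a reporter installed in front of an existing handler could record an event even though the handler behind it treats the signal as expected. Unix only.

```python
import signal

events = []  # stands in for what a crash reporter would send


def engine_handler(signum, frame):
    # Hypothetical runtime handler that expects the signal and carries on.
    events.append("engine: handled, continuing")


def install_reporter():
    """Install a reporting handler in front of whatever is already installed."""
    previous = signal.getsignal(signal.SIGUSR1)

    def reporter(signum, frame):
        events.append("reporter: crash event recorded")
        # Forward to the next handler in the chain if it is a Python callable
        # (SIG_DFL and SIG_IGN are not callable and are skipped here).
        if callable(previous):
            previous(signum, frame)

    signal.signal(signal.SIGUSR1, reporter)


signal.signal(signal.SIGUSR1, engine_handler)  # installed first
install_reporter()                             # installed second, runs first
signal.raise_signal(signal.SIGUSR1)            # Python 3.8+
print(events)
```

After the signal is raised, `events` holds the reporter entry followed by the engine entry, and the process keeps running. Whether anything like this actually happens between sentry-native and Hermes is not established in this thread.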

@tobi512
Author

tobi512 commented Jan 28, 2021

Hi guys,
was quite busy at work yesterday, sorry for the late answer.

@marandaneto We'll set uploadNativeSymbols to true for the next release, thanks for the tip! Additionally, I'll have a look at their reported bugs. I've seen a few similar ones, but all of them are really old and therefore not really related here imho.

Our last RN update (which might have included a Hermes update) was 4 months ago, just a bugfix version bump from 0.63.1 to 0.63.2, and we did a couple of releases in the meantime with no spikes in crashes, so I'm not sure how this could be related. @Swatinem

We are now at almost 100k events and still pretty lost. When looking through the reports by hand, the Hermes crashes are maybe 50% (btw, is there a way to filter how many events in a grouped crash include e.g. "hermes"?). The rest are in completely different libs, which still makes me think that this can't be caused by RN/Hermes, no?!

OS Version: Android 8.0.0 (PRA-LX1 8.0.0.407(C432))
Report Version: 104

Exception Type: Unknown (SIGSEGV)

Application Specific Information:
Segfault

Thread 0 Crashed:
0   libsentry.so                    0x75cf0e1120        sentry__unwind_stack_libunwindstack
1   libsentry.so                    0x75cf0e0c80        <unknown> + 505984978048
2   libsentry.so                    0x75cf0e0a30        <unknown> + 505984977456
3   libsentry.so                    0x75cf0e0ecc        <unknown> + 505984978636
4   libsentry.so                    0x75cf0e0a30        <unknown> + 505984977456
5   libsentry.so                    0x75cf0e0ecc        <unknown> + 505984978636
6   <unknown>                       0x75ef400140        <unknown>

EOF

Do you plan to release a version of sentry-java that includes sentry-native in v0.4.5 or higher with that possible fix for the Segfault crashes soon? 3.2.1 is still the latest version (released 9 days ago).
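
On the question above about counting how many reports involve a given library: one offline approach is to tally exported raw reports by the modules in their crashed thread. The following is a rough sketch, assuming the reports are available as plain text in the format pasted in this thread, with each report terminated by a line reading `EOF`. The function names are made up for illustration, `SAMPLE` is abridged from the reports above, and this is not a Sentry feature.

```python
import re

# A frame line looks like: "<index>  <module>  0x<address>  <symbol>"
FRAME = re.compile(r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]+")


def split_reports(raw: str) -> list[str]:
    """Split a text dump into reports; each report ends with an 'EOF' line."""
    reports, current = [], []
    for line in raw.splitlines():
        if line.strip() == "EOF":
            if any(l.strip() for l in current):
                reports.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # trailing report without EOF
        reports.append("\n".join(current))
    return reports


def modules_in(report: str) -> set[str]:
    """Return the set of module names that appear in the report's frames."""
    return {m.group(1) for line in report.splitlines() if (m := FRAME.match(line))}


def count_reports_matching(raw: str, needle: str) -> tuple[int, int]:
    """Return (reports with a module containing `needle`, total reports)."""
    reports = split_reports(raw)
    hits = sum(1 for r in reports if any(needle in m for m in modules_in(r)))
    return hits, len(reports)


SAMPLE = """\
OS Version: Android 10 (QPYS30.52-22-8-7)
Thread 0 Crashed:
0   libandroid_runtime.so           0xb0dd0420          <unknown> + 2967274528
1   libandroid_runtime.so           0xb0dd037d          <unknown> + 2967274365
2   boot-core-libart.oat            0x70f303bb          <unknown> + 1894974395

EOF
OS Version: Android 10 (EML-L29 10.0.0.171(C432E5R1P3))
Thread 0 Crashed:
0   libhermes.so                    0x780419ab00        <unknown> + 515464866560
20  libhermes.so                    0x78040b610c        facebook::hermes::HermesRuntimeImpl::call
21  libhermes-executor-release.so   0x7811a7d338        facebook::jsi::Function::call<T>

EOF
"""

hits, total = count_reports_matching(SAMPLE, "hermes")
print(f"{hits} of {total} reports have a frame in a module matching 'hermes'")
```

Each report is counted once per match regardless of how many frames hit, so deep recursive stacks do not skew the share.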

@marandaneto
Contributor

@tobi512 be sure to check this out https://docs.sentry.io/platforms/android/usage/advanced-usage/#integrating-the-ndk

android:extractNativeLibs and/or android.bundle.enableUncompressedNativeLibs flags are really important for full symbolication of native events.

if you haven't enabled the enableNdkScopeSync flag, the fix is probably not gonna be useful.

a new RN SDK version is gonna be released either today or tomorrow, which includes the latest Android SDK as well (and the latest sentry-native)

@tobi512
Author

tobi512 commented Jan 28, 2021

Quick correction from our side:

I got the old versions of the SDKs wrong (versions got updated multiple times inside an MR and I only looked at the last diff, shame on me). We upgraded from the following versions, don't know if that makes a difference and sorry for the confusion:

sentry-react-native from 1.4.4 to 2.1.0
sentry-java from 2.1.6 to 3.2.0

So we did two major version bumps, any ideas what change could explain our spike in events?

@marandaneto Regarding the compatibility between the two SDKs again: we always look into sentry-react-native and use the matching version of sentry-java, so that shouldn't be a problem. Also, what would the minimal setup for native crash symbolication look like?
We use AGP 4.1.2 and ship as Android App Bundle, so after reading the docs, we would need to set extractNativeLibs to true and android.bundle.enableUncompressedNativeLibs to false, correct? (beyond uploadNativeSymbols true of course)

@marandaneto
Contributor

> @marandaneto Regarding the compatibility between the two SDKs again: we always look into sentry-react-native and use the matching version of sentry-java, so that shouldn't be a problem. Also, what would the minimal setup for native crash symbolication look like?
> We use AGP 4.1.2 and ship as Android App Bundle, so after reading the docs, we would need to set extractNativeLibs to true and android.bundle.enableUncompressedNativeLibs to false, correct? (beyond uploadNativeSymbols true of course)

I don't follow why you are managing the version yourself, then, if you end up using the very same version anyway. I'd just let sentry-react-native take care of it.

Only what I meant before: enable uploadNativeSymbols (Sentry Gradle Plugin) so debug symbols are uploaded automatically, and check the extractNativeLibs and android.bundle.enableUncompressedNativeLibs flags. That's all you need to get native events symbolicated.

@tobi512
Author

tobi512 commented Feb 11, 2021

Hi everybody,
quick update here:

We made some changes under the hood (enabled native symbolication and removed the initialization of sentry-java to leave the whole handling to sentry-react-native) and published a new release to our beta group.
So far, we don't see any strange native crashes. We'll slowly roll the release out to the public channel in the coming days and hope it stays like that.

@marandaneto Our special handling with both SDKs was due to the fact that we didn't want to miss crashes that happen directly at app startup, before React Native (and therefore sentry-react-native) is initialized. However, we'll try to stick with the Google Play crash reporting for these kinds of crashes now to see if that's enough for us...

I'll keep you guys updated if there is any news!

Cheers

@marandaneto
Contributor

@tobi512 thanks for the report, keep us updated.

were you using the shouldInitializeNativeSdk flag? related to: getsentry/sentry-react-native#1259

the important bit is to keep versions in sync.

@tobi512
Author

tobi512 commented Mar 3, 2021

@marandaneto We did not use the shouldInitializeNativeSdk flag as far as I know.

Final update

Sorry for the late reply, but it took us quite some time to ship the release to all users. We have now been fully live for a few days and don't see any new Segfault crashes, so I'd say this is fixed. We're still not 100% sure what the actual problem was, but since we are good again, I'll close this issue. I'd definitely advise against using both SDKs at the same time.

Thanks a lot for the help everybody!

@tobi512 tobi512 closed this as completed Mar 3, 2021