Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentry 8.11.0 hangs Main thread on internal sync lock #3277

Closed
username0x0a opened this issue Sep 12, 2023 · 12 comments · Fixed by #3295
Closed

Sentry 8.11.0 hangs Main thread on internal sync lock #3277

username0x0a opened this issue Sep 12, 2023 · 12 comments · Fixed by #3295

Comments

@username0x0a
Copy link

Platform

iOS

Installed

Swift Package Manager

Version

8.11.0

Steps to Reproduce

  1. Integrate Sentry
  2. Launch the app
  3. Wait until the hang is triggered

Expected Result

App not hanging

Actual Result

App hanging. We've only discovered after…finding an increasing number of WatchdogTermination issues in Sentry soon after a release. :crazy-laugh: 🤦‍♂️

This is caused by a re-entry into the @sychrnoized(...) lock in -[SentryDependencyContainer uiDeviceWrapper]:

#if TARGET_OS_IOS
- (SentryUIDeviceWrapper *)uiDeviceWrapper
{
    if (_uiDeviceWrapper == nil) {
        @synchronized(sentryDependencyContainerLock) {
            if (_uiDeviceWrapper == nil) {
                _uiDeviceWrapper = [[SentryUIDeviceWrapper alloc] init];
            }
        }
    }
    return _uiDeviceWrapper;
}
#endif // TARGET_OS_IOS

which has been introduced in 8.11.0 update & has a potential to deadlock if re-entries around the @synchronized(sentryDependencyContainerLock) aren't sanitised properly.

“Luckily”, this also involves Main thread at some point so that it causes the issue to become visible sooner or later (app freezes dead); I can't even imagine the catastrophe if this'd silently happen purely on non-Main threads.

Here's a stack trace of colliding threads locking simultaneously:

Thread 1 Queue : com.apple.main-thread (serial)
#0	0x00000001d9ce77dc in __ulock_wait ()
#1	0x00000001faacefb0 in _os_unfair_lock_lock_slow ()
#2	0x0000000193dbf84c in objc_sync_enter ()
#3	0x0000000106fd80c4 in -[SentryDependencyContainer uiDeviceWrapper] ()
#4	0x0000000106f9c2a8 in -[SentryExtraContextProvider getExtraDeviceContext] ()
#5	0x0000000106f9c0bc in -[SentryExtraContextProvider getExtraContext] ()
#6	0x0000000106f7e5d8 in -[SentryClient applyExtraDeviceContextToEvent:] ()
#7	0x0000000106f7d640 in -[SentryClient prepareEvent:withScope:alwaysAttachStacktrace:isCrashEvent:] ()
#8	0x0000000106f7c2e4 in -[SentryClient sendEvent:withScope:alwaysAttachStacktrace:isCrashEvent:additionalEnvelopeItems:] ()
#9	0x0000000106f9ed94 in -[SentryHub captureEvent:withScope:additionalEnvelopeItems:] ()
#10	0x0000000106fc97fc in +[SentrySDK captureEvent:withScope:] ()
#11	0x0000000106fc9688 in +[SentrySDK captureEvent:] ()
#12	0x0000000106f996bc in -[SentryNetworkTracker captureFailedRequests:] ()
#13	0x0000000106f98adc in -[SentryNetworkTracker urlSessionTask:setState:] ()
#14	0x0000000106f7aea4 in __57+[SentryNetworkTrackingIntegration swizzleURLSessionTask]_block_invoke_2.29 ()
#15	0x000000010ee7c520 in _dispatch_call_block_and_release ()
#16	0x000000010ee7e038 in _dispatch_client_callout ()
#17	0x000000010ee8e798 in _dispatch_main_queue_drain ()
#18	0x000000010ee8e2dc in _dispatch_main_queue_callback_4CF ()
#19	0x000000019adbfc28 in __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ ()
#20	0x000000019ada1560 in __CFRunLoopRun ()
#21	0x000000019ada63ec in CFRunLoopRunSpecific ()
#22	0x00000001d62bc35c in GSEventRunModal ()
#23	0x000000019d132f58 in -[UIApplication _run] ()
#24	0x000000019d132bbc in UIApplicationMain ()
#25	0x0000000104d6dfdc in main at /Volumes/Development/kiwi-ios/App/App/Sources/AppDelegate.swift:11
#26	0x00000001ba2d8dec in start ()
Enqueued from com.apple.NSURLSession-work (Thread 18) Queue : com.apple.NSURLSession-work (serial)
...
Thread 18 Queue : com.apple.NSURLSession-work (serial)
#0	0x00000001d9ce77dc in __ulock_wait ()
#1	0x000000010ee7ebb8 in _dlock_wait ()
#2	0x000000010ee7e96c in _dispatch_thread_event_wait_slow ()
#3	0x000000010ee8fbe8 in __DISPATCH_WAIT_FOR_QUEUE__ ()
#4	0x000000010ee8f520 in _dispatch_sync_f_slow ()
#5	0x0000000106f78940 in -[SentryDispatchQueueWrapper dispatchSyncOnMainQueue:] ()
#6	0x0000000106fa4408 in -[SentryUIDeviceWrapper init] ()
#7	0x0000000106fd80dc in -[SentryDependencyContainer uiDeviceWrapper] ()
#8	0x0000000106f9c2a8 in -[SentryExtraContextProvider getExtraDeviceContext] ()
#9	0x0000000106f9c0bc in -[SentryExtraContextProvider getExtraContext] ()
#10	0x0000000106f7e5d8 in -[SentryClient applyExtraDeviceContextToEvent:] ()
#11	0x0000000106f7d640 in -[SentryClient prepareEvent:withScope:alwaysAttachStacktrace:isCrashEvent:] ()
#12	0x0000000106f7c2e4 in -[SentryClient sendEvent:withScope:alwaysAttachStacktrace:isCrashEvent:additionalEnvelopeItems:] ()
#13	0x0000000106f9ed94 in -[SentryHub captureEvent:withScope:additionalEnvelopeItems:] ()
#14	0x0000000106fc97fc in +[SentrySDK captureEvent:withScope:] ()
#15	0x0000000106fc9688 in +[SentrySDK captureEvent:] ()
#16	0x0000000106f996bc in -[SentryNetworkTracker captureFailedRequests:] ()
#17	0x0000000106f98adc in -[SentryNetworkTracker urlSessionTask:setState:] ()
#18	0x0000000106f7aea4 in __57+[SentryNetworkTrackingIntegration swizzleURLSessionTask]_block_invoke_2.29 ()
#19	0x000000019bd45168 in ___lldb_unnamed_symbol2753 ()
#20	0x000000010ee7c520 in _dispatch_call_block_and_release ()
#21	0x000000010ee7e038 in _dispatch_client_callout ()
#22	0x000000010ee860b0 in _dispatch_lane_serial_drain ()
#23	0x000000010ee86e28 in _dispatch_lane_invoke ()
#24	0x000000010ee93c74 in _dispatch_workloop_worker_thread ()
#25	0x00000001fab65ddc in _pthread_wqthread ()
Enqueued from com.apple.main-thread (Thread 1) Queue : com.apple.main-thread (serial)
...

Are you willing to submit a PR?

No response

@scogeo
Copy link

scogeo commented Sep 14, 2023

While, I'm not seeing this exact same issue. I am experiencing some race conditions when starting and finishing spans that are causing memory corruption after upgrading to 8.11.0. The issues all go away when downgrading to 8.10.0. I don't have time to create an isolated use case at the moment, but it seems there are multiple thread related issues that have been introduced in the 8.11.0 release.

@tclyhaiqing-xu
Copy link

I have the same bug. closing the file io trace fix this issue

@armcknight
Copy link
Member

Thanks for the reports and info everyone, we're investigating.

@armcknight
Copy link
Member

So far I haven't been able to reproduce this. @username0x0a , can you tell me if this is in a SwiftUI app or uses Swift async? Are you starting the Sentry SDK from a main actor context or otherwise? Any additional info you can provide would be helpful!

@scogeo
Copy link

scogeo commented Sep 16, 2023

@armcknight I will try and create a reproducible test case for the issue I am seeing which may or may not be related to this bug report. In my case, it's a SwiftUI app and sentry is being invoked from a Swift async context and not the main thread or @MainActor. In particular, the call to finish on a sentry transaction seems to create some unwanted side effects downstream in my app.

Removing the finish call in the async context or reverting to 8.10.0 resolves the issue for me.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 Sep 16, 2023
@username0x0a
Copy link
Author

username0x0a commented Sep 18, 2023

@armcknight We're initialising Sentry as part of our Dependency container, so obviously pretty early in the app init phase. Main thread (@MainActor-checked), mixed SwiftUI/UIKit codebase, async in use, too.

The situation is quite simple, tho:

  • 1️⃣ something triggers -[SentryNetworkTracker captureFailedRequests:] on non-main thread
    • I'd guess any URLSessionTask living around changing its state, triggering this swizzle of yours
  • 2️⃣ that wraps down to -[SentryExtraContentProvider getExtraDeviceContext]
  • 3️⃣ that calls SentryUIDeviceWrapper *deviceWrapper = SentryDependencyContainer.sharedInstance.uiDeviceWrapper;
  • 4️⃣ that enters the @synchronized lock in - (SentryUIDeviceWrapper *)uiDeviceWrapper
  • 5️⃣ that calls dispatch_sync(...) [!!!] on Main queue
  • 6️⃣ BUT right before that, something (any network task around) triggered the same swizzled flow ⬆️ on Main thread, too
  • 7️⃣ Main thread is stuck at 4️⃣ as the @synchronized lock is already locked by the ⬆️ thread
  • 8️⃣ the ⬆️ thread is locked as well as dispatch_sync cannot return 💥

TL;DR: some inits are internally very complicated & it only calls for a catastrophe like this on different places. 🫠

Also, the above swizzle of URLSessionTask.setState can be triggered by pretty much ANY piece of code, internal or 3rd-party, when a network task is triggered, on any thread depending on a call site. Basically even before any of our or your code is triggered as many SDKs can do swizzles, early +load overloads & perform anything (including sending network tasks) freely. So I guess you definitely need to unroll the issue around that newly introduced lock.

@armcknight
Copy link
Member

Thanks for the very informative replies!

@username0x0a, I also looked at that -[SentryUIDeviceWrapper init] method's synchronous dispatch with suspicion. My first instinct is to move that to a separate method that can be called outside the codepaths that init those objects and are contending for the lock in SentryDependencyContainer.

As for the data race in span start/finish, that is something we do have some tests for but so far unable to reproduce. I'm standing by to see if you are able to put something together @scogeo , or to try more experiments if you're unable to.

@username0x0a
Copy link
Author

username0x0a commented Sep 18, 2023

Happy to help. 👍

I guess this code piece:

- (SentryUIDeviceWrapper *)uiDeviceWrapper
{
    if (_uiDeviceWrapper == nil) {
        @synchronized(sentryDependencyContainerLock) {
            if (_uiDeviceWrapper == nil) {
                _uiDeviceWrapper = [[SentryUIDeviceWrapper alloc] init];
                //                                         ^^^^^^^^^^^
            }
        }
    }
    return _uiDeviceWrapper;
}

which immediately triggers:

    if (self = [super init]) {
        //                           |---------- another `@synchronized` call
        //                           v
        [SentryDependencyContainer.sharedInstance.dispatchQueueWrapper dispatchSyncOnMainQueue:^{
            // Needed to read the device orientation on demand
            if (!UIDevice.currentDevice.isGeneratingDeviceOrientationNotifications) {
                self.cleanupDeviceOrientationNotifications = YES;
                [UIDevice.currentDevice beginGeneratingDeviceOrientationNotifications];
            }

from within the lock which re-enters the lock again 😅 talks for itself.

I'd say as far as -[SentryUIDeviceWrapper init] is only called once in the SentryDependencyContainer, it's safe to get rid of that dispatchSyncOnMainQueue: wrap completely, but the complexity in that whole flow from a simple call for additional device attributes is just atrocious. 😓 Maybe SentryUIDeviceWrapper could have some lock of its own instead of relying on Main thread as a sync mechanism, but that could also end up with re-entrance in the future. 😄 Locks are…hard, when not thought through very carefully. When there's a chance of inversion & even something external (outside of the instance boundary) is called, it's a trouble waiting to happen. 💥

Of course, one another thing would be to have dependencies properly aligned & not use any shared instance ☝️ as that would prevent any inversion in the resolution flow from happening.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 Sep 18, 2023
@scogeo
Copy link

scogeo commented Sep 18, 2023

As for the data race in span start/finish, that is something we do have some tests for but so far unable to reproduce. I'm standing by to see if you are able to put something together @scogeo , or to try more experiments if you're unable to.

@armcknight I may have some time to look at this towards the end of the week. If you find anything in the interim please let me know.

@gabors
Copy link
Contributor

gabors commented Sep 20, 2023

@brustolin

You have closed #3293 but that may be a separate issue, you may want to re-open and at least investigate.

When I run the iOS target of my app, which shares the same code base as the macCatalyst app I do not get the app hanging on startup. App starts up just fine on iOS 17 sim with Sentry SDK 8.12

@github-project-automation github-project-automation bot moved this from Needs Discussion to Done in Mobile & Cross Platform SDK Sep 25, 2023
@scogeo
Copy link

scogeo commented Sep 26, 2023

@armcknight FYI, Updating to 8.13.0 resolved the issue I was seeing with 8.12.0.

@armcknight
Copy link
Member

That's great news @scogeo ! 🎉 Thanks to @brustolin for writing the fix 🙏🏻 🥔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants