-[Replication_Tests test04_ReplicateAttachments] fails with SSL #1170

Closed
snej opened this issue Mar 15, 2016 · 13 comments

@snej
Contributor

snej commented Mar 15, 2016

-[Replication_Tests test04_ReplicateAttachments] is failing on both the dev and master branches when run on the iOS simulator (iOS 9.2). Apparently this has been going on for a while but wasn't noticed; I can reproduce it on commit b2eee39, which was the head of the master branch on March 1. (The previous master commit has a build problem on iOS, and the one before that, a72c942, from Feb 18, works.)

The test failure looks like a deadlock in CFStream. The replicator thread is blocked writing to an output stream:

* thread #17: tid = 0x6d5d79, 0x0000000107a52de6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'CouchbaseLite'
  * frame #0: 0x0000000107a52de6 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x0000000107a17e4a libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
    frame #2: 0x000000010724fa54 CoreFoundation`boundPairWrite + 52
    frame #3: 0x000000010717d575 CoreFoundation`CFWriteStreamWrite + 437
    frame #4: 0x0000000102fc371d CBL Test`-[CBLMultiStreamWriter writeToOutput](self=0x000060c000bc5f00, _cmd="writeToOutput") + 429 at CBLMultiStreamWriter.m:254
    frame #5: 0x0000000102fc42d7 CBL Test`-[CBLMultiStreamWriter stream:handleEvent:](self=0x000060c000bc5f00, _cmd="stream:handleEvent:", stream=0x000060d0001467d0, event=NSStreamEventHasSpaceAvailable) + 1511 at CBLMultiStreamWriter.m:288
    frame #6: 0x00000001071dba44 CoreFoundation`_signalEventSync + 180
    frame #7: 0x000000010720fd4e CoreFoundation`_cfstream_shared_signalEventSync + 478
    frame #8: 0x000000010719da31 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #9: 0x000000010719395c CoreFoundation`__CFRunLoopDoSources0 + 556
    frame #10: 0x0000000107192e13 CoreFoundation`__CFRunLoopRun + 867
    frame #11: 0x0000000107192828 CoreFoundation`CFRunLoopRunSpecific + 488
    frame #12: 0x00000001066772f1 Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 267
    frame #13: 0x0000000102f02f70 CBL Test`-[CBL_RunLoopServer runServerThread](self=0x0000603000ac8d00, _cmd="runServerThread") + 1168 at CBL_Server.m:193
    frame #14: 0x0000000106714dfb Foundation`__NSThread__start__ + 1198
    frame #15: 0x0000000107a1799d libsystem_pthread.dylib`_pthread_body + 131
    frame #16: 0x0000000107a1791a libsystem_pthread.dylib`_pthread_start + 168
    frame #17: 0x0000000107a15351 libsystem_pthread.dylib`thread_start + 13

Meanwhile the CFNetwork thread is also blocked, apparently on the same mutex, which seems to belong to the same stream:

* thread #9: tid = 0x6d53da, 0x0000000107a52de6 libsystem_kernel.dylib`__psynch_mutexwait + 10, name = 'com.apple.NSURLConnectionLoader'
  * frame #0: 0x0000000107a52de6 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x0000000107a17e4a libsystem_pthread.dylib`_pthread_mutex_lock_wait + 89
    frame #2: 0x000000010724f437 CoreFoundation`boundPairRead + 151
    frame #3: 0x00000001071ceaf5 CoreFoundation`CFReadStreamRead + 389
    frame #4: 0x0000000104cf6a04 CFNetwork`RequestBodyStreamProvider::readBodyStream(bool) + 308
    frame #5: 0x00000001071dba44 CoreFoundation`_signalEventSync + 180
    frame #6: 0x00000001071db97b CoreFoundation`_cfstream_solo_signalEventSync + 251
    frame #7: 0x00000001071db82c CoreFoundation`_CFStreamSignalEvent + 476
    frame #8: 0x000000010724f6a8 CoreFoundation`boundPairRead + 776
    frame #9: 0x00000001071ceaf5 CoreFoundation`CFReadStreamRead + 389
    frame #10: 0x0000000104cf6a04 CFNetwork`RequestBodyStreamProvider::readBodyStream(bool) + 308
    frame #11: 0x0000000104cf529c CFNetwork`___ZN25RequestBodyStreamProvider22scheduleReadBodyStreamEb_block_invoke + 22
    frame #12: 0x00000001076e44a7 libdispatch.dylib`_dispatch_client_callout + 8
    frame #13: 0x00000001076cb223 libdispatch.dylib`_dispatch_block_invoke + 408
    frame #14: 0x0000000104c091ac CFNetwork`RunloopBlockContext::_invoke_block(void const*, void*) + 24
    frame #15: 0x0000000107170ee4 CoreFoundation`CFArrayApplyFunction + 68
    frame #16: 0x0000000104c090a5 CFNetwork`RunloopBlockContext::perform() + 137
    frame #17: 0x0000000104c08f5e CFNetwork`MultiplexerSource::perform() + 282
    frame #18: 0x0000000104c08d80 CFNetwork`MultiplexerSource::_perform(void*) + 72
    frame #19: 0x000000010719da31 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #20: 0x00000001071938d7 CoreFoundation`__CFRunLoopDoSources0 + 423
    frame #21: 0x0000000107192e13 CoreFoundation`__CFRunLoopRun + 867
    frame #22: 0x0000000107192828 CoreFoundation`CFRunLoopRunSpecific + 488
    frame #23: 0x0000000104c9a3c4 CFNetwork`+[NSURLConnection(Loader) _resourceLoadLoop:] + 412
    frame #24: 0x0000000106714dfb Foundation`__NSThread__start__ + 1198
    frame #25: 0x0000000107a1799d libsystem_pthread.dylib`_pthread_body + 131
    frame #26: 0x0000000107a1791a libsystem_pthread.dylib`_pthread_start + 168
    frame #27: 0x0000000107a15351 libsystem_pthread.dylib`thread_start + 13

I've never seen this happen on Mac OS. It seems to be a bit of a race condition on iOS, because if I turn on more logging it often goes away.
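
One way to read the thread #9 trace: `boundPairRead` appears twice on the same stack (frames #8 and #2), with a stream-event callback in between. If the inner call tries to take a non-recursive lock that the outer call already holds, the thread blocks on itself, and the writer on thread #17 then waits behind it. That is only a guess from the trace, not something confirmed in this thread. The Go sketch below is a toy model of that pattern; the `pair` type and `simulate` helper are made up for illustration and are not CoreFoundation's implementation. It uses `TryLock` (Go 1.18 or later) so the program reports the re-entrant attempt instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// pair is a toy stand-in for a bound stream pair guarded by one
// non-recursive lock. It is NOT CoreFoundation's implementation.
type pair struct {
	mu      sync.Mutex
	pending bool   // an event is waiting to be delivered
	onEvent func() // callback fired from inside read
}

// read takes the lock and, if an event is pending, delivers it while the
// lock is still held. It returns false when the lock was already taken,
// which is the point where a blocking Lock() would wait forever.
func (p *pair) read() bool {
	if !p.mu.TryLock() {
		return false
	}
	defer p.mu.Unlock()
	if p.pending && p.onEvent != nil {
		p.pending = false
		p.onEvent()
	}
	return true
}

// simulate runs one outer read whose event callback calls read again on
// the same pair, mirroring the nested frames in the trace.
func simulate() (outer, callbackRan, nested bool) {
	p := &pair{pending: true}
	p.onEvent = func() {
		callbackRan = true
		nested = p.read() // re-entrant attempt on the same lock
	}
	outer = p.read()
	return outer, callbackRan, nested
}

func main() {
	outer, callbackRan, nested := simulate()
	fmt.Printf("outer read acquired lock:  %v\n", outer)
	fmt.Printf("callback ran inside read:  %v\n", callbackRan)
	fmt.Printf("nested read acquired lock: %v\n", nested)
}
```

With a plain `Lock()` in place of `TryLock()`, the nested call would never return, which matches a thread parked in `__psynch_mutexwait`.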

@snej
Contributor Author

snej commented Mar 16, 2016

It's failing exactly the same way in the Mac unit tests too, now. This is very strange because I run the Mac unit tests all the time (since they're so easy to run), definitely before any commit. Moreover, I can't reproduce it on my home iMac, which is running stock 10.11.3. (My work MBP is on the latest 10.11.4 beta.)

I'm very suspicious this is a regression in the OS, specifically CFNetwork. I've filed a bug report with Apple (rdar://25197186).

@snej
Contributor Author

snej commented Mar 16, 2016

Ah: it's triggered by using HTTPS. I just committed a fix to dev (730eb28) to make the Mac unit tests respect ATS on 10.11, which switched them to using SSL for every connection. If I back out that change, so they don't use SSL, the test passes again.

@snej snej changed the title from "-[Replication_Tests test04_ReplicateAttachments] fails on iOS" to "-[Replication_Tests test04_ReplicateAttachments] fails with SSL" Mar 21, 2016
@snej
Contributor Author

snej commented Mar 22, 2016

Apple added a comment to the bug report:

"This code appears to be hanging in HTTP/2 code in CFNetwork. Does the test server use HTTP/2? Was the test server recently upgraded to use HTTP/2 or a newer version of HTTP/2? There were some changes in this area in 10.11.4."

I am indeed running a new build of Sync Gateway, built with Go 1.6 so it supports HTTP/2. In my earlier tests it seemed to be working, but it's definitely what caused this.
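
For context, Go's `net/http` negotiates HTTP/2 over TLS automatically from Go 1.6 on, so a server rebuilt with a newer toolchain can start speaking it with no code change. Below is a minimal sketch of how to see which protocol a Go client and server end up on. It uses a throwaway `httptest` server, not Sync Gateway itself; `negotiatedProto` is a hypothetical helper name, and the `EnableHTTP2` field assumes Go 1.14 or later.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// negotiatedProto starts a temporary local TLS server, makes one request
// to it with the server's own test client, and returns the protocol
// string reported on the response.
func negotiatedProto(enableHTTP2 bool) (string, error) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})

	srv := httptest.NewUnstartedServer(handler)
	srv.EnableHTTP2 = enableHTTP2 // must be set before StartTLS
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() trusts the test certificate and matches the server's
	// HTTP/2 setting.
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Proto, nil
}

func main() {
	for _, h2 := range []bool{true, false} {
		proto, err := negotiatedProto(h2)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("EnableHTTP2=%v -> %s\n", h2, proto)
	}
}
```

Checking `resp.Proto` (or the equivalent in whatever client is at hand) is a quick way to tell whether a given endpoint is actually being reached over HTTP/2.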

@pasin
Contributor

pasin commented Jun 21, 2016

Once couchbase/sync_gateway#1888 is fixed for 1.3, we could defer this issue to post 1.3.

@snej
Contributor Author

snej commented Oct 7, 2016

If this only happens with HTTP/2, it's lower priority, since SG doesn't enable HTTP/2 by default.
We may not be able to work around this in iOS 9, but we should test with iOS 10 to see if the same bug occurs.

@snej snej added the backlog label Oct 7, 2016
@djpongh djpongh removed the testing label Oct 18, 2016
@basememara

Any progress on this? I'm experiencing this with iOS 10 and the latest 1.3.1 client and 1.3.1 sync gateway. It hangs the app on CBLMultiStreamWriter.writeToOutput... definitely a showstopper.

@basememara

Disabling HTTPS completely made the issue go away, but of course we can't deploy a non-SSL SG. I tried disabling HTTP/2 on Sync Gateway using this config: "unsupported": { "http2": { "enabled": false } }, but no luck; the issue remains.

@pasin
Contributor

pasin commented Nov 18, 2016

@sethrosetter Can you help run the functional test with SSL enabled? We may need to modify iOS LiteServ to be able to enable SSL.

PS: I don't see the issue when running unit tests.

@basememara

What ended up resolving this was switching from an Amazon load balancer to an Nginx load balancer. My best guess is that it's related to SSL termination and HTTP/2.

@pasin pasin added ready and removed backlog labels Dec 21, 2016
@pasin pasin self-assigned this Dec 21, 2016
pasin added a commit to couchbaselabs/liteserv-ios that referenced this issue Dec 28, 2016
@pasin
Contributor

pasin commented Dec 28, 2016

@sethrosetter I added SSL support to LiteServ-iOS. Let me know if that works.

@snej
Contributor Author

snej commented Jan 19, 2017

From what @basememara said above, it sounds like this is still specific to HTTP/2 connections? I had thought that we worked around this by disabling HTTP/2 by default on SG, but I hadn't considered that intermediate gateways/proxies might serve HTTP/2.

I think we'll have to find a reproducible case to submit to Apple, so they can work out what's going wrong.

@pasin pasin assigned sethrosetter and unassigned pasin Jan 20, 2017
@djpongh

djpongh commented Jan 25, 2017

@sethrosetter Any chance to look at this?

@djpongh djpongh removed this from the 1.4.0 milestone Mar 7, 2017
@djpongh djpongh added icebox and removed backlog labels Apr 10, 2017
@sethrosetter sethrosetter removed their assignment Aug 3, 2017
@djpongh djpongh added this to the 1.4.2 milestone Nov 29, 2017
@djpongh djpongh modified the milestones: 1.4.2, 1.4.x Dec 7, 2017
@jayahariv
Contributor

Closing 1.x issue!
