Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Database is busy' #34

Closed
nl opened this issue Apr 9, 2013 · 20 comments
Closed

'Database is busy' #34

nl opened this issue Apr 9, 2013 · 20 comments
Assignees
Milestone

Comments

@nl
Copy link

nl commented Apr 9, 2013

It seems the underlying FMDB database is sometimes accessed from multiple threads at the 'same time'. My app isn't using any concurrency, dispatch queues or threads, yet I'm running into these issues when using live queries during a pull from a remote sync gateway. Not sure if it's relevant, but most of the documents being pulled contain attachments.

The log below show two different threads are trying to use the FMDatabase.

2013-04-09 07:12:13.414 Puller[58132:c07] -[FMDatabase executeUpdate:error:withArgumentsInArray:orVAList:]:740 Database busy (/Volumes/Home/Library/Application Support/iPhone Simulator/6.1/Applications/FDD4EE76-786A-4B29-8FF2-0ED674F2B847/Library/Application Support/CouchbaseLite/user-b56e5c11a783b8a0078d91d25b00588f.touchdb)
2013-04-09 07:12:13.415 Puller[58132:c07] Database busy
07:12:13.416| �WARNING*** : CouchbaseLite: Failed to rebuild view 'files/byIndex': 590
07:12:13.416| �WARNING*** : Failed to update view index: 590
2013-04-09 07:12:13.417 Puller[58132:4107] -[FMDatabase executeUpdate:error:withArgumentsInArray:orVAList:]:740 Database busy (/Volumes/Home/Library/Application Support/iPhone Simulator/6.1/Applications/FDD4EE76-786A-4B29-8FF2-0ED674F2B847/Library/Application Support/CouchbaseLite/user-b56e5c11a783b8a0078d91d25b00588f.touchdb)
2013-04-09 07:12:13.417 Puller[58132:4107] Database busy
07:12:13.418‖ �WARNING*** : CouchbaseLite: Failed to rebuild view 'files/byIndex': 590
07:12:13.418‖ �WARNING*** : Failed to update view index: 590
07:12:13.424| �WARNING*** : <CBLLiveQuery: 0x9379870>: Error updating rows: 590
@nl
Copy link
Author

nl commented Apr 12, 2013

It might be relevant that I'm using multiple pull CBLReplications to pull data from the same remote database, to the same CBLDatabase instance, with different query_params for the sync_gateway/channels filter.

@rismay
Copy link

rismay commented Apr 12, 2013

I think you need to create multiple CBLManager instances and create pull replications from each of those instances with the different query_params.

@snej
Copy link
Contributor

snej commented Apr 15, 2013

It seems the underlying FMDB database is sometimes accessed from multiple threads at the 'same time'.

That should never happen — FMDB doesn't support multithreaded access. But the warning for this is different, reading "The FMDatabase %@ is currently in use." It will also trigger an assertion failure in a debug build.

The error you're getting looks like the database file is being used by multiple threads (with one FMDB connection per thread) and one of the connections is leaving the database locked too long. This is possible, especially since looking at the retry logic in FMDB it'll give up after 200ms.

Try this: edit the following line in CBLDatabase+Internal.m:
_fmdb.busyRetryTimeout = 10;
You can change the value to 0, which will disable busy timeouts entirely. It leaves open the possibility that the thread will lock forever if the database never unlocks, but the only reason for the database to be stuck locked would be if another thread were already stuck in the middle of a transaction, in which case the app is in bad shape anyway.

Let me know if that fixes the problem, and if you see any unexpectedly long pauses.

@nl
Copy link
Author

nl commented Apr 15, 2013

I changed the value to zero. In lieu of having the warning messages, I know have FMDB lock the entire app by sleeping the main thread indefinitely.

Here's a backtrace after pausing in lldb when the app locks up.


(lldb) bt
* thread #1: tid = 0x1c03, 0x908f1c72 libsystem_kernel.dylib`__semwait_signal + 10, stop reason = signal SIGSTOP
    frame #0: 0x908f1c72 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x02343770 libsystem_sim_c.dylib`nanosleep + 196
    frame #2: 0x02343632 libsystem_sim_c.dylib`usleep + 60
    frame #3: 0x000c1ec6 SPApp`-[FMDatabase executeUpdate:error:withArgumentsInArray:orVAList:](self=0x07c84070, arrayArgs=0xbfff0100, args=0xbfffdff8) + 1271 at FMDatabase.m:737
    frame #4: 0x000c20f9 SPApp`-[FMDatabase executeUpdate:](self=0x07c84070, _cmd=0x000db150, sql=0x00127674) + 66 at FMDatabase.m:801
    frame #5: 0x00079c9b SPApp`-[CBLView(self=0x10b42290) updateIndex] + 768 at CBLView+Internal.m:145
    frame #6: 0x000b0704 SPApp`-[CBLDatabase(, _cmd=0x000dd00a, viewName=0x00125a14, options=(null)) queryViewNamed:options:lastSequence:status:] + 163 at CBLQuery.m:561
    frame #7: 0x000ae026 SPApp`-[CBLQuery rows](self=0x10b415f0, _cmd=0x012bafaa) + 233 at CBLQuery.m:142

@snej
Copy link
Contributor

snej commented Apr 15, 2013

Could you get a dump of all threads? (I'm not sure what the LLDB command for that is...) There's probably some other thread that's locked up inside an FMDB call. Or if not it's really bad news, because that means something previously managed to return while leaving a transaction open, which should be impossible.

@nl
Copy link
Author

nl commented Apr 15, 2013

There doesn't seem to be anything else happening on other threads. Here's the full bt.


(lldb) bt all
* thread #1: tid = 0x2403, 0x39c2f6a4 libsystem_kernel.dylib`__semwait_signal + 24, stop reason = signal SIGSTOP
    frame #0: 0x39c2f6a4 libsystem_kernel.dylib`__semwait_signal + 24
    frame #1: 0x39b8d3d2 libsystem_c.dylib`nanosleep + 142
    frame #2: 0x39b8d33e libsystem_c.dylib`usleep + 50
    frame #3: 0x0014d00c SPApp`-[FMDatabase executeUpdate:error:withArgumentsInArray:orVAList:](self=0x200ba940, _cmd=0x00165728, sql=0x00191f74, arrayArgs=0x00000000) + 1012 at FMDatabase.m:737
    frame #4: 0x0014d1d8 SPApp`-[FMDatabase executeUpdate:]() + 40 at FMDatabase.m:801
    frame #5: 0x00114424 SPApp`-[CBLView(Internal) updateIndex] + 656 at CBLView+Internal.m:145
    frame #6: 0x0013f2ec SPApp`-[CBLDatabase(, options=(null)) queryViewNamed:options:lastSequence:status:] + 176 at CBLQuery.m:561
    frame #7: 0x0013d46c SPApp`-[CBLQuery rows](self=0x2004bed0) + 240 at CBLQuery.m:142
    frame #8: 0x000db6fa SPApp`-[SPContentDatabase acceptedShares](self=0x1edf2700, _cmd=0x0015c979) + 134 at SPContentDatabase.m:314
    frame #9: 0x000da160 SPApp`-[SPContentDatabase currentChannels](self=0x1edf2700, _cmd=0x0015c073) + 92 at SPContentDatabase.m:241
    frame #10: 0x000d30ec SPApp`-[CSDatabase newSharedPull](self=0x1edf2700, _cmd=0x0015c198) + 64 at CSDatabase.m:301
    frame #11: 0x000d4266 SPApp`__30-[CSDatabase startReplicating]_block_invoke_2194(.block_descriptor=0x1ede5610) + 78 at CSDatabase.m:394
    frame #12: 0x000f026c SPApp`__39-[CSReplication noteFirstSyncCompleted]_block_invoke(.block_descriptor=0x001957c8, block=0x1ede5610, stop=0x2fd51e18) + 44 at CSReplication.m:228
    frame #13: 0x318a0496 CoreFoundation`__NSSetEnumerate + 682
    frame #14: 0x000f01e0 SPApp`-[CSReplication noteFirstSyncCompleted](self=0x1ed2db50, _cmd=0x0015dffd) + 100 at CSReplication.m:227
    frame #15: 0x000eff7c SPApp`__38-[CSReplication noteProgressDidChange]_block_invoke() + 44 at CSReplication.m:184
    frame #16: 0x39b4911e libdispatch.dylib`_dispatch_call_block_and_release + 10
    frame #17: 0x39b484b6 libdispatch.dylib`_dispatch_client_callout + 22
    frame #18: 0x39b4d1bc libdispatch.dylib`_dispatch_main_queue_callback_4CF$VARIANT$mp + 224
    frame #19: 0x3186ef3a CoreFoundation`__CFRunLoopRun + 1290
    frame #20: 0x317e1ebc CoreFoundation`CFRunLoopRunSpecific + 356
    frame #21: 0x317e1d48 CoreFoundation`CFRunLoopRunInMode + 104
    frame #22: 0x353962ea GraphicsServices`GSEventRunModal + 74
    frame #23: 0x336f7300 UIKit`UIApplicationMain + 1120
    frame #24: 0x000af664 SPApp`main(argc=1, argv=0x2fd52d10) + 116 at main.m:13
    frame #25: 0x39b68b20 libdyld.dylib`start + 4

  thread #2: tid = 0x2703, 0x39c1f648 libsystem_kernel.dylib`kevent64 + 24
    frame #0: 0x39c1f648 libsystem_kernel.dylib`kevent64 + 24
    frame #1: 0x39b4f978 libdispatch.dylib`_dispatch_mgr_invoke + 796
    frame #2: 0x39b4f658 libdispatch.dylib`_dispatch_mgr_thread$VARIANT$mp + 36

  thread #5: tid = 0x2a03, 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #0: 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #1: 0x39c1f04c libsystem_kernel.dylib`mach_msg + 40
    frame #2: 0x31870044 CoreFoundation`__CFRunLoopServiceMachPort + 128
    frame #3: 0x3186eda2 CoreFoundation`__CFRunLoopRun + 882
    frame #4: 0x317e1ebc CoreFoundation`CFRunLoopRunSpecific + 356
    frame #5: 0x317e1d48 CoreFoundation`CFRunLoopRunInMode + 104
    frame #6: 0x377d1504 WebCore`RunWebThread(void*) + 444
    frame #7: 0x39b88310 libsystem_c.dylib`_pthread_start + 308

  thread #6: tid = 0x231f, 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #0: 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #1: 0x39c1f04c libsystem_kernel.dylib`mach_msg + 40
    frame #2: 0x31870044 CoreFoundation`__CFRunLoopServiceMachPort + 128
    frame #3: 0x3186eda2 CoreFoundation`__CFRunLoopRun + 882
    frame #4: 0x317e1ebc CoreFoundation`CFRunLoopRunSpecific + 356
    frame #5: 0x317e1d48 CoreFoundation`CFRunLoopRunInMode + 104
    frame #6: 0x32104f96 Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 254
    frame #7: 0x0011837e SPApp`-[CBL_Server runServerThread](self=0x1ed80470) + 310 at CBL_Server.m:116
    frame #8: 0x321b1e84 Foundation`__NSThread__main__ + 972
    frame #9: 0x39b88310 libsystem_c.dylib`_pthread_start + 308

  thread #13: tid = 0x310b, 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #0: 0x39c1eeb4 libsystem_kernel.dylib`mach_msg_trap + 20
    frame #1: 0x39c1f04c libsystem_kernel.dylib`mach_msg + 40
    frame #2: 0x31870044 CoreFoundation`__CFRunLoopServiceMachPort + 128
    frame #3: 0x3186eda2 CoreFoundation`__CFRunLoopRun + 882
    frame #4: 0x317e1ebc CoreFoundation`CFRunLoopRunSpecific + 356
    frame #5: 0x317e1d48 CoreFoundation`CFRunLoopRunInMode + 104
    frame #6: 0x3212e3d4 Foundation`+[NSURLConnection(Loader) _resourceLoadLoop:] + 308
    frame #7: 0x321b1e84 Foundation`__NSThread__main__ + 972
    frame #8: 0x39b88310 libsystem_c.dylib`_pthread_start + 308

  thread #14: tid = 0x3203, 0x39c2f594 libsystem_kernel.dylib`select$DARWIN_EXTSN + 20
    frame #0: 0x39c2f594 libsystem_kernel.dylib`select$DARWIN_EXTSN + 20
    frame #1: 0x318741f6 CoreFoundation`__CFSocketManager + 678
    frame #2: 0x39b88310 libsystem_c.dylib`_pthread_start + 308

  thread #15: tid = 0x3303, 0x39c2fd98 libsystem_kernel.dylib`__workq_kernreturn + 8
    frame #0: 0x39c2fd98 libsystem_kernel.dylib`__workq_kernreturn + 8
    frame #1: 0x39b7dcfa libsystem_c.dylib`_pthread_workq_return + 18
    frame #2: 0x39b7da16 libsystem_c.dylib`_pthread_wqthread + 366

@snej
Copy link
Contributor

snej commented Apr 15, 2013

You've got a CBL_Server running on thread 6, so there must have been some background activity. Either you've started the HTTP listener, or you've called -asyncTellDatabaseNamed:to:. (If you don't believe you've done either, set a breakpoint at -[CBLManager backgroundServer] and see how it gets hit.)

@nl
Copy link
Author

nl commented Apr 15, 2013

I don't believe I've done either :-)
Apparently, it gets hit in CBL(Live)Query – which i do use a lot.


* thread #1: tid = 0x1c03, 0x000b2087 SPApp`-[CBLManager backgroundServer](self=0x000b21b2, _cmd=0x000b2084) + 3 at CBLManager.m:196, stop reason = breakpoint 1.1
    frame #0: 0x000b2087 SPApp`-[CBLManager backgroundServer](self=0x000b21b2, _cmd=0x000b2084) + 3 at CBLManager.m:196
    frame #1: 0x000b21e3 SPApp`-[CBLManager asyncTellDatabaseNamed:to:](self=0x07c46950, _cmd=0x000dbf12, dbName=0x07c27b80, block=0xbfffe010) + 63 at CBLManager.m:209
    frame #2: 0x000ad4ef SPApp`-[CBLQuery runAsync:](, _cmd=0x000dbf40, onComplete=0xbfffe0b8) + 506 at CBLQuery.m:155
    frame #3: 0x000ae2e1 SPApp`-[CBLLiveQuery update](self=0x07c5f330, _cmd=0x00533c55) + 103 at CBLQuery.m:265
    frame #4: 0x000ae020 SPApp`-[CBLLiveQuery start](, _cmd=0x020828dd) + 197 at CBLQuery.m:215
    frame #5: 0x000ae19d SPApp`-[CBLLiveQuery addObserver:forKeyPath:options:context:](self=0x07c5f330, _cmd=0x0197d2a1, observer=0x07c5f4c0, keyPath=0x00124724, options=NSKeyValueObservingOptionInitial, context=0x00000000) + 63 at CBLQuery.m:245

@snej
Copy link
Contributor

snej commented Apr 15, 2013

Oh right, sorry — I did that last month to optimize CBLLiveQuery. OK, so that narrows down what could be running on the other FMDB connection. Still can't figure out what could be wrong, though.

snej added a commit that referenced this issue Apr 15, 2013
@snej
Copy link
Contributor

snej commented Apr 15, 2013

Could you retry with the latest commit? It adds some assertions and warnings about transactions. I think it was possible for the transaction level to get confused, but only in cases where sqlite itself fails to commit or back out a transaction, and I don't think either is likely to have happened.

@nl
Copy link
Author

nl commented Apr 16, 2013

Ok, I tried again and I am getting the same messages as before.

@snej
Copy link
Contributor

snej commented Apr 16, 2013

/scratching my head/

Could you please turn on CBLDatabase logging (and turn everything else off) and email or post the logs?

This looks like a nasty problem, and I'm glad you've run into it before I moved replication to the background thread; hopefully it'll be easier to figure out in this more limited scenario.

@nl
Copy link
Author

nl commented Apr 16, 2013

Sure. I used the code below to enable logging and sent the logs to you by email.

    EnableLog(YES);
    EnableLogTo(CBLDatabase, YES);

@snej
Copy link
Contributor

snej commented Apr 16, 2013

You can enable logs by going into the Arguments tab of the Run item of the scheme editor and adding items

-Log YES
-LogCBLDatabase YES

Then you can check and uncheck them. Also, Cmd-Opt-R will open that editor before running the target, making it super easy to update the arguments.

@snej
Copy link
Contributor

snej commented Apr 16, 2013

Got the emailed logs, thanks. That's actually less detail than I wanted — I forgot to have you enable CBL_DatabaseVerbose too (that makes FMDB log the SQL queries.)

I think I know what's going on, though. This has to do with SQLite's "deferred transaction" feature. I'll have to go reread the docs, but IIRC, the thread may legitimately have to back out of its transaction and retry, if another thread is already holding a lock it needs. I ran into this once before; but I hadn't noticed that the "savepoint" mechanism TouchDB and CBL use is based on deferred transactions.

@snej
Copy link
Contributor

snej commented Apr 16, 2013

@nl
Copy link
Author

nl commented Apr 16, 2013

Good to know, thanks. It doesn't seem to have any devastating effect on data as far as I can tell.
Sometimes I'm really grateful CBL is abstracting all of this low level stuff away from us =)

@snej snej closed this as completed in 99fcca3 Apr 16, 2013
@snej
Copy link
Contributor

snej commented Apr 16, 2013

Whew — I just pushed a big commit that should cause CBL to handle db-busy errors correctly by backing off and retrying. There are a lot of code changes, but it passes all the unit tests. I don't have any code that reproduces your bug, though. Could you give this a try with your app?

If the SQLITE_BUSY error happens internally, you should see a log message like "Db busy, retrying transaction (#1)...". This will show up if any logging is enabled (-Log YES), you don't have to turn on CBLDatabase logging. There might be more than one of these if the other thread is taking a long time; but at some point it should succeed.

@nl
Copy link
Author

nl commented Apr 17, 2013

On 'Database busy' I now see '05:46:53.307| CBLDatabase[<0x88accf0>user-e1b1c275376b67ba7ce277e6560207cc]: Db busy, retrying transaction (#1)...' like you said, but I no longer have any view update failure messages!
You fixed it, thanks!

@snej
Copy link
Contributor

snej commented Apr 17, 2013

W00t! High-five!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants