[experimental] rgw/sfs: use only one sqlite database connection #209
Conversation
📦 quay.io/s3gw/s3gw:pr-6002dde8ea5e72212a3253ded2e53afce53bc394-6169337814-1 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-6002dde8ea5e72212a3253ded2e53afce53bc394-6169337814-1
Yeah, OK, this ain't gonna work (I know you've already been through this @0xavi0, but I was in the area and wanted to experiment a bit for the sake of my own understanding of how all the pieces fit). TL;DR: if everything uses the same connection, we get a bunch of "cannot start a transaction within a transaction" errors.
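For context, a minimal standalone illustration (not from this PR) of why sharing one connection breaks: SQLite transactions are per-connection, so when two threads interleave `BEGIN`s on the same `sqlite3*`, the second one fails:

```cpp
// Minimal sketch: two BEGINs on one connection, as would happen when two
// threads share a single sqlite3* and each tries to start a transaction.
#include <sqlite3.h>
#include <cstdio>

int main() {
  sqlite3* db = nullptr;
  sqlite3_open(":memory:", &db);

  char* err = nullptr;
  sqlite3_exec(db, "BEGIN;", nullptr, nullptr, &err);           // "thread A" starts a txn
  int rc = sqlite3_exec(db, "BEGIN;", nullptr, nullptr, &err);  // "thread B", same connection
  if (rc != SQLITE_OK) {
    std::printf("error: %s\n", err);  // "cannot start a transaction within a transaction"
    sqlite3_free(err);
  }
  sqlite3_close(db);
  return 0;
}
```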
Eh, while I'm here, why not...? ;-)
OK, that's actually a lot better with the mutex! No more "transaction within a transaction" errors.
📦 quay.io/s3gw/s3gw:pr-2cf997b980ca03ee54b87ec46241ebbdac036ed9-6181182460-1 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-2cf997b980ca03ee54b87ec46241ebbdac036ed9-6181182460-1
Yeah, that […]
Yeah, but take into account that I looked at this when the […]. We also had a mutex at the beginning, for example, before WAL was enabled. I found this funny answer in the SQLite FAQ: […]
Oh I love that :-)
#210 should take care of that.
There's something still not quite right here: […]
A bit more tracing and testing locally shows that sqlite_orm is somehow still internally using two or three different sqlite3* pointers, which shouldn't be possible with the changes I've made to have only one connection. Of course I must be missing some detail; I just haven't figured out what that detail is yet.
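One way to chase that down, as a sketch (assuming sqlite_orm's `on_open` callback, which a later commit in this thread also mentions using): log the raw `sqlite3*` every time a connection is opened, and count the distinct pointers in the output:

```cpp
// Hypothetical debugging aid (fragment; needs <iostream>): sqlite_orm
// invokes on_open with the raw sqlite3* each time it opens a connection,
// so distinct pointers in the log mean distinct underlying connections.
storage.on_open = [](sqlite3* db) {
  std::cerr << "sqlite connection opened: " << static_cast<const void*>(db) << std::endl;
};
```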
As for these other failures...
...it turns out they're due to sqlite_orm's retain_count not being thread safe. Happily, this is fixed in fnc12/sqlite_orm#1054, but unfortunately we're still using sqlite_orm v1.7.1, which doesn't include that fix. We need to either update @jecluis's package (https://build.opensuse.org/package/show/home:jluis/libsqliteorm) to v1.8+, or bring sqlite_orm in as a submodule as @irq0 suggested in https://github.com/aquarist-labs/s3gw/issues/683, to pick up this fix.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
force-pushed from e6c1ca4 to 5ddc2ba
force-pushed from 5ddc2ba to dbff419
Hrm. TestSFSConcurrency is still failing even after the update to sqlite_orm v1.8.2. Given it works fine when I run that test on my machine locally, I'm going to assume for the moment that the build is picking up our older libsqliteorm package, which is still installed system-wide in the build environment. I guess this'll have to wait for #212 and https://github.com/aquarist-labs/s3gw/pull/713 for further testing.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
force-pushed from dbff419 to ec52a78
force-pushed from ec52a78 to a88394f
📦 quay.io/s3gw/s3gw:pr-c6699302f5ccace6467b85d20484f292028b38b7-6248808399-2 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-c6699302f5ccace6467b85d20484f292028b38b7-6248808399-2
…saction() Signed-off-by: Tim Serong <[email protected]>
This ensures we only have one storage object, and everything that needs to use it does so via StorageRef, which is a std::shared_ptr. Storage has been renamed StorageImpl, to make sure I caught all the previous uses of Storage elsewhere. If we want to change this to a different type in future, it should be possible to do so just by changing the type of StorageRef in dbconn.h and dbconn.cc - so long as the new type behaves like a pointer, everything Should Just Work(TM). Signed-off-by: Tim Serong <[email protected]>
Signed-off-by: Tim Serong <[email protected]>
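A sketch of the shape the StorageRef commit above describes (illustrative only, not the literal dbconn.h contents):

```cpp
#include <memory>
#include <sqlite_orm/sqlite_orm.h>

// StorageImpl is the concrete sqlite_orm storage type; StorageRef is the
// single indirection point. Swapping in a different pointer-like handle
// type later should only require changing this alias.
using StorageImpl = decltype(sqlite_orm::make_storage("s3gw.db"));  // illustrative
using StorageRef = std::shared_ptr<StorageImpl>;

// Everything accesses the single storage object via something like:
// StorageRef get_storage();
```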
I'm almost confident about this now, except for one niggling little problem: when running the unit tests locally (specifically TestSFSWALCheckpoint), I've occasionally hit this: […]
I'd still like to see if I can get a reliable reproducer for that, to verify whether it's something stupid in my test, or whether there's some underlying issue that needs to be dealt with.
Do you have a log file for that? Might be interesting to see what's happening, because that smells a lot like a bug.
No, no log file, just that snippet that was spat out while running the unit tests, which is very little to go on :-/
Just a hunch, but going single connection means we lose some isolation.
My read of SQLite is that going single connection means we serialize everything on that connection. With serialized thread mode there is a database-level mutex around operations, including sqlite3_step. Microbenchmarks confirm this, and explain why the system benchmarks don't show any difference.

Setup: 100000 distinct objects; 1M queries spread over $nproc threads (nproc=32); measuring query start-to-finish time (execution + busy timeout).

multiple connections + sqlite multithreaded: avg rate: 11273.957159/s - all CPUs 100%
[…]

The system benchmark reports about 130 S3 OP/s. There are usually just a handful of SQLite queries per S3 OP.
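For reference, that connection-level mutex is exactly what serialized threading mode buys you. A minimal sketch of opening a connection that way (standard SQLite C API, not code from this PR):

```cpp
// With -DSQLITE_THREADSAFE=1 (serialized), a connection opened with
// SQLITE_OPEN_FULLMUTEX can be shared across threads, but every call on
// it acquires the same connection mutex, so queries serialize instead of
// running in parallel - matching the single-connection numbers above.
#include <sqlite3.h>

int main() {
  sqlite3* db = nullptr;
  sqlite3_open_v2("sfs.db", &db,
                  SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
                      SQLITE_OPEN_FULLMUTEX,  // serialized-mode connection
                  nullptr);
  sqlite3_close(db);
  return 0;
}
```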
My choice would be returning a pointer and having DBConn own all Storage objects. It already has the same lifetime as the SFS backend, so we don't need refcounting.

I think pooling the Storage objects would be worthwhile. Maybe with a map of thread id -> Storage, with the Storage lazily initialized on first access. This would be bounded by the limited number of beast worker threads we have. Always handing out the same connection per thread should mix well with what we have right now. Should we ever decide on going async, we might run into trouble with isolation, though.
That's interesting, although in the case of the intermittent assert I'm seeing, AFAICT from looking at the code the writes should all be done (see ceph/src/test/rgw/sfs/test_rgw_sfs_wal_checkpoint.cc, lines 83 to 98, at a570e9c). I must be missing some detail somewhere...
No, that was just me misreading the data :-/ It means 534 queries/second with single connection serialized, vs. ~10K queries/second in the other two cases.
There's no point using a shared_ptr for the Storage object as its lifetime is bound to the DBConn, so just return a regular pointer. Signed-off-by: Tim Serong <[email protected]>
Right, makes sense.
Alright, I'll see what I can do :-)
This gives one Storage object per thread, created on demand as a copy of the first Storage object created in the DBConn constructor. I'm not sure whether we need the lock in get_storage(), but suspect we probably do given we're potentially modifying the storage_pool map. Right now this is just a std::map<pthread_t, StorageImpl>, but I reckon I might be able to change the latter to a struct or a pair or something that _also_ includes a pointer to each StorageImpl's raw sqlite3 connection, via some trickery in the storage->on_open callback. This would potentially allow calling sqlite3_db_status() on each connection to get runtime stats. We might also be able to get rid of the "sfs_db_lock" mutex I added earlier if we're back to using multiple connections, because multiple connections should avoid the "transaction inside transaction" problem I had earlier. Signed-off-by: Tim Serong <[email protected]>
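A compilable sketch of that scheme (names illustrative; `StorageImpl` here is a stand-in for the sqlite_orm storage type, copying which opens a new connection):

```cpp
#include <pthread.h>
#include <map>
#include <mutex>

struct StorageImpl { /* stand-in for the sqlite_orm storage type */ };

class DBConn {
  StorageImpl first_storage;                    // created in the constructor
  std::map<pthread_t, StorageImpl> storage_pool;
  std::mutex pool_mutex;

 public:
  // One Storage per thread, lazily copied from first_storage. The lock
  // guards the map itself: concurrent first calls from different threads
  // would otherwise race on the insert.
  StorageImpl* get_storage() {
    std::lock_guard<std::mutex> lock(pool_mutex);
    auto [it, inserted] = storage_pool.try_emplace(pthread_self(), first_storage);
    return &it->second;
  }
};
```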
Signed-off-by: Tim Serong <[email protected]>
This shouldn't be necessary now that we've got separate connections per thread again. I've left it there, but renamed to DBConn::sfs_db_lock just in case we do need it again in future. Signed-off-by: Tim Serong <[email protected]>
Signed-off-by: Tim Serong <[email protected]>
...and with the connection pool, we're back to potential WAL explosion, so I'll have to change those tests back again.
Signed-off-by: Tim Serong <[email protected]>
This mostly reverts a570e9c, because now that we're using multiple connections again (albeit pooled), the WAL growth happens again. Signed-off-by: Tim Serong <[email protected]>
Since adding storage_pool, the WAL doesn't explode quite as much as it used to with 10 threads and 1000 objects each (previously with multiple unpooled connections it'd reliably go over 500MB, but I've just seen a unit test where it "only" got to ~450MB, so let's drop the value we test against a bit such that we still confirm the problem, but have more wiggle room). Signed-off-by: Tim Serong <[email protected]>
This changes storage_pool to std::unordered_map (faster) and switches to using a std::shared_mutex in DBConn::get_storage() which is better for concurrency. Credit goes to Marcel Lauhoff for this implementation. Signed-off-by: Tim Serong <[email protected]>
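Revisiting the earlier sketch with those two changes (again illustrative; this assumes `std::hash<pthread_t>` works, which it does where pthread_t is an integral type, as on Linux):

```cpp
#include <pthread.h>
#include <shared_mutex>
#include <unordered_map>

struct StorageImpl { /* stand-in for the sqlite_orm storage type */ };

class DBConn {
  StorageImpl first_storage;
  std::unordered_map<pthread_t, StorageImpl> storage_pool;
  std::shared_mutex pool_mutex;

 public:
  StorageImpl* get_storage() {
    const pthread_t tid = pthread_self();
    {
      // Common case: this thread already has a Storage; a shared lock
      // lets many threads do the lookup concurrently.
      std::shared_lock lock(pool_mutex);
      auto it = storage_pool.find(tid);
      if (it != storage_pool.end()) return &it->second;
    }
    // First call from this thread: take the exclusive lock just to insert.
    std::unique_lock lock(pool_mutex);
    auto [it, inserted] = storage_pool.try_emplace(tid, first_storage);
    return &it->second;
  }
};
```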
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
Closing in favour of #233
Currently there is one long lived connection to the sqlite database, but also many other threads use their own connections (for some analysis of this, see #201).
This PR changes all the copies of the storage object to references, which means we're actually only using one db connection now. It's a bit irritating to do this, because it's way too easy to accidentally make a copy if you leave the '&' off :-/ I'd really like to somehow disable copy construction of the Storage object, but I haven't figured out how to do that yet.
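For what it's worth, the usual C++ idiom for that is deleting the copy operations; since the sqlite_orm storage type is library-defined, in practice this would probably mean wrapping it. A sketch:

```cpp
// Hypothetical wrapper: copying becomes a compile error, so a forgotten
// '&' is caught by the compiler instead of silently opening a connection.
class Storage {
 public:
  Storage() = default;
  Storage(const Storage&) = delete;
  Storage& operator=(const Storage&) = delete;
};

void use(Storage& s) {
  Storage& ref = s;      // fine: a reference
  // Storage copy = s;   // error: use of deleted copy constructor
}
```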
One interesting effect of this change is that, prior to this commit, the SFS status page only showed SQLite stats for the connection from the status thread, which is not overly helpful. With this commit (all threads using the same connection), the figures from the SQLite stats will actually change over time while s3gw is being used.
Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're using Serialized mode, it's totally cool to have one connection shared by multiple threads (see https://www.sqlite.org/threadsafe.html).
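As a quick sanity check, the compile-time setting can be confirmed at runtime via sqlite3_threadsafe(), which returns the SQLITE_THREADSAFE value the library was built with:

```cpp
#include <sqlite3.h>
#include <cstdio>

int main() {
  // 1 means serialized mode is available (SQLite's default, and our build flag).
  std::printf("SQLITE_THREADSAFE=%d\n", sqlite3_threadsafe());
  return 0;
}
```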