Skip to content
This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

[experimental] rgw/sfs: don't use storage.open_forever() #201

Closed

Conversation

tserong
Copy link
Member

@tserong tserong commented Sep 1, 2023

Given we have multiple threads which independently open the database, I'm not sure having the first connection open forever makes sense or does anything useful. Let's see if it changes the benchmarks.

@tserong tserong requested review from jecluis, irq0 and 0xavi0 September 1, 2023 10:35
Given we have multiple threads which independently open the database,
I'm not sure having the first connection open forever makes sense
or does anything useful.  Let's see if it changes the benchmarks.

Signed-off-by: Tim Serong <[email protected]>
@tserong tserong force-pushed the wip-sfs-dont-open-db-forever branch from 87e8663 to 2316024 Compare September 1, 2023 10:35
@tserong
Copy link
Member Author

tserong commented Sep 1, 2023

A bit more context may be in order. I observed that when running s3gw, we end up with two long-lived FDs open, pointing to s3gw.db (ignore the wal and shm lines here, they belong to the first connection):

> ls -l /proc/$(pgrep radosgw)/fd | grep db
lrwx------ 1 tserong users 64 Sep  1 15:03 24 -> /home/tserong/src/github/aquarist-labs/ceph/build/dev/rgw.bar/s3gw.db
lrwx------ 1 tserong users 64 Sep  1 15:03 25 -> /home/tserong/src/github/aquarist-labs/ceph/build/dev/rgw.bar/s3gw.db-wal
lrwx------ 1 tserong users 64 Sep  1 15:03 26 -> /home/tserong/src/github/aquarist-labs/ceph/build/dev/rgw.bar/s3gw.db-shm
lrwx------ 1 tserong users 64 Sep  1 15:03 27 -> /home/tserong/src/github/aquarist-labs/ceph/build/dev/rgw.bar/s3gw.db

But, when anything interesting happens (e.g.: putting an object, or garbage collection), those things happen in separate threads, which open their own connections to the database, in addition to the above. So, why bother having the initial connection live forever if it's not (or barely) used?

@jecluis jecluis marked this pull request as draft September 5, 2023 09:36
@jecluis
Copy link
Member

jecluis commented Sep 5, 2023

Marking as draft because this is definitely not ready for consumption :)

Do we have further information on what happens when we have a lot of in-flight requests?

@tserong
Copy link
Member Author

tserong commented Sep 5, 2023

Do we have further information on what happens when we have a lot of in-flight requests?

A whole lot of db connections get opened and closed separately one after another, in separate threads. AFAICT they don't use the "main" long lived connection at all, which is what this PR gets rid of.

I know this because I've watched a bunch of straces, and stepped a bunch of stuff in gdb while running GC and a handful of object puts, but this is all manual work. I'll try to capture some of this and write it up in more detail and/or more sanely :-)

(I opened this PR because Marcel was interested in taking it to see if it had any effect on benchmarks, and I assumed we could revert it afterwards if necessary.)

@irq0
Copy link
Member

irq0 commented Sep 5, 2023

(I opened this PR because Marcel was interested in taking it to see if it had any effect on benchmarks, and I assumed we could revert it afterwards if necessary.)

Merging and reverting does not /feel/ right. Maybe #202 does what I hope it does - build and push an image for a labeled PR. Once we have that experiments like this become easier :)

@tserong
Copy link
Member Author

tserong commented Sep 6, 2023

Do we have further information on what happens when we have a lot of in-flight requests?

I've done some sketchy instrumentation of sqlite_orm.h, as follows:

--- sqlite_orm.h.orig	2023-09-01 15:17:06.513939439 +1000
+++ sqlite_orm.h	2023-09-06 18:34:40.116985427 +1000
@@ -9620,6 +9620,8 @@
 #include <string>  //  std::string
 #include <system_error>  //  std::system_error
 
+#include <iostream>
+
 // #include "error_code.h"
 
 namespace sqlite_orm {
@@ -9630,9 +9632,20 @@
 
             connection_holder(std::string filename_) : filename(move(filename_)) {}
 
+            void tserong_log_call(const char * fn) {
+                pthread_t me = pthread_self();
+                std::cerr << "thread " << std::hex << me;
+                char name[16];
+                if (0 == pthread_getname_np(me, name, 16)) {
+                  std::cerr << " (" << name << ")";
+                }
+		std::cerr << " calling " << fn << std::endl;
+            } 
+
             void retain() {
                 ++this->_retain_count;
                 if(1 == this->_retain_count) {
+                    tserong_log_call("sqlite3_open");
                     auto rc = sqlite3_open(this->filename.c_str(), &this->db);
                     if(rc != SQLITE_OK) {
                         throw std::system_error(std::error_code(sqlite3_errcode(this->db), get_sqlite_error_category()),
@@ -9644,6 +9657,7 @@
             void release() {
                 --this->_retain_count;
                 if(0 == this->_retain_count) {
+                    tserong_log_call("sqlite3_close");
                     auto rc = sqlite3_close(this->db);
                     if(rc != SQLITE_OK) {
                         throw std::system_error(std::error_code(sqlite3_errcode(this->db), get_sqlite_error_category()),

Then, I ran bin/radosgw -f (not -d or I would have ended up with my crappy tracing smooshed into the usual log output), created a bucket (s3cmd mb s3://foo) and uploaded a 100MB multipart file (dd if=/dev/zero of=/dev/stdout bs=10M count=10 | s3cmd put - s3://foo/100mb).

Note: I did NOT have the patch from this PR applied during this first test.

Here's what happened:

  • On startup, the main thread (7fed8fe50f40) called sqlite3_open() four times, and sqlite3_close() three times. The first call will be that long lived connection that uses storage.open_forever(). One of the others will be DBConn::check_metadata_is_compatible(). I'm not sure about the other two.
  • Right after that, the GC thread (7fed8a5386c0) calls sqlite3_open() and sqlite3_close() three times.
  • When I created a bucket, another thread (7fed804ef6c0) opened and closed the database twice, and yet another thread opened and closed the database five times.
  • When I uploaded a 100MB multipart object, it looks like the database was opened and closed 145 times by 34 threads. None of those threads are the main thread.
  • Finally, when I hit CTRL-C, sqlite3_close() was called one last time, taking out that long-lived connection.

Here's the raw output in case anyone wants to verify my counting:

> bin/radosgw -i bar -f --no-mon-config --debug-rgw 15   --rgw-backend-store sfs   --rgw-data $(pwd)/dev/rgw.bar --run-dir $(pwd)/dev/rgw.bar --rgw-sfs-data-path $(pwd)/dev/rgw.bar --rgw-gc-processor-period 3600 
did not load config file, using default settings.
2023-09-06T18:50:07.181+1000 7fed8fe50f40 -1 Errors while parsing config file!
2023-09-06T18:50:07.181+1000 7fed8fe50f40 -1 can't open ceph.conf: (2) No such file or directory
2023-09-06T18:50:07.181+1000 7fed8fe50f40 -1 Errors while parsing config file!
2023-09-06T18:50:07.181+1000 7fed8fe50f40 -1 can't open ceph.conf: (2) No such file or directory
thread 7fed8fe50f40 (radosgw) calling sqlite3_open
thread 7fed8fe50f40 (radosgw) calling sqlite3_open
thread 7fed8fe50f40 (radosgw) calling sqlite3_close
thread 7fed8fe50f40 (radosgw) calling sqlite3_open
thread 7fed8fe50f40 (radosgw) calling sqlite3_close
thread 7fed8fe50f40 (radosgw) calling sqlite3_open
thread 7fed8fe50f40 (radosgw) calling sqlite3_close
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_open
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_close
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_open
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_close
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_open
thread 7fed8a5386c0 (rgw_gc) calling sqlite3_close
ABOVE IS STARTUP
NOW LETS MAKE A BUCKET
thread 7fed804ef6c0 (radosgw) calling sqlite3_open
thread 7fed804ef6c0 (radosgw) calling sqlite3_close
thread 7fed804ef6c0 (radosgw) calling sqlite3_open
thread 7fed804ef6c0 (radosgw) calling sqlite3_close
thread 7fed814f16c0 (radosgw) calling sqlite3_open
thread 7fed814f16c0 (radosgw) calling sqlite3_close
thread 7fed814f16c0 (radosgw) calling sqlite3_open
thread 7fed814f16c0 (radosgw) calling sqlite3_close
thread 7fed814f16c0 (radosgw) calling sqlite3_open
thread 7fed814f16c0 (radosgw) calling sqlite3_close
thread 7fed814f16c0 (radosgw) calling sqlite3_open
thread 7fed814f16c0 (radosgw) calling sqlite3_close
thread 7fed814f16c0 (radosgw) calling sqlite3_open
thread 7fed814f16c0 (radosgw) calling sqlite3_close
NOW LETS UPLOAD AN OBJECT
thread 7fed77cde6c0 (radosgw) calling sqlite3_open
thread 7fed77cde6c0 (radosgw) calling sqlite3_close
thread 7fed77cde6c0 (radosgw) calling sqlite3_open
thread 7fed77cde6c0 (radosgw) calling sqlite3_close
thread 7fed77cde6c0 (radosgw) calling sqlite3_open
thread 7fed77cde6c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed72cd46c0 (radosgw) calling sqlite3_open
thread 7fed72cd46c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fed6eccc6c0 (radosgw) calling sqlite3_open
thread 7fed6eccc6c0 (radosgw) calling sqlite3_close
thread 7fece33b56c0 (radosgw) calling sqlite3_open
thread 7fece33b56c0 (radosgw) calling sqlite3_close
thread 7fec95b1a6c0 (radosgw) calling sqlite3_open
thread 7fec95b1a6c0 (radosgw) calling sqlite3_close
thread 7fed674bd6c0 (radosgw) calling sqlite3_open
thread 7fed674bd6c0 (radosgw) calling sqlite3_close
thread 7fed674bd6c0 (radosgw) calling sqlite3_open
thread 7fed674bd6c0 (radosgw) calling sqlite3_close
thread 7fed674bd6c0 (radosgw) calling sqlite3_open
thread 7fed674bd6c0 (radosgw) calling sqlite3_close
thread 7fed674bd6c0 (radosgw) calling sqlite3_open
thread 7fed674bd6c0 (radosgw) calling sqlite3_close
thread 7fed674bd6c0 (radosgw) calling sqlite3_open
thread 7fed674bd6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed5fcae6c0 (radosgw) calling sqlite3_open
thread 7fed5fcae6c0 (radosgw) calling sqlite3_close
thread 7fed7b4e56c0 (radosgw) calling sqlite3_open
thread 7fed7b4e56c0 (radosgw) calling sqlite3_close
thread 7fed1fc2e6c0 (radosgw) calling sqlite3_open
thread 7fed1fc2e6c0 (radosgw) calling sqlite3_close
thread 7fed0b4056c0 (radosgw) calling sqlite3_open
thread 7fed0b4056c0 (radosgw) calling sqlite3_close
thread 7feccf38d6c0 (radosgw) calling sqlite3_open
thread 7feccf38d6c0 (radosgw) calling sqlite3_close
thread 7feccf38d6c0 (radosgw) calling sqlite3_open
thread 7feccf38d6c0 (radosgw) calling sqlite3_close
thread 7feccf38d6c0 (radosgw) calling sqlite3_open
thread 7feccf38d6c0 (radosgw) calling sqlite3_close
thread 7feccf38d6c0 (radosgw) calling sqlite3_open
thread 7feccf38d6c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed1cc286c0 (radosgw) calling sqlite3_open
thread 7fed1cc286c0 (radosgw) calling sqlite3_close
thread 7fed0cc086c0 (radosgw) calling sqlite3_open
thread 7fed0cc086c0 (radosgw) calling sqlite3_close
thread 7feca8b406c0 (radosgw) calling sqlite3_open
thread 7feca8b406c0 (radosgw) calling sqlite3_close
thread 7fed784df6c0 (radosgw) calling sqlite3_open
thread 7fed784df6c0 (radosgw) calling sqlite3_close
thread 7fed3e46b6c0 (radosgw) calling sqlite3_open
thread 7fed3e46b6c0 (radosgw) calling sqlite3_close
thread 7fed3e46b6c0 (radosgw) calling sqlite3_open
thread 7fed3e46b6c0 (radosgw) calling sqlite3_close
thread 7fed3e46b6c0 (radosgw) calling sqlite3_open
thread 7fed3e46b6c0 (radosgw) calling sqlite3_close
thread 7fed3e46b6c0 (radosgw) calling sqlite3_open
thread 7fed3e46b6c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7fed214316c0 (radosgw) calling sqlite3_open
thread 7fed214316c0 (radosgw) calling sqlite3_close
thread 7feca6b3c6c0 (radosgw) calling sqlite3_open
thread 7feca6b3c6c0 (radosgw) calling sqlite3_close
thread 7fed58ca06c0 (radosgw) calling sqlite3_open
thread 7fed58ca06c0 (radosgw) calling sqlite3_close
thread 7fed2d4496c0 (radosgw) calling sqlite3_open
thread 7fed2d4496c0 (radosgw) calling sqlite3_close
thread 7fed2d4496c0 (radosgw) calling sqlite3_open
thread 7fed2d4496c0 (radosgw) calling sqlite3_close
thread 7fed2d4496c0 (radosgw) calling sqlite3_open
thread 7fed2d4496c0 (radosgw) calling sqlite3_close
thread 7fed2d4496c0 (radosgw) calling sqlite3_open
thread 7fed2d4496c0 (radosgw) calling sqlite3_close
thread 7fed2d4496c0 (radosgw) calling sqlite3_open
thread 7fed2d4496c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed29c426c0 (radosgw) calling sqlite3_open
thread 7fed29c426c0 (radosgw) calling sqlite3_close
thread 7fed3b4656c0 (radosgw) calling sqlite3_open
thread 7fed3b4656c0 (radosgw) calling sqlite3_close
thread 7fecd8ba06c0 (radosgw) calling sqlite3_open
thread 7fecd8ba06c0 (radosgw) calling sqlite3_close
thread 7fec9c3276c0 (radosgw) calling sqlite3_open
thread 7fec9c3276c0 (radosgw) calling sqlite3_close
thread 7fed724d36c0 (radosgw) calling sqlite3_open
thread 7fed724d36c0 (radosgw) calling sqlite3_close
thread 7fed724d36c0 (radosgw) calling sqlite3_open
thread 7fed724d36c0 (radosgw) calling sqlite3_close
thread 7fed724d36c0 (radosgw) calling sqlite3_open
thread 7fed724d36c0 (radosgw) calling sqlite3_close
thread 7fed724d36c0 (radosgw) calling sqlite3_open
thread 7fed724d36c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fed434756c0 (radosgw) calling sqlite3_open
thread 7fed434756c0 (radosgw) calling sqlite3_close
thread 7fecb33556c0 (radosgw) calling sqlite3_open
thread 7fecb33556c0 (radosgw) calling sqlite3_close
thread 7fed6e4cb6c0 (radosgw) calling sqlite3_open
thread 7fed6e4cb6c0 (radosgw) calling sqlite3_close
thread 7fed4a4836c0 (radosgw) calling sqlite3_open
thread 7fed4a4836c0 (radosgw) calling sqlite3_close
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_open
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_close
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_open
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_close
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_open
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_close
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_open
thread 7fecd6b9c6c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed3ac646c0 (radosgw) calling sqlite3_open
thread 7fed3ac646c0 (radosgw) calling sqlite3_close
thread 7fed44c786c0 (radosgw) calling sqlite3_open
thread 7fed44c786c0 (radosgw) calling sqlite3_close
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_open
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_close
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_open
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_close
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_open
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_close
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_open
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_close
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_open
thread 7fed0dc0a6c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
thread 7fed294416c0 (radosgw) calling sqlite3_open
thread 7fed294416c0 (radosgw) calling sqlite3_close
NOW LET'S QUIT
^C2023-09-06T18:51:30.027+1000 7fed8d53e6c0 -1 received  signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
2023-09-06T18:51:30.027+1000 7fed8fe50f40 -1 shutting down
thread 7fed8fe50f40 (radosgw) calling sqlite3_close

Repeating the same test with the patch from this PR applied gives:

  • Main thread on startup opens and closes database 11 times (vs. three and a half times in the previous test)
  • GC thread opens and closes database 3 times (same as above)
  • Making a bucket, two threads, one open/close twice, one five times (same as above)
  • Uploading the same multipart object, 145 opens and closes by 34 threads (same as above)
  • There's no final close on exit, because the db connection isn't long lived anymore.

Here's the raw output from that second run:

> bin/radosgw -i bar -f --no-mon-config --debug-rgw 15   --rgw-backend-store sfs   --rgw-data $(pwd)/dev/rgw.bar --run-dir $(pwd)/dev/rgw.bar --rgw-sfs-data-path $(pwd)/dev/rgw.bar --rgw-gc-processor-period 3600 
did not load config file, using default settings.
2023-09-06T19:19:08.352+1000 7f98ef873f40 -1 Errors while parsing config file!
2023-09-06T19:19:08.352+1000 7f98ef873f40 -1 can't open ceph.conf: (2) No such file or directory
2023-09-06T19:19:08.352+1000 7f98ef873f40 -1 Errors while parsing config file!
2023-09-06T19:19:08.352+1000 7f98ef873f40 -1 can't open ceph.conf: (2) No such file or directory
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ef873f40 (radosgw) calling sqlite3_open
thread 7f98ef873f40 (radosgw) calling sqlite3_close
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_open
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_close
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_open
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_close
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_open
thread 7f98ea2996c0 (rgw_gc) calling sqlite3_close
ABOVE IS STARTUP
NOW LETS MAKE A BUCKET
thread 7f98e12526c0 (radosgw) calling sqlite3_open
thread 7f98e12526c0 (radosgw) calling sqlite3_close
thread 7f98e12526c0 (radosgw) calling sqlite3_open
thread 7f98e12526c0 (radosgw) calling sqlite3_close
thread 7f98e0a516c0 (radosgw) calling sqlite3_open
thread 7f98e0a516c0 (radosgw) calling sqlite3_close
thread 7f98e0a516c0 (radosgw) calling sqlite3_open
thread 7f98e0a516c0 (radosgw) calling sqlite3_close
thread 7f98e0a516c0 (radosgw) calling sqlite3_open
thread 7f98e0a516c0 (radosgw) calling sqlite3_close
thread 7f98e0a516c0 (radosgw) calling sqlite3_open
thread 7f98e0a516c0 (radosgw) calling sqlite3_close
thread 7f98e0a516c0 (radosgw) calling sqlite3_open
thread 7f98e0a516c0 (radosgw) calling sqlite3_close
NOW LETS UPLOAD AN OBJECT
thread 7f98d82406c0 (radosgw) calling sqlite3_open
thread 7f98d82406c0 (radosgw) calling sqlite3_close
thread 7f98d82406c0 (radosgw) calling sqlite3_open
thread 7f98d82406c0 (radosgw) calling sqlite3_close
thread 7f98d82406c0 (radosgw) calling sqlite3_open
thread 7f98d82406c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d3a376c0 (radosgw) calling sqlite3_open
thread 7f98d3a376c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98481206c0 (radosgw) calling sqlite3_open
thread 7f98481206c0 (radosgw) calling sqlite3_close
thread 7f97fb8876c0 (radosgw) calling sqlite3_open
thread 7f97fb8876c0 (radosgw) calling sqlite3_close
thread 7f98bc2086c0 (radosgw) calling sqlite3_open
thread 7f98bc2086c0 (radosgw) calling sqlite3_close
thread 7f98bc2086c0 (radosgw) calling sqlite3_open
thread 7f98bc2086c0 (radosgw) calling sqlite3_close
thread 7f98bc2086c0 (radosgw) calling sqlite3_open
thread 7f98bc2086c0 (radosgw) calling sqlite3_close
thread 7f98bc2086c0 (radosgw) calling sqlite3_open
thread 7f98bc2086c0 (radosgw) calling sqlite3_close
thread 7f98bc2086c0 (radosgw) calling sqlite3_open
thread 7f98bc2086c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98d02306c0 (radosgw) calling sqlite3_open
thread 7f98d02306c0 (radosgw) calling sqlite3_close
thread 7f98028956c0 (radosgw) calling sqlite3_open
thread 7f98028956c0 (radosgw) calling sqlite3_close
thread 7f97fb8876c0 (radosgw) calling sqlite3_open
thread 7f97fb8876c0 (radosgw) calling sqlite3_close
thread 7f986c1686c0 (radosgw) calling sqlite3_open
thread 7f986c1686c0 (radosgw) calling sqlite3_close
thread 7f98330f66c0 (radosgw) calling sqlite3_open
thread 7f98330f66c0 (radosgw) calling sqlite3_close
thread 7f98330f66c0 (radosgw) calling sqlite3_open
thread 7f98330f66c0 (radosgw) calling sqlite3_close
thread 7f98330f66c0 (radosgw) calling sqlite3_open
thread 7f98330f66c0 (radosgw) calling sqlite3_close
thread 7f98330f66c0 (radosgw) calling sqlite3_open
thread 7f98330f66c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f98911b26c0 (radosgw) calling sqlite3_open
thread 7f98911b26c0 (radosgw) calling sqlite3_close
thread 7f989d1ca6c0 (radosgw) calling sqlite3_open
thread 7f989d1ca6c0 (radosgw) calling sqlite3_close
thread 7f98270de6c0 (radosgw) calling sqlite3_open
thread 7f98270de6c0 (radosgw) calling sqlite3_close
thread 7f98048996c0 (radosgw) calling sqlite3_open
thread 7f98048996c0 (radosgw) calling sqlite3_close
thread 7f98a39d76c0 (radosgw) calling sqlite3_open
thread 7f98a39d76c0 (radosgw) calling sqlite3_close
thread 7f98a39d76c0 (radosgw) calling sqlite3_open
thread 7f98a39d76c0 (radosgw) calling sqlite3_close
thread 7f98a39d76c0 (radosgw) calling sqlite3_open
thread 7f98a39d76c0 (radosgw) calling sqlite3_close
thread 7f98a39d76c0 (radosgw) calling sqlite3_open
thread 7f98a39d76c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98298e36c0 (radosgw) calling sqlite3_open
thread 7f98298e36c0 (radosgw) calling sqlite3_close
thread 7f98168bd6c0 (radosgw) calling sqlite3_open
thread 7f98168bd6c0 (radosgw) calling sqlite3_close
thread 7f98a91e26c0 (radosgw) calling sqlite3_open
thread 7f98a91e26c0 (radosgw) calling sqlite3_close
thread 7f986695d6c0 (radosgw) calling sqlite3_open
thread 7f986695d6c0 (radosgw) calling sqlite3_close
thread 7f98521346c0 (radosgw) calling sqlite3_open
thread 7f98521346c0 (radosgw) calling sqlite3_close
thread 7f98521346c0 (radosgw) calling sqlite3_open
thread 7f98521346c0 (radosgw) calling sqlite3_close
thread 7f98521346c0 (radosgw) calling sqlite3_open
thread 7f98521346c0 (radosgw) calling sqlite3_close
thread 7f98521346c0 (radosgw) calling sqlite3_open
thread 7f98521346c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98228d56c0 (radosgw) calling sqlite3_open
thread 7f98228d56c0 (radosgw) calling sqlite3_close
thread 7f98b41f86c0 (radosgw) calling sqlite3_open
thread 7f98b41f86c0 (radosgw) calling sqlite3_close
thread 7f98739776c0 (radosgw) calling sqlite3_open
thread 7f98739776c0 (radosgw) calling sqlite3_close
thread 7f981a8c56c0 (radosgw) calling sqlite3_open
thread 7f981a8c56c0 (radosgw) calling sqlite3_close
thread 7f97fa8856c0 (radosgw) calling sqlite3_open
thread 7f97fa8856c0 (radosgw) calling sqlite3_close
thread 7f97fa8856c0 (radosgw) calling sqlite3_open
thread 7f97fa8856c0 (radosgw) calling sqlite3_close
thread 7f97fa8856c0 (radosgw) calling sqlite3_open
thread 7f97fa8856c0 (radosgw) calling sqlite3_close
thread 7f97fa8856c0 (radosgw) calling sqlite3_open
thread 7f97fa8856c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f982d8eb6c0 (radosgw) calling sqlite3_open
thread 7f982d8eb6c0 (radosgw) calling sqlite3_close
thread 7f98699636c0 (radosgw) calling sqlite3_open
thread 7f98699636c0 (radosgw) calling sqlite3_close
thread 7f98491226c0 (radosgw) calling sqlite3_open
thread 7f98491226c0 (radosgw) calling sqlite3_close
thread 7f98b11f26c0 (radosgw) calling sqlite3_open
thread 7f98b11f26c0 (radosgw) calling sqlite3_close
thread 7f98989c16c0 (radosgw) calling sqlite3_open
thread 7f98989c16c0 (radosgw) calling sqlite3_close
thread 7f98989c16c0 (radosgw) calling sqlite3_open
thread 7f98989c16c0 (radosgw) calling sqlite3_close
thread 7f98989c16c0 (radosgw) calling sqlite3_open
thread 7f98989c16c0 (radosgw) calling sqlite3_close
thread 7f98989c16c0 (radosgw) calling sqlite3_open
thread 7f98989c16c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f97e80606c0 (radosgw) calling sqlite3_open
thread 7f97e80606c0 (radosgw) calling sqlite3_close
thread 7f98038976c0 (radosgw) calling sqlite3_open
thread 7f98038976c0 (radosgw) calling sqlite3_close
thread 7f98c6a1d6c0 (radosgw) calling sqlite3_open
thread 7f98c6a1d6c0 (radosgw) calling sqlite3_close
thread 7f989c9c96c0 (radosgw) calling sqlite3_open
thread 7f989c9c96c0 (radosgw) calling sqlite3_close
thread 7f989c9c96c0 (radosgw) calling sqlite3_open
thread 7f989c9c96c0 (radosgw) calling sqlite3_close
thread 7f989c9c96c0 (radosgw) calling sqlite3_open
thread 7f989c9c96c0 (radosgw) calling sqlite3_close
thread 7f989c9c96c0 (radosgw) calling sqlite3_open
thread 7f989c9c96c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
thread 7f98901b06c0 (radosgw) calling sqlite3_open
thread 7f98901b06c0 (radosgw) calling sqlite3_close
NOW LET'S QUIT
^C2023-09-06T19:19:50.605+1000 7f98ed29f6c0 -1 received  signal: Interrupt, si_code : 128, si_value (int): 0, si_value (ptr): 0, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status0
2023-09-06T19:19:50.605+1000 7f98ef873f40 -1 shutting down

@tserong
Copy link
Member Author

tserong commented Sep 6, 2023

Repeating the above two tests, but running under strace -f -e trace=open,openat, gives:

  • without this patch (so, db is open forever):
    • 3 opens of s3gw.db
    • 165 opens of s3gw.db-wal
    • 1 open of s3gw.db-shm
  • with this patch:
    • 165 opens of s3gw.db
    • 164 opens of s3gw.db-wal
    • 164 opens of s3gw.db-shm

So either way, the WAL is opened and closed many times, but with the db open forever, the main database file is only opened three times vs. 165 times with this patch applied.

@0xavi0
Copy link

0xavi0 commented Sep 7, 2023

Very interesting experiment.
Number of threads created is impressive, indeed.

Looking at the sqlite_orm code... retain() is only called if:

  • We explicitly call open_forever() --> released when process exists.
  • The database is in memory --> released when process exists.
  • When starting a transaction --> released on commit or rollback

What about disabling the open_forever() call and activating the status page? wasn't the status page the one using that open connection?

@irq0 irq0 added the ci/build-s3gw-image Build and push a container image label Sep 7, 2023
@tserong
Copy link
Member Author

tserong commented Sep 7, 2023

What about disabling the open_forever() call and activating the status page? wasn't the status page the one using that open connection?

OK, that gets really interesting:

Running radosgw [...] --rgw-frontends "beast port=7480,status" WITH open_forever(), I go hit the following URLs:

One of these -- http://localhost:9090/sfs -- results in a single sqlite3_open/sqlite3_close pair, from the "status-server" thread. The others do nothing obvious or interesting.

Running WITHOUT open_forever(), if I hit the same set of URLs, http://localhost:9090/ and http://localhost:9090/telemetry do nothing interesting, but the others give me:

2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:06.742+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:48:41.575+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]

@irq0
Copy link
Member

irq0 commented Sep 7, 2023

Running WITHOUT open_forever(), if I hit the same set of URLs, http://localhost:9090 and http://localhost:9090/telemetry do nothing interesting, but the others give me:
[...]
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) API call with invalid database connection pointer
2023-09-07T18:47:38.436+1000 7fcbca18e6c0 -1 [SQLITE] (21) misuse at line 23432 of [831d0fb283]

They all use the sqlite db pointer created on start and take a copy. They don't renew. If that pointer breaks, sqlite error logs that we called a sqlite3_* function with a broken pointer.

@tserong
Copy link
Member Author

tserong commented Sep 7, 2023

They all use the sqlite db pointer created on start and take a copy. They don't renew. If that pointer breaks, sqlite error logs that we called a sqlite3_* function with a broken pointer.

So the stats thread is somehow behaving differently than the GC thread and the worker(?) threads that handle object puts etc., because the latter all seem to open their own connections. Why take a different approach for the stats thread?

@0xavi0
Copy link

0xavi0 commented Sep 7, 2023

Yeah, I think it's different because GC is accessing the database opening a new connection while the status page is using the raw pointer we get when we do the first connection.

@tserong
Copy link
Member Author

tserong commented Sep 7, 2023

But why?

I mean, why does the status page take a copy of the raw pointer, but every other thread makes its own connections? Or, conversely, why does every other thread not take a copy of the raw pointer?

@tserong
Copy link
Member Author

tserong commented Sep 8, 2023

Repeating the above two tests, but running under strace -f -e trace=open,openat, gives:

* without this patch (so, db is open forever):
  
  * 3 opens of s3gw.db
  * 165 opens of s3gw.db-wal
  * 1 open of s3gw.db-shm

* with this patch:
  
  * 165 opens of s3gw.db
  * 164 opens of s3gw.db-wal
  * 164 opens of s3gw.db-shm

So either way, the WAL is opened and closed many times, but with the db open forever, the main database file is only opened three times vs. 165 times with this patch applied.

Just out of curiosity, I ran with --rgw-backend-store dbstore under strace and did the same test (make a bucket, upload a 100MB file), and I got:

  • 1 open of dbstore-default_ns.db
  • 27 opens of dbstore-default_ns.db-journal

@irq0 irq0 added ci/build-s3gw-image Build and push a container image and removed ci/build-s3gw-image Build and push a container image labels Sep 8, 2023
@github-actions
Copy link

github-actions bot commented Sep 8, 2023

📦 quay.io/s3gw/s3gw:pr-b5ff3a323b24b3e50a5d91c2ba3a2eeecc22b940-6119665587-1 https://quay.io/repository/s3gw/s3gw?tab=tags&tag=pr-b5ff3a323b24b3e50a5d91c2ba3a2eeecc22b940-6119665587-1

@tserong
Copy link
Member Author

tserong commented Sep 11, 2023

Merging and reverting does not /feel/ right. Maybe #202 does what I hope it does - build and push an image for a labeled PR. Once we have that experiments like this become easier :)

Yep, that worked :-) (For anyone else reading this in future, it looks like it landed in #203).

Perf looks pretty much the same with/without this (6119665587 refers to this PR):

S3GW Performance Report pr201.pdf

(Sorry about the PDF, github won't let me upload HTML, which is the original)

@tserong
Copy link
Member Author

tserong commented Sep 12, 2023

But why?

I mean, why does the status page take a copy of the raw pointer, but every other thread makes its own connections? Or, conversely, why does every other thread not take a copy of the raw pointer?

To answer my own question: because the status thread needs to call sqlite3_* API functions, which it can't do without the raw pointer to the connection. The only way to get that pointer out of sqlite_orm is to capture it in storage.on_open, and of course that connection needs to live forever, because if it didn't, any use of the raw pointer would result in everything collapsing into a smoking wreck.

@tserong
Copy link
Member Author

tserong commented Sep 12, 2023

TL;DR: we need storage.open_forever() and #208 will fix the database corruption issue, so while this has been a very interesting learning experience for me, we really don't want to merge this change.

@tserong tserong closed this Sep 12, 2023
tserong added a commit to tserong/ceph that referenced this pull request Sep 13, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Sep 19, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Sep 20, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Sep 20, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Sep 25, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Oct 14, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
tserong added a commit to tserong/ceph that referenced this pull request Oct 17, 2023
Currently there is one long lived connection to the sqlite database,
but also many other threads use their own connections (for some
analysis of this, see aquarist-labs#201).

This PR changes all the copies of the storage object to references,
which means we're actually only using one db connection now.  It's a
bit irritating to do this, because it's way too easy to accidentally
make a copy if you leave the '&' off :-/  I'd really want to somehow
disable copy construction of the Storage object, but I didn't figure
out how to do that yet.

One interesting effect of this change is that, prior to this commit,
the SFS status page only showed SQLite stats for the connection from
the status thread, which is not overly helpful.  With this commit
(all threads using the same connection), the figures from the SQLite
stats will actually change over time while s3gw is being used.

Note that as we're building with -DSQLITE_THREADSAFE=1, i.e. we're
using Serialized mode, it's totally cool to have one connection
shared by multiple threads (see https://www.sqlite.org/threadsafe.html)

Signed-off-by: Tim Serong <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
ci/build-s3gw-image Build and push a container image
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants