Use a single-striped connection pool for each database layer instead of a single shared connection #2416

KtorZ · 2020-12-31T15:52:08Z

Issue Number

ADP-586

Overview

73df15f
📍 use a single-striped connection pool for each database layer

It is a rather common practice to use a pool of database connections
when dealing with databases. So far, we've been using a single shared
connection per wallet worker, with a lock in front of each connection
preventing concurrent access to the database. The lock is only
necessary because of the way persistent handles query statements
internally; in principle, SQLite handles concurrent database access
just fine.

For basic wallets, this change makes little difference. But for larger
wallets, like those used by exchanges, we've observed very slow
response times caused by contention on the database lock. Indeed,
some requests may hold the lock for 10 or 20 seconds, preventing any
other request from going through. However, most requests are read-only
and could be executed in parallel, at the discretion of
the SQLite engine. I hope that the introduction of a connection pool
will improve the overall experience for large wallets by better
serving concurrent requests on the database. Fingers crossed.
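The change can be pictured with a minimal, base-only sketch of a single-striped pool. Idle connections live in one MVar. A connection is opened on demand when none is idle, and it is handed back after use. As a result, concurrent requests no longer queue behind one lock for the whole duration of a query. `Pool`, `newPool` and `withPooled` are hypothetical names and this is not the resource-pool API. The real library also handles striping, idle timeouts and a maximum number of resources.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (mask, onException)
import Data.IORef

-- | Hypothetical single-stripe pool: one MVar holds the idle connections.
data Pool c = Pool
    { poolCreate  :: IO c        -- ^ open a new connection
    , poolDestroy :: c -> IO ()  -- ^ close a connection
    , poolIdle    :: MVar [c]    -- ^ connections not currently borrowed
    }

newPool :: IO c -> (c -> IO ()) -> IO (Pool c)
newPool create destroy = Pool create destroy <$> newMVar []

-- | Borrow an idle connection (or open a new one if none is idle), run the
-- action, then hand the connection back. Callers only contend briefly on
-- the idle list, not for the whole duration of a query; any further
-- serialisation is left to the database engine.
withPooled :: Pool c -> (c -> IO b) -> IO b
withPooled pool action = mask $ \restore -> do
    reused <- modifyMVar (poolIdle pool) $ \idle -> pure $ case idle of
        (c : rest) -> (rest, Just c)
        []         -> ([], Nothing)
    conn <- maybe (restore (poolCreate pool)) pure reused
    -- A connection whose action failed is closed rather than returned.
    result <- restore (action conn) `onException` poolDestroy pool conn
    modifyMVar_ (poolIdle pool) (pure . (conn :))
    pure result
```

In this sketch, two borrowers that are active at the same time each get their own connection. A later borrower reuses an idle connection instead of opening another one.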

Comments

KtorZ · 2020-12-31T15:55:46Z

lib/core/src/Cardano/DB/Sqlite.hs

+    let createConnection = do
+            let info = mkSqliteConnectionInfo connStr
+            conn <- Sqlite.open connStr
+            executeManualMigration migration conn


Note that I originally executed the migration outside of createConnection, using a bracket (open / close / executeMigration ...). It was mostly fine except for the database views in DB.Pool. The database property tests would fail with `no such table: <name of the view>`, as if the migration had not been executed.

I have looked through the documentation a bit but didn't find anything suggesting that views are scoped to a single connection, yet it seems to be the case. At least, it should be safe to run the migration on every connection anyway... but it's still strange.

KtorZ · 2020-12-31T16:05:42Z

bors try

iohk-bors · 2020-12-31T16:38:05Z

try

Build failed:

ci/hydra-build:required

  test/unit/Cardano/Wallet/DB/StateMachine.hs:1333:28:
  1) Cardano.Wallet.DB.Sqlite, Validate generators & shrinkers, Shrinker for CreateWallet
       Timed out.

#2393

rvl

Great! This will help a lot. I don't know why we didn't do this earlier, since resource-pool is so easy to use. I actually assumed that the API server got to use a different DB connection from the restore thread - oops, wrong.

rvl · 2021-01-02T03:29:07Z

lib/core/src/Cardano/DB/Sqlite.hs

+    -- unmasked with the acquired resource. If an asynchronous exception occurs,
+    -- the resource is NOT placed back in the pool.


Docs say that an exception of any type means that the resource is not placed back in the pool.
This will be quite helpful for error recovery I think.
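The behaviour quoted from the resource-pool documentation follows a common mask/restore pattern. Under that pattern, an exception of any type (synchronous or asynchronous) causes the resource to be destroyed rather than returned. A hypothetical, base-only rendering is shown below. `withBorrowed`, `takeRes`, `putRes` and `destroyRes` stand in for the pool internals and are not library functions.

```haskell
import Control.Exception
import Data.IORef

-- | Hypothetical borrow pattern modelled on resource-pool's withResource:
-- acquire with asynchronous exceptions masked, run the action unmasked,
-- and on *any* exception destroy the resource instead of returning it,
-- so a possibly-broken connection is never reused.
withBorrowed
    :: IO r            -- ^ take a resource from the pool
    -> (r -> IO ())    -- ^ put it back after a successful action
    -> (r -> IO ())    -- ^ destroy it after a failed action
    -> (r -> IO b)     -- ^ the action, e.g. a database query
    -> IO b
withBorrowed takeRes putRes destroyRes action = mask $ \restore -> do
    res <- takeRes
    result <- restore (action res) `onException` destroyRes res
    putRes res
    pure result
```

For error recovery, this means a query that fails halfway leaves no half-used connection behind. The next borrower simply gets a fresh connection.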

rvl · 2021-01-02T03:38:34Z

lib/core/src/Cardano/DB/Sqlite.hs

+    let createConnection = do
+            let info = mkSqliteConnectionInfo connStr
+            conn <- Sqlite.open connStr
+            executeManualMigration migration conn


Hmm. It seems preferable for the migrations to be executed before other pool connections are allowed. Perhaps there needs to be a transaction commit or something?
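rvl's suggestion could look roughly like the sketch below: run the migration before the pool hands out any other connection. The first connection created applies the migration while holding a lock, so concurrent creators wait until the schema is in place. `migrateOnce` and its parameters are hypothetical placeholders for the real persistent-sqlite calls.

One possible explanation for the `no such table` failures, not confirmed in the thread, is that SQLite scopes both `TEMP` views and `:memory:` databases to the connection that created them. If that is the cause, a run-once migration is not enough, and migrating every connection in `createConnection` remains the safer choice.

```haskell
import Control.Concurrent.MVar
import Control.Exception (mask, onException)
import Control.Monad (unless)
import Data.IORef

-- | Hypothetical wrapper around a connection factory. The first connection
-- created applies the migration while holding a lock, so any concurrent
-- creator blocks until the schema is in place before its connection is
-- handed out; later connections skip the migration.
migrateOnce
    :: (conn -> IO ())   -- ^ run the migration on one connection
    -> (conn -> IO ())   -- ^ close a connection (used if migration fails)
    -> IO conn           -- ^ open a connection
    -> IO (IO conn)      -- ^ a migrating connection factory for the pool
migrateOnce migrate close open = do
    migrated <- newMVar False
    pure $ mask $ \restore -> do
        conn <- restore open
        let migrateIfNeeded done = do
                unless done (migrate conn)
                pure True
        -- If the migration throws, modifyMVar_ keeps the flag at False (so a
        -- later connection retries it) and this connection is closed.
        restore (modifyMVar_ migrated migrateIfNeeded) `onException` close conn
        pure conn
```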

rvl · 2021-01-02T03:41:48Z

lib/core/src/Cardano/DB/Sqlite.hs

+        createConnection
+        destroyConnection
+        numberOfStripes
+        timeToLive


If the pool thread is cancelled (i.e. wallet is deleted or server needs to exit), those lingering pool connections are going to get cleaned up promptly, right?

Yes. I swapped the destroyDBLayer function with:

    destroyDBLayer :: Tracer IO DBLog -> SqliteContext -> IO ()
    destroyDBLayer tr SqliteContext{connectionPool,dbFile} = do
        traceWith tr (MsgDestroyConnectionPool dbFile)
        destroyAllResources connectionPool

So this is done as part of the withDBLayer bracket. The same quirks as before regarding closing a single SQLite connection still apply. However, destroyConnection now does exactly what destroyDBLayer used to do: it retries on some cherry-picked exceptions and also handles already-closed connections.
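Handling already-closed connections is what makes it safe to close a connection twice. For example, it might be closed once by `destroyAllResources` and once more by a late cleanup. Below is a hypothetical sketch of an idempotent close. `Conn`, `newConn` and `closeOnce` are illustrative names. The real function works on persistent's `SqlBackend` and also retries on selected SQLite errors, which this sketch leaves out.

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- | Hypothetical connection wrapper that remembers whether it was closed.
data Conn = Conn
    { connRawClose :: IO ()      -- ^ the underlying close call
    , connClosed   :: MVar Bool  -- ^ True once the close call succeeded
    }

newConn :: IO () -> IO Conn
newConn rawClose = Conn rawClose <$> newMVar False

-- | Close a connection at most once; closing an already-closed connection
-- is a no-op. If the underlying close throws, modifyMVar_ keeps the flag
-- at False, so the close can be retried.
closeOnce :: Conn -> IO ()
closeOnce conn = modifyMVar_ (connClosed conn) $ \closed -> do
    if closed then pure () else connRawClose conn
    pure True
```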

KtorZ · 2021-01-04T09:27:38Z

I don't know why we didn't do this earlier

Yes. Sometimes we forget the basics.

KtorZ · 2021-01-04T09:27:43Z

bors merge

2416: Use a single-striped connection pool for each database layer instead of a single shared connection r=KtorZ a=KtorZ

Co-authored-by: KtorZ
Co-authored-by: IOHK

iohk-bors · 2021-01-04T10:01:08Z

Build failed:

ci/hydra-build:required

#expected

KtorZ · 2021-01-04T12:47:38Z

☝️ Looks pretty serious; it's full chaos in the integration cluster. My guess is that the database reference counter isn't counting properly. The mutable counter only counts DBLayers, not individual connections, so we are back to a race condition with regard to database deletion.
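Here is a hypothetical sketch of counting at the connection level instead. It wraps the factory and destructor handed to the pool, so every open connection is reflected in the shared counter, not just every `DBLayer`. `countConnections` is an illustrative name, not part of the codebase.

```haskell
import Control.Exception (bracket, finally, mask_)
import Data.IORef

-- | Hypothetical: wrap a pool's connection factory and destructor so that a
-- shared counter tracks every open connection rather than every DBLayer.
countConnections
    :: IORef Int            -- ^ number of currently open connections
    -> IO c                 -- ^ original connection factory
    -> (c -> IO ())         -- ^ original destructor
    -> (IO c, c -> IO ())   -- ^ counting factory and destructor
countConnections counter create destroy = (create', destroy')
  where
    -- Masked so that a freshly opened connection is always counted.
    create' = mask_ $ do
        c <- create
        atomicModifyIORef' counter (\n -> (n + 1, ()))
        pure c
    -- The counter is decremented even if closing the connection throws.
    destroy' c = destroy c `finally` atomicModifyIORef' counter (\n -> (n - 1, ()))
```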

rvl · 2021-01-05T10:40:09Z

lib/core/src/Cardano/DB/Sqlite.hs

    -- ^ A handle to the Persistent SQL backend.
+    , isDatabaseActive :: TVar Bool


A link to https://www.sqlite.org/wal.html#busy might help.
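The `isDatabaseActive :: TVar Bool` field addresses a problem described later in the thread: `destroyAllResources` does not stop other threads from acquiring fresh connections. Below is a hypothetical sketch of such a guard. It uses an `MVar Bool` from base in place of the `TVar`, and `guardedAcquire`, `shutdown` and `DatabaseInactive` are illustrative names. The PR eventually dropped this flag in favour of scoping the pool with `bracket`.

```haskell
import Control.Concurrent.MVar
import Control.Exception

-- | Raised when a connection is requested after shutdown has begun.
data DatabaseInactive = DatabaseInactive deriving Show
instance Exception DatabaseInactive

-- | Hypothetical guard around connection acquisition. The flag is read
-- while holding the MVar, so 'shutdown' cannot flip it in the middle of an
-- acquisition (the price is that acquisitions themselves are serialised).
guardedAcquire :: MVar Bool -> IO a -> IO a
guardedAcquire active acquire = withMVar active $ \isActive ->
    if isActive then acquire else throwIO DatabaseInactive

-- | Mark the database inactive first, then run the cleanup (for example,
-- destroying every pooled connection). Acquisitions that start afterwards
-- fail with 'DatabaseInactive' instead of opening a fresh connection.
shutdown :: MVar Bool -> IO () -> IO ()
shutdown active cleanup = do
    modifyMVar_ active (\_ -> pure False)
    cleanup
```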

KtorZ · 2021-01-05T12:55:30Z

bors merge

KtorZ · 2021-01-05T12:55:54Z

bors r-

iohk-bors · 2021-01-05T12:55:56Z

Canceled.

KtorZ · 2021-01-05T12:56:12Z

Forgot to push the latest change :|

KtorZ · 2021-01-05T17:25:01Z

bors try

iohk-bors · 2021-01-05T17:59:10Z

try

Build failed:

buildkite/cardano-wallet

#expected: extra dependencies detected by weeder

rvl

Excellent, thanks for this!

If I run the integration tests with NO_CLEANUP=1 and then Ctrl-C them, there are no longer any -shm and -wal files left behind in the tests' temporary directory. 🎉

Even better, wallets which were supposed to be deleted, actually seem to have been deleted.

rvl · 2021-01-06T06:16:04Z

lib/core/src/Cardano/DB/Sqlite.hs

+               -- runSqlConn is guarded with a lock because it's not threadsafe in
+               -- general.It is also masked, so that the SqlBackend state is not
+               -- corrupted if a thread gets cancelled while running a query.
+               -- See: https://github.com/yesodweb/persistent/issues/981


Perhaps in future we could drop the lock and mask_?

If we are confident that any pool connection will only be used by one thread at a time, then the lock isn't needed.

If the thread is cancelled during a database operation, then so be it. SQLite will rollback the uncommitted transaction. The connection won't be returned to the pool.

(1) is only true if we can open the database in full-mutex mode, I guess, and persistent doesn't really let us set that flag (although we've forked it, so it's easily added). This will not prevent retries when two writing threads try to write at the same time, but at least it should give readers more concurrency 🤔 Also, regarding "if we are confident that any pool connection will only be used by one thread at a time": I have no such confidence at this stage, unfortunately :(

(2) Hmmm. That's right. Sounds scary though 😱
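The lock under discussion can be pictured as follows. The hypothetical `runQueryLocked` below serialises access to one connection with an `MVar ()`. It keeps the lock, which this thread is not yet confident about dropping. It leaves out `mask_`, as the later commit "No need for mask in runQuery" did. `withMVar` already puts the lock back if the query throws or the thread is cancelled, so a cancelled query never leaves the connection locked. Rolling back the interrupted transaction is left to SQLite/persistent, as described above.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Exception
import System.Timeout (timeout)

-- | Hypothetical per-connection lock, standing in for the one that guards
-- runSqlConn.
newtype QueryLock = QueryLock (MVar ())

newQueryLock :: IO QueryLock
newQueryLock = QueryLock <$> newMVar ()

-- | Run a query while holding the lock. There is no mask_: if the thread
-- is cancelled, the query is interrupted immediately, and withMVar puts
-- the lock back on the way out.
runQueryLocked :: QueryLock -> IO a -> IO a
runQueryLocked (QueryLock lock) query = withMVar lock (const query)
```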

rvl · 2021-01-06T06:17:56Z

lib/core/src/Cardano/DB/Sqlite.hs

+retryOnBusy :: Tracer IO DBLog -> IO a -> IO a
+retryOnBusy tr action =
+    recovering policy (mkRetryHandler isBusy) $ \RetryStatus{rsIterNumber} -> do
+        when (rsIterNumber > 0) $ traceWith tr (MsgRetryOnBusy rsIterNumber)


I like that there's logging. With the resource pool improvements, it will be interesting to see if this error ever happens in tests.
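The real `retryOnBusy` builds on the `retry` package (`recovering`, `RetryStatus`). For readers unfamiliar with that package, here is a base-only stand-in with the same shape. It retries an action while a predicate classifies the exception as a "busy" error, reports each retry number to a tracer (as the diff does for `rsIterNumber > 0`), and gives up after a fixed number of retries. `retryOnBusySketch`, the `Busy` exception, the retry limit and the 1 ms delay are all hypothetical.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception
import Data.IORef

-- | Hypothetical stand-in for SQLite's SQLITE_BUSY error.
data Busy = Busy deriving Show
instance Exception Busy

-- | Retry an action up to @maxRetries@ extra times when it throws an
-- exception accepted by @isBusy@, tracing the retry number (1, 2, ...)
-- before each new attempt. Other exceptions propagate immediately.
retryOnBusySketch
    :: Exception e
    => (e -> Bool) -> (Int -> IO ()) -> Int -> IO a -> IO a
retryOnBusySketch isBusy trace maxRetries action = go 0
  where
    go n = do
        r <- try action
        case r of
            Right a -> pure a
            Left e
                | isBusy e && n < maxRetries -> do
                    trace (n + 1)
                    threadDelay 1000 -- fixed 1 ms back-off
                    go (n + 1)
                | otherwise -> throwIO e
```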

rvl · 2021-01-06T06:28:32Z

Fixed a weeder error and rebased.

bors r+

I ran into quite a few issues with the integration tests since the unliftio merge and rebase. I think the rebase is to blame, since I am pretty sure I saw the unit and integration tests pass with the resource pool at least once. I've been investigating this for most of the day and found a few interesting cases:

(a) SQLite may return 'SQLITE_BUSY' on pretty much any request if two concurrent write queries hit the engine. We only caught this kind of exception when closing the database, so I generalized our error handling a bit.

(b) Calling destroyAllResources from resource-pool does not seem to prevent other threads from acquiring new resources. There's also no way, with resource-pool itself, to forbid the creation of new resources after a certain point. So while the database layer is being destroyed, new database connections may be created and start conflicting with each other.

This avoids the need for an extra 'TVar Bool' to guard the connection pool from threads wishing to acquire new resources. Instead, we can wrap the pool acquisition in a bracket, `bracket createPool destroyAllResources`. That way, the pool is cleaned up when done, and we can be sure that no thread will attempt to acquire a new resource while destroyAllResources is running. This change alone wasn't as straightforward as I wanted, because it moves the control of the `SqliteContext` up the stack. That in turn requires reviewing many more parts of both the pool and wallet DB layers. I think it's for the greater good and makes them both slightly better and more robust.

Still, it is a bit awkward that those modules contain constructors / functions used solely by the test code and not by the actual application (this is the case for 'withDBLayer', for instance...). To avoid over-complicating things, I ended up handling the in-memory and in-file SqliteContext setup a bit differently. Incidentally, I realized later that we run most of our unit tests on the 'in-memory' version, which means that we aren't testing the resource pool in the unit tests. I am not sure whether this is a good thing or not. It keeps the unit tests more focused on the actual business logic, and we still have the system-level integration tests to put the resource pool under heavy stress.
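The `bracket createPool destroyAllResources` idea ties the pool's lifetime to a lexical scope: the pool only exists inside the continuation. As long as every thread that uses the pool also runs inside that scope, nothing can acquire a connection once the cleanup has started. Below is a hypothetical base-only sketch. `Pool`, `destroyAll` and `withConnectionPoolSketch` are illustrative and are not the resource-pool API or the wallet's `withConnectionPool`.

```haskell
import Control.Concurrent.MVar
import Control.Exception
import Data.IORef

-- | Hypothetical minimal pool: idle connections and how to close them.
data Pool c = Pool
    { poolIdle  :: MVar [c]
    , poolClose :: c -> IO ()
    }

-- | Close every idle connection (the role destroyAllResources plays).
destroyAll :: Pool c -> IO ()
destroyAll pool = modifyMVar_ (poolIdle pool) $ \idle -> do
    mapM_ (poolClose pool) idle
    pure []

-- | The pool exists only inside the continuation; 'bracket' runs
-- 'destroyAll' when the continuation returns or throws.
withConnectionPoolSketch :: (c -> IO ()) -> (Pool c -> IO a) -> IO a
withConnectionPoolSketch close =
    bracket (Pool <$> newMVar [] <*> pure close) destroyAll
```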

The deleteWallet handler opens a database connection then tries to remove the database before closing the connection. But the sqlite removeDatabase function waits for all connections to be closed first. The livelock is only broken after the 1 minute timeout in removeDatabase.
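The livelock can be reproduced in miniature with the hypothetical `removeDatabaseSketch` below. It waits until a connection counter drops to zero, polling with a timeout that stands in for the one-minute timeout in `removeDatabase`. If the caller still holds a connection, the wait can only end by timing out. Releasing the connection first lets the removal go through immediately. All names here are illustrative.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (bracket_)
import Data.IORef
import System.Timeout (timeout)

-- | Hold a connection for the duration of an action, tracked by a counter.
withConnection :: IORef Int -> IO a -> IO a
withConnection open =
    bracket_ (atomicModifyIORef' open (\n -> (n + 1, ())))
             (atomicModifyIORef' open (\n -> (n - 1, ())))

-- | Wait (up to @limit@ microseconds) for all connections to close, then
-- remove the database. Returns False if the wait timed out.
removeDatabaseSketch :: IORef Int -> Int -> IO () -> IO Bool
removeDatabaseSketch open limit removeFile = do
    drained <- timeout limit waitForZero
    case drained of
        Just () -> removeFile >> pure True
        Nothing -> pure False
  where
    waitForZero = do
        n <- readIORef open
        if n == 0 then pure () else threadDelay 1000 >> waitForZero
```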

rvl · 2021-03-09T09:12:46Z

Rebased and fixed merge conflicts again...

bors r+

2416: Use a single-striped connection pool for each database layer instead of a single shared connection r=rvl a=KtorZ

Co-authored-by: KtorZ
Co-authored-by: Rodney Lorrimar

iohk-bors · 2021-03-09T09:58:58Z

Build failed:

buildkite/cardano-wallet

Failures:

  src/Test/Integration/Scenario/API/Shelley/Transactions.hs:2140:27:
  1) API Specifications, SHELLEY_TRANSACTIONS, TRANS_LIST_02,03x - Can limit/order results with start, end and order
       While verifying (Status {statusCode = 200, statusMessage = "OK"},Right [])
       expected: 1
        but got: 0

  To rerun use: --match "/API Specifications/SHELLEY_TRANSACTIONS/TRANS_LIST_02,03x - Can limit/order results with start, end and order/"

Randomized with seed 64164505

Finished in 1651.8216 seconds
756 examples, 1 failure, 40 pending

rvl · 2021-03-10T02:01:22Z

Please

bors r+

iohk-bors · 2021-03-10T03:10:19Z

Build succeeded:

KtorZ added the IMPROVEMENT label (Mark a PR as an improvement, for auto-generated CHANGELOG) Dec 31, 2020

KtorZ requested a review from rvl December 31, 2020 15:52

KtorZ self-assigned this Dec 31, 2020


KtorZ force-pushed the KtorZ/ADP-586/database-connection-pool branch from 7619e4c to e559e43 December 31, 2020 16:05

iohk-bors bot added a commit that referenced this pull request Dec 31, 2020

Try #2416:

51098b8

rvl approved these changes Jan 2, 2021

rvl reviewed Jan 5, 2021

KtorZ force-pushed the KtorZ/ADP-586/database-connection-pool branch 3 times, most recently from 224f675 to 1b088a4 January 5, 2021 11:32

KtorZ force-pushed the KtorZ/ADP-586/database-connection-pool branch from 1b088a4 to 2f5246c January 5, 2021 17:24

iohk-bors bot added a commit that referenced this pull request Jan 5, 2021

Try #2416:

0f3638e

rvl force-pushed the KtorZ/ADP-586/database-connection-pool branch from 2f5246c to d8f3c74 January 6, 2021 06:10

rvl approved these changes Jan 6, 2021

KtorZ added 3 commits March 9, 2021 13:40

rvl force-pushed the KtorZ/ADP-586/database-connection-pool branch from b3059d0 to 0cec0b5 March 9, 2021 08:28

rvl added 18 commits March 9, 2021 18:52

Regenerate nix

037e95e

newConnectionPool -> withConnectionPool

da6afb3

Add tests for aroundAll

6009b5d

Split Test.Utils.Resource.unBracket from Test.Hspec.Extra.aroundAll

660f63e

This function will be useful in criterion benchmarks. It also fixes handling of exceptions while allocating resources.

Let bench:db work using withConnectionPool

8ef586c

Straighten out connection pool logging a little

400a69e

No need for mask in runQuery

6d53144

If the thread is cancelled, we want the query stopped immediately. Anything uncommitted will be rolled back by persistent.

Elide retryOnBusy logging

16f3564

Fix MsgApplyBlocks logging

96b08d3

It was bothering me.

Simplify sqlite connection setup

8f5e54c

Switch checkpoint cache from IORef to MVar

ac11b50

Straighten out SqliteContext

ffc0a30

Lightly refactor checkpoint cache

1e27df5

Decouple DBFactory from DBLayer a little

201b670

DBLayer: Attempt to make the checkpoint cache threadsafe

f925eed

And add logging of database checkpoint cache

Add a query lock temporarily to let the checkpoint cache work

b062938

Properly resolve merge conflicts from rebase

7729917
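The `unBracket` helper mentioned in the commit list turns a bracket-style `with…` function into separate acquire and release steps, which is what a benchmark setup phase needs. Below is a hypothetical base-only sketch of that technique. The name `unBracketSketch` and its type are illustrative and are not the actual signature of `Test.Utils.Resource.unBracket`. The bracket runs in its own thread, hands the resource back through an MVar, and then waits for a release signal. An exception thrown while allocating is re-thrown to the caller. The sketch assumes the `with…` function calls its callback at most once, and that `release` is called exactly once.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception
import Control.Monad (void)
import Data.IORef

-- | Hypothetical: split a bracket-style function into (resource, release).
unBracketSketch :: ((a -> IO ()) -> IO ()) -> IO (a, IO ())
unBracketSketch withRes = do
    allocated <- newEmptyMVar      -- the resource, or the allocation error
    releaseSignal <- newEmptyMVar  -- filled when the caller wants to release
    finished <- newEmptyMVar       -- outcome of the whole bracket
    _ <- forkIO $ do
        outcome <- try $ withRes $ \a -> do
            putMVar allocated (Right a)
            takeMVar releaseSignal
        case outcome of
            -- If the callback never ran, allocation failed: forward the error.
            Left e -> void (tryPutMVar allocated (Left (e :: SomeException)))
            Right () -> pure ()
        putMVar finished outcome
    acquired <- takeMVar allocated
    case acquired of
        Left e -> throwIO e
        -- Releasing waits for the bracket's cleanup and re-throws its errors.
        Right a -> pure (a, putMVar releaseSignal () >> takeMVar finished >>= either throwIO pure)
```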
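Several of the commits above deal with a checkpoint cache that is now shared by concurrent pool users: "Switch checkpoint cache from IORef to MVar", "DBLayer: Attempt to make the checkpoint cache threadsafe" and "Add a query lock temporarily to let the checkpoint cache work". Below is a hypothetical sketch of the MVar approach. Updating the cache with `modifyMVar_` makes each read-modify-write atomic, which a separate `readIORef`/`writeIORef` pair is not. The wallet-id and checkpoint types are placeholders, the function names are illustrative, and `Data.Map.Strict` comes from the containers package.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)
import qualified Data.Map.Strict as Map

-- | Hypothetical cache of the latest checkpoint per wallet.
type CheckpointCache wid cp = MVar (Map.Map wid cp)

newCache :: IO (CheckpointCache wid cp)
newCache = newMVar Map.empty

-- | Atomically update one wallet's entry. Concurrent callers are
-- serialised on the MVar, so no update is lost.
updateCheckpoint :: Ord wid => CheckpointCache wid cp -> wid -> (Maybe cp -> cp) -> IO ()
updateCheckpoint cache wid f =
    modifyMVar_ cache (\m -> pure $! Map.alter (Just . f) wid m)

readCheckpoint :: Ord wid => CheckpointCache wid cp -> wid -> IO (Maybe cp)
readCheckpoint cache wid = Map.lookup wid <$> readMVar cache
```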

rvl force-pushed the KtorZ/ADP-586/database-connection-pool branch from 0cec0b5 to 7729917 March 9, 2021 08:54

iohk-bors bot merged commit 5379123 into master Mar 10, 2021

iohk-bors bot deleted the KtorZ/ADP-586/database-connection-pool branch March 10, 2021 03:10
