Use a single-striped connection pool for each database layer instead of a single shared connection #2416
Conversation
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
let createConnection = do
        let info = mkSqliteConnectionInfo connStr
        conn <- Sqlite.open connStr
        executeManualMigration migration conn
Note that I originally executed the migration outside of createConnection, with a bracket open / close / executeMigration ..., and it was mostly fine except for the database views in DB.Pool. The database property tests would fail with a "no such table: <name of the view>" error, as if the migration had not been executed.
I have looked a bit in the documentation but didn't find anything suggesting that views are scoped to a single connection, yet it seems to be the case. In any case, it should be safe to run the migration on every connection anyway... but it's still strange.
Hmm. It seems preferable for the migrations to be executed before other pool connections are allowed. Perhaps there needs to be a transaction commit or something?
Force-pushed from 7619e4c to e559e43
bors try
Build failed:
Great! This will help a lot. I don't know why we didn't do this earlier, since resource-pool is so easy to use. I actually assumed that the API server got to use a different DB connection from the restore thread - oops, wrong.
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
-- unmasked with the acquired resource. If an asynchronous exception occurs,
-- the resource is NOT placed back in the pool.
Docs say that an exception of any type means that the resource is not placed back in the pool.
This will be quite helpful for error recovery I think.
I hope.
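For reference, the behaviour described above (a resource thrown away rather than returned when the action fails) corresponds roughly to the following pattern with Data.Pool's primitives; this is a sketch of the idea, not the library's actual implementation:

import Control.Exception (onException)
import Data.Pool (Pool, destroyResource, putResource, takeResource)

-- Sketch: take a connection, run the action, and on any exception destroy the
-- resource instead of returning it, so a possibly-broken connection never
-- re-enters the pool. withResource packages roughly this behaviour.
useConnection :: Pool conn -> (conn -> IO a) -> IO a
useConnection pool action = do
    (conn, localPool) <- takeResource pool
    result <- action conn `onException` destroyResource pool localPool conn
    putResource localPool conn
    pure result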
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
createConnection
destroyConnection
numberOfStripes
timeToLive
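These look like the arguments to resource-pool's createPool. A sketch of how they could fit together under the classic Data.Pool API (0.2 series); the time-to-live and maximum-resources figures are illustrative assumptions, not the PR's values:

import Data.Pool (Pool, createPool)

-- Sketch: build a single-striped pool from the create/destroy actions above.
mkSingleStripedPool :: IO conn -> (conn -> IO ()) -> IO (Pool conn)
mkSingleStripedPool createConnection destroyConnection =
    createPool
        createConnection
        destroyConnection
        numberOfStripes
        timeToLive
        maxResources
  where
    numberOfStripes = 1   -- a single stripe, as in the PR title
    timeToLive = 600      -- seconds an idle connection may linger (illustrative)
    maxResources = 10     -- connections per stripe (illustrative)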
If the pool thread is cancelled (i.e. wallet is deleted or server needs to exit), those lingering pool connections are going to get cleaned up promptly, right?
Yes. I swapped the destroyDBLayer function with:

destroyDBLayer :: Tracer IO DBLog -> SqliteContext -> IO ()
destroyDBLayer tr SqliteContext{connectionPool,dbFile} = do
    traceWith tr (MsgDestroyConnectionPool dbFile)
    destroyAllResources connectionPool

So this is done as part of the withDBLayer bracket. The same quirks as before with regard to closing a single SQLite connection still apply, but destroyConnection now does exactly what destroyDBLayer was doing, that is, retrying on some cherry-picked exceptions and handling already-closed connections as well.
Yes. Sometimes we forget the basics.
bors merge
2416: Use a single-striped connection pool for each database layer instead of a single shared connection r=KtorZ a=KtorZ
Build failed: #expected
☝️ Looks pretty serious, it's full chaos in the integration cluster. My guess is that the database reference counter is not counting properly, since the mutable counter only counts…
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
-- ^ A handle to the Persistent SQL backend.
, isDatabaseActive :: TVar Bool
A link to https://www.sqlite.org/wal.html#busy might help.
Force-pushed from 224f675 to 1b088a4
bors merge
2416: Use a single-striped connection pool for each database layer instead of a single shared connection r=KtorZ a=KtorZ
bors r-
Canceled.
Forgot to push the latest change :|
Force-pushed from 1b088a4 to 2f5246c
bors try
Build failed: #expected: extra dependencies detected by weeder
Force-pushed from 2f5246c to d8f3c74
Excellent, thanks for this!
If I run the integration tests with NO_CLEANUP=1 and then ctrl-c them, there are no longer -shm and -wal files left behind in the tests' temporary directory. 🎉
Even better, wallets which were supposed to be deleted actually seem to have been deleted.
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
-- runSqlConn is guarded with a lock because it's not threadsafe in
-- general. It is also masked, so that the SqlBackend state is not
-- corrupted if a thread gets cancelled while running a query.
-- See: https://github.com/yesodweb/persistent/issues/981
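A minimal sketch of the pattern this comment describes, assuming an MVar as the lock; the function name and exact types are illustrative, not the PR's code:

import Control.Concurrent.MVar (MVar, withMVar)
import Control.Exception (mask_)
import Database.Persist.Sql (SqlBackend, SqlPersistT, runSqlConn)

-- Sketch: serialise queries on one SqlBackend with a lock, and run them masked
-- so that cancellation cannot leave the backend's state half-updated.
runGuardedQuery :: MVar () -> SqlBackend -> SqlPersistT IO a -> IO a
runGuardedQuery lock backend query =
    withMVar lock $ \() -> mask_ (runSqlConn query backend)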
Perhaps in future we could drop the lock and mask_?
- If we are confident that any pool connection will only be used by one thread at a time, then the lock isn't needed.
- If the thread is cancelled during a database operation, then so be it. SQLite will rollback the uncommitted transaction. The connection won't be returned to the pool.
(1) is only true if we can open the database in full_mutex mode, I guess, but persistent doesn't really let us set that flag (although we've forked it, so it's easily added). This won't prevent retries when two writing threads try to write at the same time, but at least it should give readers more concurrency 🤔 Also, "if we are confident that any pool connection will only be used by one thread at a time" ==> I have no such confidence at this stage, unfortunately :(
(2) Hmmm. That's right. Sounds scary though 😱
lib/core/src/Cardano/DB/Sqlite.hs
Outdated
retryOnBusy :: Tracer IO DBLog -> IO a -> IO a
retryOnBusy tr action =
    recovering policy (mkRetryHandler isBusy) $ \RetryStatus{rsIterNumber} -> do
        when (rsIterNumber > 0) $ traceWith tr (MsgRetryOnBusy rsIterNumber)
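For illustration, one possible shape for the policy and handler used above, built on the retry package and persistent-sqlite's Database.Sqlite types; the delay and retry count are arbitrary assumptions, not the PR's values:

import Control.Monad.Catch (Handler (..))
import Control.Retry (RetryPolicyM, RetryStatus, constantDelay, limitRetries)
import Database.Sqlite (Error (ErrorBusy), SqliteException (..))

-- Wait 100 ms between attempts and give up after 10 retries (illustrative numbers).
policy :: RetryPolicyM IO
policy = constantDelay 100000 <> limitRetries 10

-- Build handlers for 'recovering' that retry only when the predicate holds.
mkRetryHandler :: (SqliteException -> Bool) -> [RetryStatus -> Handler IO Bool]
mkRetryHandler shouldRetry = [const (Handler (pure . shouldRetry))]

-- Retry only when SQLite reports SQLITE_BUSY.
isBusy :: SqliteException -> Bool
isBusy (SqliteException errorCode _ _) = case errorCode of
    ErrorBusy -> True
    _ -> False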
I like that there's logging. With the resource pool improvements, it will be interesting to see if this error ever happens in tests.
Fixed a weeder error and rebased. bors r+
It is a rather common practice to use a pool of database connections when dealing with databases. So far, we've been using a single shared connection per wallet worker, with a lock in front of each connection preventing concurrent access to the database. The lock is only necessary because of the way persistent handles query statements internally; in principle, SQLite handles concurrent database accesses just fine. For basic wallets, this is a relatively useless change. But for larger wallets like those manipulated by exchanges, we've observed very slow response times due to contention on the database lock. Indeed, some requests may grab the lock for 10 or 20 seconds, preventing any other request from going through. However, most requests are read-only and could be executed in parallel, at the discretion of the SQLite engine. I hope that the introduction of a connection pool will improve the overall experience for large wallets by better serving concurrent requests on the database. Fingers crossed.
I ran into quite a few issues with the integration tests since the unliftio merge and rebase (I think, as I am pretty sure I did observe unit and integration tests doing just fine with the resource pool at least once). I've been investigating this for most of the day and found a few interesting cases: (a) SQLite may return SQLITE_BUSY on pretty much any request if two concurrent write queries hit the engine; we currently only catch this kind of exception when we try closing the database, so I generalised our error handling here a bit. (b) It seems that calling destroyAllResources from resource-pool does not prevent new threads from acquiring new resources, and there's no way with the resource-pool library itself to prevent the creation of new resources after a certain point. So it may happen that, while the database layer is being destroyed, new database connections are created and start conflicting with each other.
This avoids the need for an extra 'TVar Bool' to guard the connection pool from threads wishing to acquire new resources. Instead, we can wrap the pool acquisition in a bracket, `bracket createPool destroyAllResources`, so that the pool is cleaned up when done and we are sure that no thread will attempt to acquire a new resource while destroyAllResources is called. This sole change wasn't as straightforward as I wanted because it moves control of the `SqliteContext` up the stack and therefore requires reviewing many more parts of both the pool and wallet db layers. I think it's for a greater good in the end and makes them both slightly better and more robust. In the end, it is still a bit "awkward" that we have constructors / functions in those modules that are solely used by the test code and not by the actual application (this is the case of 'withDBLayer', for instance...). To not over-complicate things, I ended up handling the in-memory and in-file SqliteContext setups a bit differently. Incidentally, I realised later that we run most of our unit tests on the 'in-memory' version, which means that we aren't testing the resource pool in the context of the unit tests. I am not sure whether this is a good thing or not: it makes the unit tests a bit more focused on testing the actual business logic, and we still have the system-level integration tests to put the resource pool under great stress.
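A minimal sketch of that bracket, assuming resource-pool's classic API; the wrapper name and the pool sizing figures are illustrative:

import Control.Exception (bracket)
import Data.Pool (Pool, createPool, destroyAllResources)

-- Sketch: the pool only exists inside the bracket, and destroyAllResources runs
-- on the way out, so no thread can acquire a connection once teardown begins.
withConnectionPool :: IO conn -> (conn -> IO ()) -> (Pool conn -> IO a) -> IO a
withConnectionPool create destroy =
    bracket
        (createPool create destroy 1 600 10)  -- single stripe; TTL and size are illustrative
        destroyAllResources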
Force-pushed from b3059d0 to 0cec0b5
This function will be useful in criterion benchmarks. It also fixes handling of exceptions while allocating resources.
If the thread is cancelled, we want the query stopped immediately. Anything uncommitted will be rolled back by persistent.
The deleteWallet handler opens a database connection, then tries to remove the database before closing that connection. But the sqlite removeDatabase function waits for all connections to be closed first. The livelock is only broken by the 1-minute timeout in removeDatabase.
It was bothering me.
And add logging of database checkpoint cache
Force-pushed from 0cec0b5 to 7729917
Rebased and fixed merge conflicts again... bors r+
2416: Use a single-striped connection pool for each database layer instead of a single shared connection r=rvl a=KtorZ
Build failed:
Please bors r+
Build succeeded:
Issue Number
ADP-586
Overview
73df15f
📍 use a single-striped connection pool for each database layer
It is a rather common practice to use a pool of database connections when dealing with databases. So far, we've been using a single shared connection per wallet worker, with a lock in front of each connection preventing concurrent access to the database. The lock is only necessary because of the way persistent handles query statements internally; in principle, SQLite handles concurrent database accesses just fine.
For basic wallets, this is a relatively useless change. But for larger wallets like those manipulated by exchanges, we've observed very slow response times due to contention on the database lock. Indeed, some requests may grab the lock for 10 or 20 seconds, preventing any other request from going through. However, most requests are read-only and could be executed in parallel, at the discretion of the SQLite engine. I hope that the introduction of a connection pool will improve the overall experience for large wallets by better serving concurrent requests on the database. Fingers crossed.
Comments