AccessHandlePoolVFS with multiple connections #157
The implementation was a bit tricky, but there is a shiny new AccessHandlePoolVFS (update: now a new VFS named OPFSCoopSyncVFS) in the dev branch. You can try it online at this link on any current browser, including with multiple tabs. The design sketch in the top post mostly works, except that open_v2(), prepare(), and step() must all allow retrying, and in at least one case SQLite converts SQLITE_BUSY returned by the VFS into SQLITE_ERROR. The tricky parts were mostly in lazy resource release, i.e. releasing resources only under multiple-connection contention so things stay fast without contention. Isn't it ironic how being lazy creates the most work? This VFS is really fast at read transactions with low contention. Here are the results of doing as many read transactions as possible in one second with 1 database connection:
That's 60K read transactions per second. Note that this isn't a throughput test: SQLite reads the database file on every transaction, but only enough to confirm that its cache is still valid. Throughput should also be great, but this is more of a transaction overhead and latency test with minimal data volume. Here's the same measurement with two connections:
Now it's only doing about 1.5K read transactions per second. It works under contention, but it's a lot slower. For comparison, here is IDBBatchAtomicVFS (also reimplemented in the dev branch) with 1 connection:
And IDBBatchAtomicVFS with 2 connections:
IDBBatchAtomicVFS is much slower than OPFSCoopSyncVFS without contention because (1) it doesn't use lazy locking, (2) OPFS access handles are faster than IndexedDB, and (3) synchronous WebAssembly is faster than Asyncify. But IDBBatchAtomicVFS actually speeds up in total transactions per second with more read connections because it allows concurrent reads. OPFSCoopSyncVFS has a lot of write transaction overhead, as we already knew. Here's 1 writer and 2 writers:
Here is IDBBatchAtomicVFS with 1 and 2 writers for comparison:
So IDBBatchAtomicVFS is more than twice as fast, and this is with the default "full" synchronous setting. IDBBatchAtomicVFS also works with the "normal" setting, which trades durability for even more performance:
What about contention between readers and writers? Here OPFSCoopSyncVFS with 3 readers and 1 writer:
Here's IDBBatchAtomicVFS also with 3 readers and 1 writer (using relaxed durability settings):
Newcomer FLOOR requires the new POSIX-style access handles (only in Chrome for now) and always has relaxed durability, but it really shines on mixed contention:
The takeaway is that different VFS implementations handle different workloads in different ways, and which one is "best" depends on how you plan to use it. OPFSCoopSyncVFS dazzles on reads with low contention, but it has high write transaction overhead, so avoid it if you need optimal performance on large numbers of small write transactions. It's pretty cool that it works with multiple connections at all, but there are likely better options for high-contention scenarios.
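To make the retry requirement concrete: open_v2(), prepare(), and step() must all tolerate SQLITE_BUSY and try again. Here is a minimal sketch of such a retry helper — it is hypothetical, not wa-sqlite's actual API; only the SQLITE_OK/SQLITE_BUSY result codes come from SQLite itself.

```javascript
// Hypothetical retry helper (not wa-sqlite's actual API). `call` is any
// async function returning a SQLite result code, e.g. a step() wrapper.
const SQLITE_OK = 0;
const SQLITE_BUSY = 5;

async function busyRetry(call, { maxRetries = 100 } = {}) {
  for (let i = 0; i < maxRetries; i++) {
    const rc = await call();
    if (rc !== SQLITE_BUSY) return rc;
    // Yield a macrotask so message handlers can run, e.g. to learn that
    // another connection has released its lock.
    await new Promise(resolve => setTimeout(resolve, 0));
  }
  return SQLITE_BUSY;
}
```

A caller would wrap each state-machine call, along the lines of `const rc = await busyRetry(() => step(stmt))`.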
Update: the VFS described below has been renamed to OPFSCoopSyncVFS. The AccessHandlePoolVFS name is being kept for the original implementation.
The main drawback with AccessHandlePoolVFS is that it doesn't support multiple connections. The reimplementation from scratch in the dev branch does have some support for multiple connections, but currently only with Chrome's "readwrite-unsafe" access handles and with the application handling the locking. I think I have a way to make things work without those caveats.
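For reference, "readwrite-unsafe" is requested when creating the sync access handle. A hedged sketch (the wrapper function and its fallback are illustrative; only the `{ mode: 'readwrite-unsafe' }` option is Chrome's actual API):

```javascript
// Illustrative wrapper; only the { mode: 'readwrite-unsafe' } option is
// Chrome's actual API. In this mode multiple connections can hold open
// access handles on the same file, so the application must provide its
// own locking.
async function openSharedAccessHandle(fileHandle) {
  try {
    return await fileHandle.createSyncAccessHandle({ mode: 'readwrite-unsafe' });
  } catch {
    // Browsers that don't support the mode enum throw; fall back to the
    // default exclusive handle.
    return fileHandle.createSyncAccessHandle();
  }
}
```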
My approach revisits RetryVFS, where the VFS returns an error when it needs to perform an asynchronous operation. I implemented this at the time and it worked, but it had these issues:
The starting point for the new approach would be the new AccessHandlePoolVFS, which establishes a separate directory for each connection to contain its temporary files. This solves issue (1).
The internals of the API call step() can be augmented to retry automatically on SQLITE_BUSY when the VFS provides a Promise that resolves the problem, which is done by taking a lock and acquiring the access handles. This mostly solves issue (2). The application will need to allow message handlers to run by yielding to the event loop occasionally; this may happen naturally if the application itself communicates with messages.
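A minimal sketch of that handshake, with all names hypothetical: the VFS fails a lock request with SQLITE_BUSY, stashes a Promise for the asynchronous lock/handle acquisition, and the step() wrapper awaits that Promise before retrying.

```javascript
const SQLITE_OK = 0;
const SQLITE_BUSY = 5;

// All names here are hypothetical; this only illustrates the shape of
// the error-then-retry handshake between the VFS and the step() wrapper.
class CoopLockState {
  #haveLock = false;
  #pending = null;

  // Stand-in for taking a lock and opening OPFS access handles.
  async #acquireLockAndHandles() {
    this.#haveLock = true;
  }

  // Called synchronously from the VFS xLock(): either we already hold
  // the lock, or we start acquiring it and fail fast with SQLITE_BUSY.
  xLock() {
    if (this.#haveLock) return SQLITE_OK;
    this.#pending = this.#acquireLockAndHandles();
    return SQLITE_BUSY;
  }

  // The step() wrapper awaits this Promise before retrying.
  retryPromise() {
    const p = this.#pending ?? Promise.resolve();
    this.#pending = null;
    return p;
  }
}
```

The step() wrapper then loops: call step(); on SQLITE_BUSY, await the VFS's retry Promise and try again.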
The new OPFSAdaptiveVFS supports multiple connections when "readwrite-unsafe" is not available by allowing only the connection with the exclusive lock to hold an open access handle. Opening and closing access handles is expensive, so OPFSAdaptiveVFS does this lazily, only when informed over BroadcastChannel that another connection is waiting for the lock. This pays the open/close penalties only when contention actually occurs. Applying this idea will reduce the impact of issue (3).
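The lazy release could look something like this sketch; the channel name and message shape are made up for illustration:

```javascript
// Illustrative only: a connection keeps its access handles open until
// some other connection broadcasts that it is waiting for the lock.
class ContentionNotifier {
  #channel;

  // `onContention` should close the access handles and release the lock.
  constructor(channelName, onContention) {
    this.#channel = new BroadcastChannel(channelName);
    this.#channel.onmessage = ({ data }) => {
      if (data?.type === 'lock-requested') onContention();
    };
  }

  // A connection that fails to get the lock announces that it is
  // waiting, prompting the current holder to release lazily.
  announceWaiting() {
    this.#channel.postMessage({ type: 'lock-requested' });
  }

  close() {
    this.#channel.close();
  }
}
```

Because the broadcast only goes out on a failed lock attempt, the open/close penalty is paid exactly when contention occurs, matching the behavior described above.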
In theory, the basic idea should extend to multiple main databases, with another error and retry for each additional asynchronous operation, so I think issue (4) is solvable. But I won't be that ambitious for a first attempt; this won't support accessing multiple databases in the same transaction.
I hope to modify the dev branch AccessHandlePoolVFS to try this out. If successful, the resulting VFS should have these properties: