
Convert swingstore from LMDB to Sqlite, phase 1 #6561

Merged · 11 commits · Dec 23, 2022
Conversation

@FUDCo (Contributor) commented Nov 13, 2022

This PR is the first phase of the conversion from LMDB to Sqlite (in partial satisfaction of #3087).

This PR includes the replacement of the underlying database engine and the downstream changes that flow from it. This is done in a series of four commits (individually reviewable for clarity), each of which realizes a different step of the transformation.

Step 1 - convert from LMDB to Sqlite, using Sqlite to realize a dumb key-value store that replicates the former LMDB semantics exactly, retaining all of the supporting implementation that assumes a dumb key-value store with LMDB's transaction model.

Step 2 - eliminate the ephemeralSwingStore implementation, used for testing, that realized the dumb key-value store in memory using a JavaScript Map object, replacing it with Sqlite's :memory: pseudo-file which does essentially the same thing (i.e., stores everything ephemerally in RAM) but does it using Sqlite directly.

Step 3 - use Sqlite savepoints for crank-level commits (rather than an in-memory change buffer), with full Sqlite transactions for block-level commits. This eliminates a couple of layers of wrapper objects. A necessary consequence of this is that the crank activity hash is now computed as part of the swingstore itself.

Step 4 - eliminate the rest of the storage wrappers, making the swingstore essentially self contained.
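As a rough illustration of the Step 1/Step 2 shape, the following is a minimal sketch of a dumb key-value store backed by better-sqlite3, with `:memory:` standing in for the old Map-based ephemeral store. This is not the actual swingstore code; the table layout and function names are made up for the example.

```js
import sqlite3 from 'better-sqlite3';

// Sketch only: a dumb key-value store on SQLite that mimics the old LMDB
// semantics. Passing ':memory:' gives the ephemeral (RAM-only) variant.
export function makeSimpleKVStore(dbPath = ':memory:') {
  const db = sqlite3(dbPath);
  db.exec(`CREATE TABLE IF NOT EXISTS kvStore (key TEXT PRIMARY KEY, value TEXT)`);
  const sqlGet = db.prepare('SELECT value FROM kvStore WHERE key = ?').pluck(true);
  const sqlSet = db.prepare(
    'INSERT INTO kvStore (key, value) VALUES (?, ?) ON CONFLICT(key) DO UPDATE SET value = excluded.value',
  );
  const sqlDelete = db.prepare('DELETE FROM kvStore WHERE key = ?');
  return {
    has: key => sqlGet.get(key) !== undefined,
    get: key => sqlGet.get(key),
    set: (key, value) => { sqlSet.run(key, value); },
    delete: key => { sqlDelete.run(key); },
  };
}
```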

The system that results after each step is fully functional in a way that is compatible with the prior behavior of SwingSet.
If this PR is merged into the master branch, the result will be entirely usable.

I am pushing this PR now in order to get review started. However, there is more to be done. In the broadest possible strokes, this remaining work falls into two categories:

  • Short-term changes to:

    • (a) rationalize the transaction model to account more gracefully for crank activity that is pre- or post-delivery, since we now have the ability to do more selective rollback. In particular, we believe there is a potential for improper commitment or non-commitment of pre-crank data mutations. Note that this risk is not new -- it's present with the LMDB-based store as well -- but the switch to Sqlite presents us with the opportunity to ensure that the cases we are concerned about are actually handled correctly.
    • (b) remove the use of iterators that require either the retention of a database cursor during iteration (which we can't tolerate) or fetching the entirety of a SELECT result into RAM before using it (which the implementation in this PR does, but which exposes us to the risk of unbounded memory consumption driven by adversarial vat code), replacing them with something more like the vatstore's getAfter machinery (see the sketch after this list).
  • Long-term changes that exploit the capabilities of the SQL query mechanism in various ways. These include but are not limited to bulk operations for data deletion, improving GC of on-disk data, and removing the vatstore from the consensus SwingSet state. Most of these will require SwingStore API changes or additions, which argues for taking some care in planning them before proceeding, hence their absence from this PR. In addition, we expect further exploration and experimentation to identify additional optimization opportunities.
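As an illustration of what the cursor-free replacement in (b) might look like, keyed pagination re-issues a small bounded query on each call instead of holding a cursor open or materializing the whole result set. This is a sketch only; the table name and helper function are hypothetical, not the API this PR or the follow-up work defines.

```js
// Sketch of cursor-free iteration via keyed pagination (illustrative only).
// Each call runs a fresh bounded SELECT, so no database cursor outlives the
// call and the full result set never has to be held in RAM.
function makeGetNextKey(db) {
  const sqlNext = db
    .prepare('SELECT key FROM kvStore WHERE key > ? ORDER BY key LIMIT 1')
    .pluck(true);
  // Returns the smallest key strictly greater than `previousKey`, or
  // undefined when the range is exhausted.
  return previousKey => sqlNext.get(previousKey);
}
```

A caller walks a range by feeding each returned key back in until the result passes the end of the range, much like the vatstore's existing getAfter/getKeyAfter pattern.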

@FUDCo FUDCo added the SwingSet package: SwingSet label Nov 13, 2022
@FUDCo FUDCo requested a review from warner November 13, 2022 09:35
@FUDCo FUDCo self-assigned this Nov 13, 2022
@mhofman (Member) commented Nov 14, 2022

The system that results after each step is fully functional in a way that is compatible with the prior behavior of SwingSet.

I'm wondering, does that mean that the behavior of a node with the PR applied and one without would be 100% consistent (long-term deterministic)? Even without this, how confident would we be in applying this change to release-pismo? Which raises the question: do we believe there may be a way to upgrade from the current DBs to the new SQLite one? That is, export the content of LMDB into a new SQLite DB and restart from there (either as a manual script or as a built-in upgrade sequence).

@FUDCo (Contributor, Author) commented Nov 14, 2022

I'm wondering, does that mean that the behavior of node with the PR applied, and one without would be 100% consistent (long term deterministic)?

There are three answers to this, which are, respectively: yes, no, and it doesn't matter.

Yes, I've verified that the sequence of database operations is the same and that the action hashes match.

No, there is an exception to the action hashes matching, which is when there is an action in the crank containing a bundle reference and the bundle hash has changed because the code has changed. And of course changing the database engine is a code change. On the other hand, I observed this with tests that themselves had changes, and it's possible that, since the database code changes are all in the kernel, we'll be fine as long as the kernel bundle does not appear in any action hash (which, long term, it must not if we want kernel upgrade to be possible). @warner?

And it doesn't matter (I think) because IIUC we'd be switching database engines as part of the bulldozer upgrade.

@mhofman (Member) commented Nov 14, 2022

which is when there is an action in the crank containing a bundle reference and the bundle hash has changed because the code has changed

Right! This is an interesting case, since this change has an impact on the kernel, mostly due to the swingstore API refactoring from what I gather. I thought, however, that we had removed the kernel bundle from being saved in the DB? Or did this somehow impact the liveslots/supervisor bundle?

And it doesn't matter (I think) because IIUC we'd be switching database engines as part of the bulldozer upgrade.

I was wondering in the context of potentially having to deal with state-sync before a bulldozer upgrade. It sounds like, with some effort, we might be able to swap the DB implementation for release-pismo. In that case we'd need to do other surgery, e.g. extract local XS snapshot hashes and move them into a different section of the DB (after verifying they're deterministic, of course), so it'd definitely not be for the faint of heart.

@FUDCo force-pushed the 3087-sql-swingstore branch from 85fd98a to 4ef54e2 on November 14, 2022 23:58
@warner (Member) commented Nov 19, 2022

Short take: this looks great. I'll start reviewing properly now.

We no longer store the kernel bundle anywhere. We still need to build a kernel bundle (because the kernel runs in its own Compartment, and import-bundle is the only tool we have for loading more than a single eval-ed string into a Compartment), but we now do that on every boot, rather than bundling once during initializeSwingset and storing the result for all subsequent reboots. This makes a kernel upgrade easier, because you just run a new version of the host application (which imports a new version of @agoric/swingset-vat), but of course exports the "only run compatible versions" requirement onto the author of the host app.
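(For context, the boot-time bundling described above looks roughly like the following sketch; the module path, package entry points, and export name are assumptions for illustration, not a quote of the actual controller code.)

```js
import bundleSource from '@endo/bundle-source';
import { importBundle } from '@endo/import-bundle';

// Sketch: rebuild and load the kernel bundle on every boot instead of reading
// a bundle stored by initializeSwingset. Paths and export names are illustrative.
async function loadKernel(kernelEndowments) {
  const kernelBundle = await bundleSource(
    new URL('./kernel/kernel.js', import.meta.url).pathname,
  );
  // import-bundle evaluates the multi-module bundle inside its own Compartment.
  const kernelNS = await importBundle(kernelBundle, {
    endowments: kernelEndowments,
  });
  return kernelNS; // e.g. a buildKernel export; the name is assumed here
}
```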

I'll think about whether this could cause visible behavioral changes as I review. My intention was that we make this switch without worrying about such changes, especially because of my goal to replace vatstore.getKeyAfter with a simpler API, more like getNext, and that would entail a change to liveslots.

@FUDCo and I walked through what the crankBuffer commit/abort replacement will be (which can happen after this lands). We concluded that we will:

  • decouple crankHash from what used to be the crankBuffer, so there's a kernelStorage.getCrankHash() that returns a hash of everything since the last call, and provide some way to fold that into the activityHash (or maybe that's the host's responsibility, not sure yet)
  • the "crank cycle" is the thing that defines crank 1, crank 2, etc
  • we pick the point on this cycle just before we pop something off the run-queue as the start, which is also the state we're in just before controller.run() is called, and also the state we're in just after controller.run() finishes
  • we define two SQLite SAVEPOINTs, the first is at this "idle" point, the second is just after the delivery/etc has been popped off the run-queue (so we need two swing-store APIs, and the kernel will call each of them, once per cycle)
  • then there are three possible outcomes:
    • 1: the delivery finishes normally, and the kernel wants to commit all changes. We call some swing-store API that says "delete all SAVEPOINTs, I won't use them"
    • 2: the delivery goes weird, and the kernel decides to discard the changes, and the delivery/run-queue item was consumed (so the kernel does not want to process it ever again), e.g. a createVat or upgradeVat failing. The kernel calls a swing-store API which rewinds to the second SAVEPOINT (the one taken after the pop). Then the kernel does cleanup work like pushing vat-admin messages about the failure. The swing-store API will also delete the first SAVEPOINT, since we won't be using it, and we don't want to let it clutter up RAM.
    • 3: the delivery goes weird, the kernel decides to discard the changes, but the kernel wants the delivery to be re-attempted later (e.g. a delivery killed the vat, and a re-delivery is the simplest way to provoke the right VAT_TERMINATED error). For this case, the kernel calls a third swing-store API which rewinds to the first SAVEPOINT (taken before the pop), and deletes the second SAVEPOINT (again to avoid clutter). Then the kernel does cleanup work like vat-admin messages.
  • Note that all kvstore changes, committed or abandoned, get rolled into the crankhash. Previously I think we only included committed changes, but we decided/concluded that they're all part of the deterministic behavior. The commit-vs-abort is part of consensus, so the reasons for choosing one vs the other must be within-consensus too.

And @FUDCo points out:

  • Note that in case 3, the rollback to the first SAVEPOINT clears out the second SAVEPOINT as it goes by with no additional action necessary. Also, in case 2, clearing the first SAVEPOINT is not just a matter of memory savings, it’s a backstop against bugs in later code hitting it in a later rollback.
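A minimal sketch of how the two-savepoint scheme above could map onto SQLite statements follows; the savepoint names and wrapper functions are hypothetical, not the swing-store API that eventually landed.

```js
// Sketch of the two-savepoint crank cycle described above (illustrative only).
// SQLite savepoints nest, so rolling back to the outer one also discards the
// inner one (case 3), and releasing the outer one drops both (cases 1 and 2).
function makeCrankSavepoints(db) {
  return {
    // taken at the "idle" point, just before popping the run-queue
    startCrank: () => db.exec('SAVEPOINT crankStart'),
    // taken just after the delivery has been popped off the run-queue
    startDelivery: () => db.exec('SAVEPOINT deliveryStart'),
    // case 1: keep everything; releasing the outer savepoint drops both
    commitCrank: () => db.exec('RELEASE SAVEPOINT crankStart'),
    // case 2: discard the delivery's changes but keep the pop, then release
    // the outer savepoint as a backstop against later rollbacks hitting it
    rollbackDelivery: () => {
      db.exec('ROLLBACK TO SAVEPOINT deliveryStart');
      db.exec('RELEASE SAVEPOINT crankStart');
    },
    // case 3: discard everything since the idle point so the run-queue item
    // can be re-delivered later; the rollback also clears deliveryStart, and
    // releasing crankStart afterwards is one plausible way to return to idle
    rollbackCrank: () => {
      db.exec('ROLLBACK TO SAVEPOINT crankStart');
      db.exec('RELEASE SAVEPOINT crankStart');
    },
  };
}
```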

@warner (Member) left a comment:


Phew, ok sorry that took a while. Some small changes to make, some items for me to investigate further, and a few things to discuss.

packages/swing-store/src/sqlStreamStore.js (resolved)
packages/swing-store/package.json (outdated, resolved)
packages/swing-store/src/swingStore.js (outdated, resolved)
packages/swing-store/src/hasher.js (resolved)
packages/swing-store/test/test-hasher.js (outdated, resolved)
packages/SwingSet/src/kernel/kernel.js (resolved)
packages/SwingSet/src/kernel/kernel.js (resolved)
packages/SwingSet/test/vat-admin/test-create-vat.js (outdated, resolved)
packages/SwingSet/test/test-state.js (resolved)
packages/SwingSet/test/test-state.js (resolved)
@warner (Member) commented Dec 20, 2022

Oh, an update on the WAL-mode and "checkpoint" operations. I just re-read https://sqlite.org/wal.html#performance_considerations , and realized there's a difference between the fsync() durability action and the "checkpoint" consolidate-the-WAL performance action.

It says that if PRAGMA synchronous is set to NORMAL (the default), the DB only does an fsync() during a checkpoint operation. That improves performance, but is not safe against a power failure: you could lose data that was written (and committed), until a checkpoint happens. These checkpoints happen automatically once the WAL file reaches 1000 pages, or when initiated manually, but I don't think we're comfortable with a durability vulnerability window like that.

If we set PRAGMA synchronous to FULL, then it performs an fsync() on every commit(), which sounds like what we want. We only do a real commit() once per block (no faster than once every 5 seconds), so the performance should be fine.

Once we do that, we can stick to automatic checkpoints. These will be opportunistic: if a reader has a transaction open (e.g. to copy data out of the DB, maybe for state-sync purposes), the checkpoint/WAL-compaction might have to stop, but it will pick up where it left off at the next opportunity, and the application doesn't need to know about it.

So:

  • when creating the DB, set PRAGMA synchronous to FULL, right next to where we set PRAGMA journal_mode=WAL
  • when doing a commit, don't force a checkpoint
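In better-sqlite3 terms, that setup might look like the following; the exact placement within swingStore.js is assumed for illustration.

```js
import sqlite3 from 'better-sqlite3';

// Sketch of the pragma setup described above (illustrative placement).
const db = sqlite3('swingstore.sqlite');
db.pragma('journal_mode = WAL'); // WAL mode, as the store already sets
db.pragma('synchronous = FULL'); // fsync() on every commit, not just at checkpoints
// No manual wal_checkpoint at commit time: automatic checkpoints (triggered
// around 1000 WAL pages by default) consolidate the WAL opportunistically.
```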

@FUDCo (Contributor, Author) commented Dec 22, 2022

@warner ready for re-review pending your final analyses of the test changes.

@warner (Member) commented Dec 23, 2022

Ok, that looks good. Maybe rebase one last time. Thanks!

FUDCo and others added 5 commits December 23, 2022 01:12
The refcount changes in "createVat holds refcount" were correct, but
the comments gave the wrong reason. The original version (on trunk)
was wrong, but happened to work because of a failure-to-commit
bug/omission in kpResolution().

After using `c.kpResolution(kpid1)`, the refcount is indeed 3: one
from v1-bootstrap, one from the pin added by kpResolution, and one
from the kpid1 resolution value. Note that kpid1 itself has a zero
refcount by this point, but has not yet been collected, because we
only call processRefcounts() at the end of deliveries, not after the
decrefs performed by kpResolution. This is arguably a bug, but
fortunately we only call kpResolution() from tests.

The original version thought the refcount was 2, and the test only
passed because the incref performed by kpResolution was still sitting
in the crankbuffer, and the test code looked directly at the
kvStore (not the crankbuffer-wrapped version that kernelKeeper
uses). The crankbuffer had refcount=3, the DB had refcount=2, and the
test asserted that it was "2". The test comments didn't take into
account the reference from kpid1 (and probably assumed that kpid1 was
retired by that point). The subsequent delivery allowed the
crankbuffer to be flushed (incrementing), but also allowed
processRefcounts() to run, which removes the kpid1 resolution value
reference (decrementing), making it look like there was no net
refcount change.

After switching to SQLite and removing the crankbuffer, the test is
correctly seeing the incref added by kpResolution, so it must assert
that the count is 3. But the extra refcount didn't come from the
'getHeld' call: v1-bootstrap only has a single c-list entry for
'held', not two as the comments implied (even if multiple objects or
Promises within v1-bootstrap held a reference, they all share a single
valToSlot and c-list entry).
@FUDCo force-pushed the 3087-sql-swingstore branch from 47f1aba to 59de3ab on December 23, 2022 09:12
@FUDCo FUDCo added the automerge:rebase Automatically rebase updates, then merge label Dec 23, 2022
@mergify mergify bot merged commit 0d0f8f3 into master Dec 23, 2022
@mergify mergify bot deleted the 3087-sql-swingstore branch December 23, 2022 10:29
Comment on lines -369 to +539
- closeStreamStore();
- await doCommit(true);
- await db.close();
+ commit();
+ db.close();
A Member commented:
This changes the behavior on close from abort to commit.

A Member replied:
Oh, huh, yeah, the previous true in doCommit(true) triggered an LMDB abort. I must have missed this during review (also, we don't have any tests of abort-on-close(), and we never use it that way; applications just crash instead of calling close). Yeah, we should make this abort.
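One way the abort-on-close behavior could be restored against better-sqlite3 is sketched below; this is illustrative only, not necessarily how the follow-up fix was written.

```js
// Sketch: discard any uncommitted block-level transaction on close instead of
// committing it (assumes the transaction was opened with an explicit BEGIN).
function close(db) {
  if (db.inTransaction) {
    db.exec('ROLLBACK');
  }
  db.close();
}
```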

txnFinish(abort ? lmdbAbort : undefined);
return Promise.resolve(txnDone)
.then(() => {
trace(`${abort ? 'abort' : 'commit'}-tx`);
A Member commented:
This trace was not kept in the change.
