define better swingset host-loop patterns #2914
Labels: enhancement, SwingSet, swingset-runner
What is the Problem Being Solved?
SwingSet is a library, meant to be embedded in a host application. SwingSet has no notion of time, or IO, or storage: these things must be provided by the host. We use devices to allow the outside world to influence the swingset kernel: input events might happen at arbitrary times, but by restricting each to merely enqueueing a message for later execution, we record the order in which they are processed, so a replay can be deterministic even though we have arrival-order non-determinism. Outbound messages must be embargoed until the kernel state has been committed, to avoid hangover inconsistency. The two sides must also coordinate input IO, commit points, and outbound IO to ensure that the swingset world proceeds forward (Waterken-style) and never surprisingly rolls back.
In general, there will be a loop that does the following over and over again:
When embedded in a blockchain, this loop is very tightly coupled to the chain's consensus mechanism. In the Tendermint/Cosmos-SDK world, input events happen during StartBlock (e.g. a timer wakeup event triggered when the new block time is larger than the earliest alarm time) or DeliverTx (inbound messages from IBC or a solo node). The bulk of the run-queue work will probably happen during FinishBlock. All swingset state is committed along with the rest of the chain state after FinishBlock returns, forming the application state hash which is included in the block being signed and voted upon. "Outbound messages" are strictly a part of this chain state, so they're automatically "embargoed" until that state is committed and off-chain follower nodes can poll for the presence of these messages. The loop is typically executed on a regular cycle, once per block, perhaps once every 5 seconds.
In this environment, the swingset API should be somewhat passive. The chain is in control: it needs to tell SwingSet to start processing the run-queue, give it some sense of how much work should be done, and be told when that work is completed (and the kernel is idle once more).
When running in a standalone application, swingset can be more involved. Input events (e.g. HTTP server request handlers, timer wakeups) can occur at any moment, even while the kernel is executing cranks, and need to be queued (#720) until it is safe to execute them (and any state changes need to be committed appropriately). Input events might be clustered (e.g. a frontend making two back-to-back HTTP requests, or two requests appearing in the same WebSocket message), so we might want some form of Nagle delay before we consider the kernel cycle complete, to improve efficiency. Output events must be embargoed until the state is committed, as before, but in a standalone application the notion of an "output event" is more direct: messages may be sent over a TCP socket, HTTP requests may be started, or a chain-delivery helper process might be spawned.
Our current API approach is:
agoric-sdk/packages/cosmic-swingset/lib/ag-solo/start.js, line 174 in c7862a2
- the host calls c.step() until the returned meter-consumption total grows beyond some heuristically-determined threshold; a standalone app can run it until enough wallclock time has passed
- or the host calls c.run to keep running until the run-queue is drained, which may take a long time
- c.step/c.run returns a Promise; the kernel is "active" until it fires, and "inactive" afterwards until the next c.step/run call
- the host supplies the hostDB object and is responsible for committing/flushing it when the kernel is inactive
I'd like to improve this, to get a kernel API that makes it easy to handle both scenarios, with minimal opportunities for mistakes.
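For reference, the current host-driven pattern in the list above can be sketched roughly as follows. This is a minimal illustration, not the real controller: `makeFakeController`, `pending()`, the `cost` field, and the `hostDB.commit()` method are all hypothetical stand-ins, and the sketch assumes each `step()` resolves to the meter units consumed by that one crank, which the loop then totals.

```javascript
// Hypothetical stand-in for the SwingSet controller `c`. Only the idea that
// step() returns a Promise comes from the issue; everything else is assumed.
function makeFakeController(queue) {
  return {
    // Execute one run-queue entry and resolve to the meter units it consumed.
    async step() {
      if (queue.length === 0) return 0;
      const crank = queue.shift();
      return crank.cost;
    },
    // Number of entries still waiting on the run-queue.
    pending: () => queue.length,
  };
}

// Cycle the kernel until the run-queue is empty or the meter budget is spent,
// then commit. The commit happens only after the last step() Promise has
// resolved, i.e. while the kernel is inactive.
async function runOneCycle(c, hostDB, meterBudget) {
  let used = 0;
  while (c.pending() > 0 && used < meterBudget) {
    used += await c.step();
  }
  hostDB.commit();
  return used;
}
```

Every host has to get the ordering of "stop stepping, then commit" right on its own, which is the kind of mistake a better API would ideally rule out.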
Description of the Design
I'm still trying to figure out a good design; here are some notes.
#720 is about coordinating the creation of devices with a swingset-managed queuing mechanism. We don't need this for chain-mode: input calls only happen while the kernel is idle, not spontaneously.
In solo-mode, I'm wondering if we could put swingset in control of everything. The host app would give swingset control over the DB commit function, to be called when swingset was done with cycling the kernel. Input events (HTTP request handler calls) would get queued if the kernel was already running, but if the kernel was idle, it would trigger a kernel cycle. Swingset would avoid calling output functions until after the commit finished.
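One possible shape for "swingset in control" is sketched below. None of these names exist in SwingSet: `makeSoloDriver`, `runKernel`, `commit`, `sendOutputs`, `input`, and `idle` are hypothetical, and the host-supplied callbacks are assumed to return Promises (or plain values). The sketch queues inputs that arrive mid-cycle, starts a cycle when an input arrives while idle, and calls the output function only after the commit has finished.

```javascript
// Hypothetical driver: the host hands over its commit and output functions,
// and the driver decides when to call them.
function makeSoloDriver({ runKernel, commit, sendOutputs }) {
  const inputQueue = [];
  let running = false;
  let done = Promise.resolve();

  async function cycle() {
    running = true; // set synchronously, before the first await
    try {
      while (inputQueue.length > 0) {
        const batch = inputQueue.splice(0); // take everything queued so far
        await runKernel(batch);
        await commit(); // state is durable before anything leaves the process
        await sendOutputs(); // embargo lifted only after the commit
      }
    } finally {
      running = false;
    }
  }

  return {
    // Called from I/O handlers (HTTP requests, timers) at any moment.
    input(event) {
      inputQueue.push(event);
      if (!running) {
        done = cycle();
      }
    },
    // Resolves when the most recently started cycle has finished.
    idle: () => done,
  };
}
```

An input that arrives while a cycle is in progress is not lost: the `while` loop picks it up after the current commit and output pass, so it lands in the next unit of transactionality.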
The kernel cycle in a standalone/solo app is a lot like a "block" in the chain-based app. It's the same unit of transactionality (if the application is interrupted/crashes before the commit point, the new instance will wake up in the previously-committed state).
This will probably need some layer on top of the mailbox device. Something where the host registers a function that knows how to scan the mailbox and send new output messages. This function would be run by swingset at the right time. Swingset needs to know when the function is finished running (making it safe to modify the outbox again), so either it should run synchronously, or it should return a Promise with the knowledge that the kernel will wait for the output function (so maybe don't have it take very long).
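A rough sketch of such a layer follows, under several assumptions: the outbox is modelled as a `Map` from peer name to a `Map` of message number to body (not the real mailbox device's data format), and `makeOutputRunner`, `register`, `flush`, `makeSender`, and `transmit` are all hypothetical names. Awaiting each registered function handles both the synchronous case and the Promise-returning case described above.

```javascript
// Hypothetical registry of host output functions, run by swingset after commit.
function makeOutputRunner() {
  const handlers = [];
  return {
    register(fn) {
      handlers.push(fn);
    },
    // Resolves once every handler has finished scanning, after which it is
    // safe for the kernel to modify the outbox again.
    async flush(outbox) {
      for (const fn of handlers) {
        await fn(outbox); // fine for sync functions and for Promises alike
      }
    },
  };
}

// Hypothetical host-side output function: scans the outbox and hands any
// message it has not seen before to `transmit`.
function makeSender(transmit) {
  const highestSent = new Map(); // peer -> highest message number transmitted
  return function scanOutbox(outbox) {
    for (const [peer, messages] of outbox) {
      const last = highestSent.get(peer) ?? 0;
      let newest = last;
      for (const [num, body] of messages) {
        if (num > last) {
          transmit(peer, num, body);
          newest = Math.max(newest, num);
        }
      }
      highestSent.set(peer, newest);
    }
  };
}
```

Because the kernel waits on `flush`, a slow output function stalls the next cycle, which is why the handler should hand messages off quickly rather than wait for delivery.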
Instead of the host app calling c.run() in a loop, we might pre-configure swingset with a policy that says how long it should work before taking a break to commit and release outbound messages. For a solo node this can safely be wallclock time. The sequence would be something like:
setImmediate)
When the application starts, the first thing it must do is call the output message functions, against the last state committed by the previous instance. It may take an arbitrary amount of time for the new instance to have an input event, and the output messages we began to send last time may not have actually made it to the wire yet, so we must get them on their way before we can rest.
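To make the policy idea and the startup rule concrete, here is one guess at how they could fit together. It is not the sequence the issue had in mind, and every name (`makeWallClockPolicy`, `startAndRun`, `step`, `hasWork`, `commit`, `sendOutputs`) is hypothetical. The clock is injectable so the policy can be exercised without real delays.

```javascript
// Hypothetical run policy: keep working until `budgetMs` of wall-clock time
// has passed since start() was called.
function makeWallClockPolicy(budgetMs, now = Date.now) {
  let deadline = 0;
  return {
    start() {
      deadline = now() + budgetMs;
    },
    keepGoing() {
      return now() < deadline;
    },
  };
}

// Hypothetical top-level sequence for a solo node.
async function startAndRun({ step, hasWork, commit, sendOutputs, policy }) {
  // On startup, re-send outputs from the last committed state before resting:
  // the previous instance may have died before they reached the wire.
  await sendOutputs();
  while (hasWork()) {
    policy.start();
    // Always run at least one step so a tiny budget cannot stall progress.
    do {
      await step();
    } while (hasWork() && policy.keepGoing());
    await commit();
    await sendOutputs(); // released only after the commit
  }
}
```

Re-sending on startup means a receiver may see the same message twice, so this sketch assumes the output path tolerates duplicates.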
This needs to mesh well with the ag-cosmos-helper protocol diagram in #2855 (comment). In particular, the output handler needs to be able to poll the kernel active-vs-idle state, and read the latest messages from the outbox iff it is idle.