define better swingset host-loop patterns #2914

warner · 2021-04-18T23:09:41Z

What is the Problem Being Solved?

SwingSet is a library, meant to be embedded in a host application. SwingSet has no notion of time, IO, or storage: these must be provided by the host. We use devices to let the outside world influence the swingset kernel. Input events might happen at arbitrary times, but by restricting each to merely enqueueing a message for later execution, we record the order in which they are processed, so a replay can be deterministic even though arrival order is non-deterministic. Outbound messages must be embargoed until the kernel state has been committed, to avoid hangover inconsistency. The two sides must also coordinate input IO, commit points, and outbound IO to ensure that the swingset world proceeds forward (Waterken-style) and never surprisingly rolls back.

In general, there will be a loop that does the following over and over again:

- process input events, by invoking exported device entry points, which should update state and add messages to the kernel run-queue
- process some number of events from the kernel run-queue; how many is a tradeoff between efficiency on one side and, on the other, low latency and maximizing the progress that survives a non-trivial failure/interruption rate
- commit all state
- release outbound messages to other kernels
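
The four steps above can be sketched as a single host cycle. This is only an illustration of the ordering: `runHostCycle`, `makeRecordingHost`, and the `drainInputs` / `runKernel` / `commit` / `releaseOutbound` callbacks are hypothetical names invented here, not SwingSet APIs.

```javascript
// Hypothetical sketch of one host-loop cycle; none of these names are SwingSet APIs.
async function runHostCycle(host) {
  // 1. Process input events: each one only enqueues work for the kernel.
  host.drainInputs();
  // 2. Process some number of run-queue events (how many is a host policy decision).
  await host.runKernel();
  // 3. Commit all state before anything leaves the process.
  await host.commit();
  // 4. Only now release the embargoed outbound messages.
  await host.releaseOutbound();
}

// Toy host that just records the order in which the steps happen.
function makeRecordingHost() {
  const log = [];
  return {
    log,
    drainInputs: () => log.push('inputs'),
    runKernel: async () => log.push('run'),
    commit: async () => log.push('commit'),
    releaseOutbound: async () => log.push('release'),
  };
}
```

The point of the fixed order is that `releaseOutbound` can never observe state that has not been committed.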

When embedded in a blockchain, this loop is very tightly coupled to the chain's consensus mechanism. In the Tendermint/Cosmos-SDK world, input events happen during BeginBlock (e.g. a timer wakeup event triggered when the new block time is larger than the earliest alarm time) or DeliverTx (inbound messages from IBC or a solo node). The bulk of the run-queue work will probably happen during EndBlock. All swingset state is committed along with the rest of the chain state after EndBlock returns, forming the application state hash which is included in the block being signed and voted upon. "Outbound messages" are strictly a part of this chain state, so they're automatically "embargoed" until that state is committed and off-chain follower nodes can poll for the presence of these messages. The loop is typically executed on a regular cycle, once per block, perhaps once every 5 seconds.

In this environment, the swingset API should be somewhat passive. The chain is in control: it needs to tell SwingSet to start processing the run-queue, give it some sense of how much work should be done, and be told when that work is completed (and the kernel is idle once more).

When running in a standalone application, swingset can be more involved. Input events (e.g. HTTP server request handlers, timer wakeups) can occur at any moment, even while the kernel is executing cranks, and need to be queued (#720) until it is safe to execute them (and any state changes need to be committed appropriately). Input events might be clustered (e.g. a frontend making two back-to-back HTTP requests, or two requests appearing in the same WebSocket message), so we might want some form of Nagle delay before we consider the kernel cycle complete, to improve efficiency. Output events must be embargoed until the state is committed, as before, but in a standalone application the notion of an "output event" is more direct: messages may be sent over a TCP socket, HTTP requests may be started, or a chain-delivery helper process might be spawned.
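
A minimal sketch of the queueing idea for the standalone case, assuming a plain in-memory queue. `makeInputQueue` is a hypothetical helper written for this note; it is not the `withInputQueue` used by ag-solo, and it ignores persistence entirely.

```javascript
// Hypothetical input queue: handlers may fire at any moment, but the
// thunks they enqueue run only when the host drains the queue.
function makeInputQueue() {
  const pending = [];
  return {
    // Called from e.g. an HTTP request handler or a timer callback.
    push(thunk) {
      pending.push(thunk);
    },
    // Called by the host between kernel runs; runs thunks in arrival order
    // and returns how many were run.
    drain() {
      let count = 0;
      while (pending.length > 0) {
        const thunk = pending.shift();
        thunk();
        count += 1;
      }
      return count;
    },
    size: () => pending.length,
  };
}
```

A request handler would call `push` with a thunk that invokes the device entry point, and the host would call `drain` only while the kernel is not executing cranks.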

Our current API approach is:

- non-blockchain applications are responsible for doing their own input queueing (for the cosmic-swingset ag-solo host, the call "const queuedDeliverInboundToMbx = withInputQueue(" at line 174 of agoric-sdk/packages/cosmic-swingset/lib/ag-solo/start.js in c7862a2 uses https://github.com/Agoric/agoric-sdk/blob/c7862a202d8426bf4fd5de1541d4e04c49554127/packages/cosmic-swingset/lib/ag-solo/vats/queue.js)
- a blockchain application can run c.step() until the returned meter-consumption total grows beyond some heuristically-determined threshold; a standalone app can run it until enough wallclock time has passed
  - or either can call c.run to keep running until the run-queue is drained, which may take a long time
- c.step/c.run returns a Promise; the kernel is "active" until it fires, and "inactive" afterwards until the next c.step/run call
- the host provides the hostDB object and is responsible for committing/flushing it when the kernel is inactive
- any devices the host chooses to configure must arrange to embargo their outgoing messages until after the commit point
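
To illustrate the "step until a threshold" pattern, here is a sketch against a stand-in controller. The `{ ran, used }` result shape, `makeFakeController`, and `runUntilBudget` are assumptions made up for this example; the real return value of c.step() is not specified in this issue and may differ.

```javascript
// Stand-in for the SwingSet controller. The { ran, used } result shape is an
// assumption for this sketch, not the real return value of c.step().
function makeFakeController(crankCosts) {
  const queue = [...crankCosts];
  return {
    async step() {
      if (queue.length === 0) return { ran: false, used: 0 };
      return { ran: true, used: queue.shift() };
    },
    remaining: () => queue.length,
  };
}

// Run cranks until the run-queue is empty or the meter budget is exceeded.
async function runUntilBudget(c, budget) {
  let total = 0;
  for (;;) {
    const { ran, used } = await c.step();
    if (!ran) break; // run-queue drained
    total += used;
    if (total > budget) break; // leave the rest for the next block
  }
  return total;
}
```

Once `runUntilBudget` resolves, the kernel is inactive, so this is the point at which the host would commit its hostDB and then let devices release their outgoing messages.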

I'd like to improve this, to get a kernel API that makes it easy to handle both scenarios, with minimal opportunities for mistakes.

Description of the Design

I'm still trying to figure out a good design; here are some notes.

#720 is about coordinating the creation of devices with a swingset-managed queuing mechanism. We don't need this for chain-mode: input calls only happen while the kernel is idle, not spontaneously.

In solo-mode, I'm wondering if we could put swingset in control of everything. The host app would give swingset control over the DB commit function, to be called when swingset was done with cycling the kernel. Input events (HTTP request handler calls) would get queued if the kernel was already running, but if the kernel was idle, it would trigger a kernel cycle. Swingset would avoid calling output functions until after the commit finished.

The kernel cycle in a standalone/solo app is a lot like a "block" in the chain-based app. It's the same unit of transactionality (if the application is interrupted/crashes before the commit point, the new instance will wake up in the previously-committed state).

This will probably need some layer on top of the mailbox device. Something where the host registers a function that knows how to scan the mailbox and send new output messages. This function would be run by swingset at the right time. Swingset needs to know when the function is finished running (making it safe to modify the outbox again), so either it should run synchronously, or it should return a Promise with the knowledge that the kernel will wait for the output function (so maybe don't have it take very long).
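
One possible shape for that layer, as a sketch only: `makeOutputRegistry`, `register`, and `releaseAll` are hypothetical names, and the `outbox` argument stands in for whatever view of the mailbox the host function would scan.

```javascript
// Hypothetical registry for host-supplied output functions.
function makeOutputRegistry() {
  const senders = [];
  return {
    // The host registers a function that scans the outbox and sends new messages.
    register(sender) {
      senders.push(sender);
    },
    // Run after the commit point. Awaiting each sender handles both
    // synchronous senders and senders that return a Promise, and the
    // outbox is not handed back for modification until all have finished.
    async releaseAll(outbox) {
      for (const sender of senders) {
        await sender(outbox);
      }
    },
  };
}
```

Because `releaseAll` waits for every sender, a slow output function delays the next kernel cycle, which is the reason to keep such functions short.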

Instead of the host app calling c.run() in a loop, we might pre-configure swingset with a policy that says how long it should work before taking a break to commit and release outbound messages. For a solo node this can safely be wallclock time. The sequence would be something like:

- input event is queued
  - if kernel was idle, schedule a "block" to run promptly (setImmediate)
    - do not Nagle now: if there is work to be done, start it right away
- start a "block"
  - run all queued input event functions
  - run cranks until the run-queue is empty or our policy says we've spent too long running
    - if the run-queue is empty, wait for a Nagle timer to expire
      - run any input events queued while waiting
      - run cranks until run-queue is empty or policy says we've run too long
      - this helps when two input events happen back-to-back: the first will trigger a block, the second will get included during the Nagle timer
      - we don't want to wait forever: if the source of those input events doesn't provide them right away, they can wait for the next cycle
  - call the HostDB commit function, let it complete
  - call all output message delivery functions, let them complete
- declare the kernel "idle"
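
The sequence above might look roughly like the following in Node.js. Everything here is hypothetical: `makeSoloLoop` and its `runCranks`, `commit`, `sendOutputs`, and `nagleMs` options are names invented for this sketch, and `runCranks` is assumed to resolve to true when the run-queue drained and false when the policy cut it short.

```javascript
// Hypothetical solo-mode scheduler; every name and option here is illustrative.
function makeSoloLoop({ runCranks, commit, sendOutputs, nagleMs }) {
  const inputs = [];
  let idle = true;

  const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
  const runInputs = () => {
    while (inputs.length > 0) inputs.shift()();
  };

  function schedule() {
    idle = false;
    setImmediate(runBlock); // no Nagle delay before starting work
  }

  async function runBlock() {
    runInputs();
    const drained = await runCranks();
    if (drained) {
      // One Nagle wait, so a back-to-back second input joins this block.
      await delay(nagleMs);
      if (inputs.length > 0) {
        runInputs();
        await runCranks();
      }
    }
    await commit();
    await sendOutputs(); // outputs are released only after the commit
    idle = true;
    // Inputs that arrived during commit/output wait for the next block.
    if (inputs.length > 0) schedule();
  }

  return {
    // Queue an input event; start a block promptly if the kernel was idle.
    addInput(thunk) {
      inputs.push(thunk);
      if (idle) schedule();
    },
    isIdle: () => idle,
  };
}
```

A real version would also need error handling around `runBlock`; this sketch leaves a rejected block unhandled.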

When the application starts, the first thing it must do is call the output message functions, against the last state committed by the previous instance. It may take an arbitrary amount of time for the new instance to have an input event, and the output messages we began to send last time may not have actually made it to the wire yet, so we must get them on their way before we can rest.
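
As a sketch of that startup rule, assuming the host can load the last committed outbox: `startHost`, `loadCommittedOutbox`, `sendOutputs`, and `acceptInputs` are hypothetical names used only for illustration.

```javascript
// Hypothetical startup sequence: re-send outputs from the last committed
// state before accepting any new input.
async function startHost({ loadCommittedOutbox, sendOutputs, acceptInputs }) {
  const outbox = await loadCommittedOutbox();
  // Some of these messages may already have reached the wire before the
  // previous instance stopped; we cannot tell, so we send them all again.
  await sendOutputs(outbox);
  // Only now start listening for input events.
  acceptInputs();
}
```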

This needs to mesh well with the ag-cosmos-helper protocol diagram in #2855 (comment). In particular, the output handler needs to be able to poll the kernel active-vs-idle state, and read the latest messages from the outbox iff it is idle.


dckc · 2021-11-09T21:20:29Z

@warner, is this addressed by the run-policy work #3582?

warner added the enhancement, SwingSet, cosmic-swingset, and swingset-runner labels Apr 18, 2021

warner self-assigned this Apr 18, 2021

dckc removed the cosmic-swingset label Nov 10, 2021

Tartuffo added migrate-product-backlog and removed migrate-product-backlog labels Nov 17, 2022
