vat-container options: XS, Worker, WASM, etc #1127
I put together a program to try and analyze our options:

```python
from __future__ import print_function
import sys

# options for swingset in browser, node, and XS

headerstr = "case XS WASM XSmet JSmet Worker Work/Vat Atomics SyncDB: capacity save meter-CPU meter-RAM"
dashstr = "---- -- ---- ----- ----- ------ -------- ------- ------ -------- ---- --------- ---------"
formatstr = "{:<5} {:<5} {:<5} {:<5} {:<5} {:<5} {:<5} {:<5} {:<5} {:<5} {:<8} {:<11} {}"


def bools(b):
    return "yes" if b else "no"


def describe(case, limitations,
             xs, wasm_and_xs, xs_metering, js_metering,
             vats_in_worker, worker_per_vat, atomics,
             sync_hostdb):
    if "browser" in limitations or "leveldb" in limitations:
        if xs and not wasm_and_xs:
            return
        if sync_hostdb:
            return
    if "firefox" in limitations:
        if xs and not wasm_and_xs:
            return
        if sync_hostdb or atomics:
            return
    if "node" in limitations:
        if xs or wasm_and_xs:
            return
    # wasm_and_xs, and we change XS to suspend the JS thread while waiting
    # for an async wasm-to-host call. Then the JS host can use an async DB.
    suspended_syscalls = (vats_in_worker and atomics) or wasm_and_xs
    sync_syscalls = (sync_hostdb and not vats_in_worker) or suspended_syscalls
    cpu_metering = "cpu-none"
    if vats_in_worker and worker_per_vat:
        cpu_metering = "cpu-timeout"
    if (wasm_and_xs and xs_metering) or js_metering:
        cpu_metering = "cpu-exact"
    memory_metering = "ram-none"
    if (wasm_and_xs and xs_metering) or js_metering:
        memory_metering = "ram-exact"
    if sync_syscalls:
        capacity = "large"  # large object tables can live in DB
    else:
        capacity = "small"  # objects must live in RAM
    serialize = "save-no"
    if wasm_and_xs or (xs and vats_in_worker and worker_per_vat):
        serialize = "save-yes"
    print(formatstr.format(case,
                           bools(xs), bools(wasm_and_xs), bools(xs_metering),
                           bools(js_metering), bools(vats_in_worker),
                           bools(worker_per_vat), bools(atomics),
                           bools(sync_hostdb),
                           capacity, serialize, cpu_metering, memory_metering))


case = 1
limitations = sys.argv[1].split(",") if len(sys.argv) > 1 else []
print(headerstr, file=sys.stderr)
print(dashstr, file=sys.stderr)
for xs in [False, True]:
    for wasm_and_xs in [False, True] if xs else [False]:
        for xs_metering in [False, True] if wasm_and_xs else [False]:
            for js_metering in [False, True]:
                for vats_in_worker in [False, True]:
                    for worker_per_vat in [False, True] if vats_in_worker else [False]:
                        # 'atomics' means SharedArrayBuffer/Atomics, which
                        # requires lots of headers to enable in a browser, and
                        # is not available yet in some browsers. If vats
                        # aren't run in a worker, it doesn't matter.
                        for atomics in [False, True] if vats_in_worker else [False]:
                            for sync_hostdb in [False, True]:  # LMDB=true, leveldb=false
                                describe(case, limitations,
                                         xs, wasm_and_xs, xs_metering,
                                         js_metering, vats_in_worker,
                                         worker_per_vat, atomics,
                                         sync_hostdb)
                                case += 1
```
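The script's optional first argument is a comma-separated list of limitations; the names recognized above are `browser`, `leveldb`, `firefox`, and `node`. Assuming the script were saved as, say, `options.py`, running `python options.py firefox` would print only the cases that survive in a browser without `SharedArrayBuffer`/`Atomics`.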
The columns are: `case` (a case number), then the inputs `XS` (vats run on XS), `WASM` (XS compiled to WASM), `XSmet` (metering built into XS), `JSmet` (injected JS metering), `Worker` (vats run in a Worker), `Work/Vat` (a separate Worker per vat), `Atomics` (`SharedArrayBuffer`/`Atomics` available), and `SyncDB` (a synchronous HostDB like LMDB).
In each case, the properties of the resulting environment are: `capacity` (large if object tables can live in the DB, small if they must live in RAM), `save` (whether vat state can be serialized/snapshotted), and the `meter-CPU`/`meter-RAM` levels defined later in this ticket.
The full output is:
If we limit the options to browsers that either don't enable `SharedArrayBuffer`/`Atomics` or don't support it at all (the `firefox` limitation in the script above), we get:
The simplest/weakest implementation (case 1) wouldn't use Workers at all: each vat goes into a separate Compartment, everything is under SES, but vat code could kill the swingset by allocating too much memory or going into an infinite loop. We'd have to use full transcripts for all persistence, and vats could not call `deviceRead`.

The barest minimum of metering we could do is case 7, which prevents runaway CPU usage, but could still kill the machine through excessive memory usage. This doesn't inject metering code, but does put each vat into a separate Worker. The kernel would simply apply a timeout to each delivery, and if it didn't complete within a few seconds, the vat would be killed. This might be enough for developers.

The simplest way to achieve coarse (non-deterministic) protection against CPU and memory exhaustion is case 11: inject metering code, but don't use Workers. This doesn't enable secondary storage, so vats are limited to small capacity.

To achieve long-term operation in a browser, we need snapshots, which we can achieve by compiling XS to WASM, running each vat in a separate WASM instance, and saving the WASM linear memory. This is case 41. Since we're using XS, we can modify it to suspend vat execution while the kernel does async storage work, so we don't need Atomics. If we also want full metering, we can add it by instrumenting either the XS code (case 61) or injecting metering into the JS code (case 51).
If we look at a browser that does have `SharedArrayBuffer`/`Atomics` enabled (the `browser` limitation), we get:
Case 5 adds large-capacity to our simple case 1 (no checkpointing, no metering), by running both the kernel and all vats in a single shared Worker, and suspending that worker when it wants the main thread to do an async storage operation. Case 9 adds large-capacity vats to our case 7 (basic runaway-CPU protection). We still cannot get vat snapshots without going to XS+WASM.
If we look at Node.js, without XS, we get:
Here, we cannot get snapshots, but we can get large-capacity and deterministic metering in case 12, by injecting metering code into JS, and using a synchronous DB like LMDB. If we want to use an async DB like LevelDB, we use case 15: inject JS metering, use a worker to suspend vats upon read. We can do this with one-worker-per-vat (19), or a shared worker for the kernel and all vats (15).
Endgame

This comment describes my half-baked plans for the targets I currently have in mind. The plan involves a … The kernel provides this … The vat code itself will live in some sort of platform-specific container: it might be a … The … These VatWorkers are made by a function named … The data that defines static vats will come from the controller: it passes something into the kernel with …

Intermediate Steps

There will be some intermediate steps, to retain a working system despite the time it will take to figure out and implement the following:
I currently think the sequence will be:
After that sequence, we can do a couple different things in parallel:

Async Kernel

Now that kernel syscalls are allowed to return a Promise, the kernel can start using async operations internally. The syscall handlers which do c-list lookups become async, then we change the Keepers to return Promises (initially degenerate ones).

That kernel reentrancy guard is now pretty important. We need a clear story for external inputs (inbox device calls from the host) to make them all single-file, probably some helper function which wraps the exported device function and delays invoking it if the kernel is already busy doing something. All possible pathways into the kernel should go through this queue. We should provide a clear place for pre-crank and post-crank operations to happen, especially the post-block commit point.

When this path is done, we can finally reap the benefits: an async HostDB. At this point we could move from LMDB to LevelDB (in Node), or move this to a browser and use IndexedDB.

XS

We write a C startup program that uses the XS API to create a new … At first, the controller implements … Next, we change …

Now we have two XS improvements we can work on in parallel:
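A minimal sketch of the single-file kernel-entry queue described under "Async Kernel" above; the names (`makeKernelEntryGuard`, `rawDeliverInbound`) are invented for illustration, not the real SwingSet API:

```js
// Wrap every external pathway into the kernel so invocations run
// single-file: a new input waits until the kernel finishes its current work.
function makeKernelEntryGuard() {
  let idle = Promise.resolve(); // settles when the kernel is not busy
  return function wrap(kernelEntryFn) {
    return (...args) => {
      // chain this invocation onto whatever the kernel is currently doing
      const result = idle.then(() => kernelEntryFn(...args));
      // the kernel stays "busy" until this invocation fully settles
      idle = result.then(() => undefined, () => undefined);
      return result;
    };
  };
}

// hypothetical usage:
//   const wrap = makeKernelEntryGuard();
//   device.deliverInbound = wrap(rawDeliverInbound);
```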
Browser

We can start making swingset work in a browser before fully async-ifying the kernel, but we'll be limited to a fake non-persistent synchronous HostDB that keeps everything in RAM (or stashing everything in …).

We can achieve coarse metering by having the VatWorker create a real …

Once the kernel can use an async HostDB, we can switch to using IndexedDB for storage, which increases our available space to perhaps gigabytes. With some fancy batching, we could also accumulate deltas of state changes and ship them off-site, to some external server with as much storage as we want.

The awesome science-fiction future of this path is to visit a web page (or browser extension), type in a strong credential, retrieve an encrypted state bundle, decrypt just enough to figure out the contents of the run queue and the configuration of the IO channels, and then resume your previously-suspended swingset machine in your browser. The page can fetch specific vats as messages arrive for them, replaying their transcripts and delivering the message. At the end of each block, we re-encrypt the state deltas and commit them to off-site storage, then release any newly-generated messages to the IO channels.

XS+WASM=AWESOME

The VatWorker that runs under XS should also be compilable to WASM. The VatWorker we write for a browser should be able to instantiate that WASM code and install the vat bundle inside it. In that world, we can checkpoint the vat by saving the WASM instance's linear memory buffer. This improves our science-fiction "your vats, anywhere" world with efficient restores of those old vats. We can also achieve exact metering by instrumenting XS before compiling it to WASM. And whatever C code we write to suspend the Worker can also be included here, so the WASM code makes a blocking invocation of an …

Target Systems

So, this is what I think things will look like when we're done:

Chain Node

The Agoric chain is our most stringent environment. The validators (and full-node followers) which run here must be fully deterministic, so they can always agree with the other validators. They must protect availability against arbitrarily malicious code, which requires complete (deterministic) metering. They will run for years, so they must be efficiently restartable (requiring snapshots/checkpoints). They require high-capacity vats to store all the Purse data tables in secondary storage.

On the plus side, validators can be asked to do more work to get the node running: we do not need to support arbitrary developers' workstations. So the program that implements a chain node can be more specialized. This program will be a compiled binary, which links together the Golang-based Cosmos-SDK, the XS library (written in C), and a fair bit of agoric-specific glue code (also written in C). This has the pieces described above in the "XS" path, plus a lot of interaction with the Cosmos-SDK. The SDK gets to invoke devices to deliver inbound messages, and devices get to invoke the SDK to effect outbound changes.

Solo Nodes (desktop)

Initially, solo nodes will run under Node.js, not XS, for a better development cycle and integration with existing libraries. We can use an async kernel so it can use LevelDB. Coarse metering (Worker-per-Vat, timeout each dispatch, no memory-exhaustion protection) is probably sufficient at the start. We won't have checkpoints, so restarting a long-running vat will take a while. Later we might switch solo nodes over to the XS-based system, for consistency, better metering, and checkpointing.
Solo Nodes (browser)

Nodes in a browser will get more fully-featured as we finish the development steps described above. Initially they will be somewhat ephemeral, and have minimal metering or protection against runaway code. But eventually (XS-on-WASM) they should have the same features and defenses as the chain nodes.
MarkM's "Slow Caps" Proposal

In today's meeting, @erights analyzed our likely use of "large-capacity vats" (aka "huge tables" aka "hierarchical object references") and proposed a scheme that would let us get away with purely async vats. There are vat-side details that I didn't entirely grasp, so I'll let him fill in the blanks, but the overall scheme looks a lot like the hierarchical references we described earlier (#455), except the kernel figures out what data the vat will need ahead of time, and delivers it along with the dispatch message. The vat doesn't do any additional reads, but can synthesize the "virtual" objects with just the data that the kernel provided. The vat does do additional writes, to update the kernel with any changes to this extra data, but they go to the kernel, not some separate device. No extra device would be used (probably).

In this scheme, the kernel is much more involved and aware of these special identifiers, whereas in our previous #455 thinking the vat makes device calls to get the data it needs (which go through the kernel, but the kernel is otherwise unaware of what's going on). In more detail:
Within the user-level vat code, I think this looks mostly like #455. The main new data structure would behave like a virtual Map or maybe WeakMap, registered with the serialization layer during vat startup. This virtual map (maybe called a "HugeTable") pretends to hold a very large number of objects, but is very particular about the keys and values it holds. You can ask it to create a new row (key/value pair), and the key it creates for you (the "virtual object", or "ephemeral object", or maybe "slow object"??), e.g. a new …

You can use the virtual object in the arguments of an outbound message send (or Promise resolution), which will be serialized as …

You can probably also use the virtual object in the value of a new row in some other virtual map. This will cause one chunk of slowdata to contain references to other slowcaps.

Normally we expect the virtual object to be dropped before the end of the crank (if it were kept around for a long time, we wouldn't save any RAM by keeping its data on disk). But you might keep the virtual object around in a non-virtual container for a while; perhaps all the active escrow purses of a contract that is expected to conclude in a reasonably short amount of time. This might cause identity problems in subsequent message sends, or maybe not. Without WeakRefs, liveslots may not be able to tell that it still has a virtual object for a given …
https://github.com/NeilFraser/JS-Interpreter might be worth a look as yet another substrate to run a vat on top of. One could make a zygote js-interp instance, load in the ses-shim and other shims or packages wanted.
re "Slow caps" proposal: can this extra data be called cookies and the rehydration code inside the vat called cookie monsters? Just as a reference to a short story by Vernor Vinge :3 |
Having thought about it a bit, the slow caps idea could be simplified quite a bit.
One concern I've heard (I think from @dtribble) about the slow-caps approach is that it's only easy to do for shallow/simple tables, where the vat is able to explain to the kernel, ahead of time, what data it will need for any given slowcap. If the vat had synchronous access to secondary storage, the vat could enact whatever complicated schema it wanted, like an auction handle referencing the purses of all the submitted bids, along with secondary tables referenced by those purses, etc. But if the vat needs to tell the kernel about the relationship between the handle and the table rows ahead of time, it will be limited to a simpler and less-dynamic schema.

The relationship between the slowcap's ahead-of-time schema and giving the vat full synchronous access to secondary storage is a lot like the relationship between eventual-send and full mobile code. The first case is an optimized / easier-to-implement subset of the latter's more general / harder-to-implement case. But that made me think about prepare-commit again, and how we might accomplish it without too much kernel involvement. So here's a proposal.
From the vat code's point of view, it won't be notified until all the data it needs is available within liveslots for synchronous access. It is required to explain how to achieve this (to liveslots), but after that point, it no longer needs to be aware of the asynchronous nature of the secondary storage.

From liveslots' point of view, it has a powerless agent that came from the vat code (so it can be arbitrarily complex and tailored to the vat's needs), which is told the references of each inbound message, and returns additional keys to fetch from secondary storage. It keeps fetching more data and feeding the agent until the agent says the vat will be satisfied. Then the agent is purged and the message can be released to the vat code. Liveslots is responsible for preventing interleaving of multiple messages, making the prepare/commit phase look uninterrupted.

From the kernel's point of view, it just sends …

Concerns:
Open question: Can @warner 's "prepare agent" proposal emulate the "slowcap" proposal by vat-side ("user") code? (When I asked this question verbally, I got opposite answers. Hence "open".) I do not have an opinion yet about whether emulating slowcaps would be useful, but it would help us understand the proposal. |
After today's discussion, I think we're back to synchronous syscalls to read secondary storage. The concerns with a "prepare agent" (coming mostly from @dtribble) are similar to those with kernel-based "slowcaps" and/or prepare-commit: the programming model changes too much. Needing to explain your schema (and most importantly the rules for what data needs to be fetched) in a separate object, distant from where the data is being accessed, feels like it will become a barrier for programmers to overcome. The "prepare agent" API would be a bundle of source (to construct a pure function, without access to mutable state), that must know enough about the vat code to correctly analyze incoming messages and figure out what state it needs. The "slowcaps" (kernel-managed) approach would need to express this same logic declaratively, for implementation in the kernel. Both sounded like more trouble than the hacks we have in mind to make blocking syscalls work.

So we're going to plan to have a synchronous/blocking "read data now" syscall, which operates on a chunk of secondary storage that is dedicated to the specific vat. There will be a synchronous "write data soon" syscall too: reads observe previous writes, but writes are not fully committed until/unless the crank finishes without error. The API will be in the form of a "hugetable" Store-like object, which needs a schema but is in cahoots with liveslots and is backed to secondary storage.
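As a purely illustrative sketch of how vat code might use such a store (every name here is invented, and the toy table keeps its data in RAM rather than in real secondary storage):

```js
// Toy in-RAM stand-in for the schema-aware, liveslots-backed "hugetable".
// In the real design, get() would be the blocking "read data now" syscall
// and set() the buffered "write data soon" syscall, committed only if the
// crank finishes without error.
const makeHugeTable = name => {
  const rows = new Map();
  return {
    init: (key, value) => rows.set(key, value),
    get: key => ({ ...rows.get(key) }), // "read data now"
    set: (key, value) => rows.set(key, value), // "write data soon"
  };
};

const purses = makeHugeTable('purses');
purses.init('purse23', { balance: 0 });

function deposit(purseId, amount) {
  const purse = purses.get(purseId); // synchronous read, observes prior writes
  purse.balance += amount;
  purses.set(purseId, purse); // buffered until the crank commits
}
```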
Web browsers will have the same huge-table API, but the data will stay in RAM (or in browser-local storage). I'll write up more in #455 where the API work will happen, leaving this ticket to be about platform requirements.
We identified a couple of properties that result from running a vat in a separate worker process:
Refactor the creation of VatManagers to use a single function (vatManagerFactory), which takes "managerOptions", which include everything needed to configure the new manager. The handling of options was tightened up, with precondition checks on the options-bag contents. One new managerOption is `managerType`, which specifies where we want the new vat to run. Only `local` is supported right now (our usual Compartment sharing the main thread approach), but #1299 / #1127 will introduce new types that use separate threads, or subprocesses. The controller/kernel acquired a new `shutdown()` method, to use in unit tests. This is unused now, but once we add workers that spawn threads or subprocesses, we'll need to call+await this at the end of any unit test that creates thread/subprocess-based workers, otherwise the worker thread would keep the process from ever exiting.
This adds a per-vat option to run the vat code in a separate thread, sharing the process with the main (kernel) thread, sending VatDelivery and VatSyscall objects over the postMessage channel. This isn't particularly useful by itself, but it establishes the protocol for running vats in a separate *process*, possibly written in a different language or using a different JS engine (like XS, in #1299).

This 'nodeWorker' managerType has several limitations. The shallow ones are:

* vatPowers is missing transformTildot, which shouldn't be hard to add
* vatPowers.testLog is missing, only used for unit tests so we can probably live without it
* vatPowers is missing makeGetMeter/transformMetering (and will probably never get them, since they're only used for within-vat metering and we're trying to get rid of that)
* metering is not implemented at all
* delivery transcripts (and replay) are not yet implemented

Metering shouldn't be too hard to add, although we'll probably make it an option, to avoid paying the instrumented-globals penalty when we aren't using it. We also need to add proper control over vat termination (via meter exhaustion or manually). The deeper limitation is that nodeWorkers cannot block to wait for a syscall (like `callNow`), so they cannot invoke devices.

refs #1127
closes #1384
This fixes the two ends of the netstring-based "kernel-worker" protocol: the previous version failed to parse large inbound messages, such as non-trivial vat bundles. The replacement netstring parser is based on Node.js "Streams", in their "object mode". We intend to replace this with one based on async iterators, once I can figure out some other problems with that branch. We re-enable test-worker.js for all worker types, now that the decoding problem is fixed. refs #1299 refs #1127
I read this over and I think it's done; I don't see anything outstanding that's not covered by other open issues.
As part of moving swingset to new-SES (in particular where exactly SES gets initialized, and how Compartments and the metering transform figure into it), I'm drawing up a "Vat Container" taxonomy. Here are my thoughts so far:
Assumptions
Use Cases
Target Platforms
Resulting Properties
Sync vs Async Vats, Large vs Small Capacity
The environment we build might support sync vats, or it might only support async vats. The difference is that sync vats are able to make a synchronous/blocking `syscall.deviceRead` (#55) call, which invokes a device immediately and returns the resulting data directly. Async vats can make all other syscalls (including the async commit-later `deviceWrite`), but they cannot call `deviceRead`. We can rewrite most devices (in particular the Timer device) to only require `deviceWrite`, but the big thing that needs `deviceRead` is the #455 "hierarchical object identifiers", which enables large tables that live in secondary storage, for Mints with zillions of Purse objects. We've also considered a "bulk-storage API" (#512) and "large-string device" (#46) which would need synchronous reads.

We'll use "large capacity" to describe environments that support these large tables (because they support sync vats), and "small capacity" to describe ones that don't (because all the vat-side state must essentially be kept on the heap in RAM).
Checkpointing: Efficient Long-Term Vat State Management
Our current vats enjoy orthogonal persistence: a carefree belief that they live forever, without the need to explicitly save or restore their state. This is enabled by our kernel, which maintains a transcript of all messages into and out of the vat over time. When the kernel is restarted and the vat needs to be regenerated from stored data, it rebuilds the initial state (by evaluating the original code that created the vat's root object), and then replays the entire transcript, delivering all inbound messages just like they happened the first time around (and comparing+discarding the outbound messages).
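In rough code terms (the helper names are placeholders, not the kernel's actual API), the replay looks something like:

```js
// Rebuild a vat from its original bundle, then replay the transcript,
// re-delivering each inbound message and checking (then discarding) the
// syscalls the vat makes along the way.
async function replayVat(vatID, { getVatBundle, getTranscript, buildVat, sameSyscalls }) {
  const dispatch = await buildVat(getVatBundle(vatID)); // rebuild initial state
  for (const entry of getTranscript(vatID)) {
    const syscalls = await dispatch(entry.delivery); // just like the first time
    if (!sameSyscalls(syscalls, entry.syscalls)) {
      throw Error(`vat ${vatID} diverged from its transcript`);
    }
  }
  return dispatch; // the vat is now caught up to its last recorded state
}
```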
To make this efficient enough to use on a chain that runs for years, we must be able to snapshot the JS heap state (#511), one vat at a time, and reload that data into a new instance later. This will also help with scalability, because idle vats can be serialized off to disk and evicted from RAM, then "rehydrated" later when a message arrives for them. We won't checkpoint after every crank or block, but rather we'll periodically flatten the previous checkpoint and subsequent transcript down into a new checkpoint and an empty transcript.
Some environments we build will support this checkpointing: it is certainly critical for the chain nodes. Other environments might not, and would be suitable for low-volume or short-lived nodes, but would become increasingly inefficient to reload as their history grows.
Metering
Vat code might run too large (memory consumption), or too long (infinite loops, infinite Promise chains). This might happen because of simple bugs in otherwise-trusted code, higher usage patterns than we expected, or malice.
To protect the availability of other code running in the same environment, we need a way to shut down the over-enthusiastic computation, either by cancelling the one message (with some semantics that lets both caller and callee recover), or by terminating the vat entirely (#516) (which, while harsh, is easier to reason about). A more sophisticated metering scheme could notify some "keeper" code and give it the opportunity to pay for additional space or CPU cycles.
When swingset nodes are running in a chain environment, the decision of exactly when to give up on runaway computation must be deterministic, and made identically across the validators. This requires more precise metering than a solo environment.
We describe the metering abilities of an environment as:
- `cpu-none`: no ability to limit CPU usage
- `cpu-timeout`: coarse non-deterministic limits: if a message doesn't finish processing within N seconds, cancel the message or terminate the vat
- `cpu-exact`: deterministic metering suitable for a chain node
- `ram-none`: no limit on memory usage
- `ram-coarse`: use a process-level or JS engine-level estimate of bytes used, highly sensitive to GC and other factors, not deterministic
- `ram-exact`: deterministic memory-usage metering suitable for a chain node

From an operator's point of view, it would be nice to assert a limit on MB of RAM in use, or CPU-seconds consumed per block. In particular, a chain's block time is limited by how quickly it can process messages, which is measured in seconds. However the meters we use may not be denominated in bytes or seconds, especially if they are to be deterministic, since those low-level measures are highly sensitive to unrelated factors like GC pressure and OS-level CPU load.
Implementation Techniques
This is a collection of information about our target platforms; tools we can use to implement the features described above.
HostDB sync-vs-async
p.s. see #3171
The Node.js-based kernel currently enjoys synchronous access to secondary storage, in the form of LMDB. Our choices of database were limited by the requirement for synchronous access: we were unable to use the wider "LevelDB" ecosystem because they offer a purely asynchronous API.
This also prevents us from using secondary storage in a browser environment, where IndexedDB is purely async (and LocalStorage is limited to 10MB, too small to be useful), or even tertiary storage, where we push an encrypted copy of the bulk data to an external server via HTTP when necessary (which is less vulnerable to eviction or the 2GB storage limit).
To enable sync vats, either the host must offer synchronous storage access, or the vat must somehow be paused while it waits for a read-device syscall.
Workers
Our target platforms offer various kinds of "Workers":
the host can create a `Worker`, and/or our C code can create a new `XS` instance, which is basically the same thing.

In most cases the Worker runs on a separate (concurrent) thread, although XS gives us more control over scheduling.
In some environments a Worker can be suspended while waiting on data from outside. We're told this suspend-the-instance won't be too hard to implement in XS.
The benefit of a Worker is:
The general idea is that the kernel thread sends a `postMessage` to the worker when it wants to initiate a crank (specifically when it wants to invoke one of the vat's `dispatch` methods, like `dispatch.deliver` or `dispatch.notifyFulfillToData`). Inside the worker, a supervisor uses `setImmediate` to track when the vat becomes quiescent, then invokes the vat code. While the vat runs, any asynchronous syscalls it makes are converted by the supervisor into `postMessage` calls back up to the kernel. When the vat becomes quiescent, the supervisor notifies the kernel with another `postMessage`, allowing the kernel to commit (or reject) the syscalls and finish the crank.

To reduce the overhead of copying, we might take advantage of transferrable `ArrayBuffer` objects. Node.js, at least, has an API to transfer ownership of a buffer entirely from one Worker to another thread. We could perhaps put the `capdata.body` string (created by JSON serialization) into one of these, and transfer it to the kernel rather than using the "structured copy" feature of `postMessage`. If the kernel could hang on to this buffer in the run-queue, it could transfer it to the target vat later, again without copying. It remains to be seen whether this would be useful.
To enable synchronous syscalls (`deviceRead`), the Worker must be suspended while waiting on the kernel's response. If we're not running in a modified XS engine, we anticipate using `SharedArrayBuffer` and `Atomics.wait`. The general idea is that the Worker and the kernel share a moderate-sized `SharedArrayBuffer` for the response, as well as a small one used for coordination. `deviceRead` uses `postMessage` to send the request and the arguments to the kernel, and then waits on the synchronization buffer with `Atomics.wait`. The kernel receives the request and invokes the device (which may involve various delays and Promises). When the response is ready, the kernel writes the response data into the shared response buffer, then writes a signal into the synchronization buffer. The Worker wakes up, reads the data out of the response buffer, and returns it to the caller in the vat. If the response could not fit in the buffer, the two sides can use a streaming protocol (one segment per round trip) to transfer everything out.

`SharedArrayBuffer` and `Atomics.wait` were disabled in many browsers a few years ago to mitigate Spectre/Meltdown attacks. They are likely to work on Node.js, and can probably be made to work on Chrome with a bunch of exciting headers. They remain disabled in Firefox for now. As a result, some platforms will support Worker-based synchronous syscalls, while some will not.

If we put each vat into a separate `Worker` (#1107), we can achieve the "cpu-timeout" style of metering by starting a timer when we dispatch a message into the Worker, and abandoning the delivery if the timer expires before the Worker becomes idle. This is not deterministic, and will vary depending upon CPU load and other external factors, but would be faster (it can be implemented without code transformations that inject metering code, which adds overhead), and might be good enough for development use.
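A sketch of the `SharedArrayBuffer`/`Atomics.wait` protocol described above, with an invented buffer layout (a real implementation also needs the streaming fallback for oversized responses):

```js
// Worker side: send the request, then block this thread until the kernel
// signals. `signal` is an Int32Array and `response` a Uint8Array, both
// views onto SharedArrayBuffers shared with the kernel thread.
function makeBlockingDeviceRead(port, signal, response) {
  return function deviceRead(...args) {
    Atomics.store(signal, 0, 0); // 0 = waiting for the kernel
    port.postMessage({ type: 'deviceRead', args });
    Atomics.wait(signal, 0, 0); // sleep until signal[0] changes from 0
    const length = Atomics.load(signal, 1);
    const bytes = response.slice(0, length); // copy out of shared memory
    return JSON.parse(new TextDecoder().decode(bytes));
  };
}

// Kernel side, after the (possibly async) device invocation settles:
function answerDeviceRead(signal, response, result) {
  const bytes = new TextEncoder().encode(JSON.stringify(result));
  response.set(bytes); // assumes it fits; otherwise stream segment by segment
  Atomics.store(signal, 1, bytes.length);
  Atomics.store(signal, 0, 1); // 1 = response ready
  Atomics.notify(signal, 0); // wake the blocked Worker
}
```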
SES

We want all our code (kernel, vats, untrusted guest code) to run under SES. We see three ways to get a SES environment; one is to call `lockdown()` very very early in the lifetime of the application, immediately after running any "vetted shims" (such as `tame-metering`, which instruments global objects to count invocations and allocations).

Each new JS environment needs to be SES-ified, so if we're using Workers, the SES shim must be invoked inside each new Worker before it loads any other code.
Metering Transforms
One way to apply exact metering (without making changes to the underlying JS engine) is to inject counters into all basic blocks (function calls, `for` loops, etc), and have them throw an exception when a counter is exceeded. `try/catch` blocks are similarly instrumented to prevent these meter-exhausted exceptions from being caught. The globals must also be modified, because some can be called in a way that consumes a lot of memory or CPU.

The injected code looks for a special name (currently `getMeter`) to acquire control over the counters. Most counters (function counts, memory usage) are increment-only, but the stack-frame counter is increment+decrement. As a result, to prevent confined code from fraudulently decrementing its stack counter (enabling it to perform infinite recursion), the transform must also prohibit the confined code from referencing this name. `getMeter` must be placed in the global lexical scope (where it will be visible to the injected code), but must not be added to the `globalThis` object (where confined code could do a computed property lookup to access it, which is halting-problem impossible to prohibit).

This doesn't tell us exactly how many bytes are being used, or how many CPU cycles have been consumed. Just as Ethereum's EVM assigns gas costs to each opcode, the injected counters assign a cost to each function call, or loop iteration, or `Object` creation. These costs will have a vague, non-linear, but hopefully monotonically-increasing "good enough" relationship with the bytes or CPU-seconds that users really care about. But we describe injected metering as "exact" because the behavior should be consistent across JS engines (modulo the known non-determinisms of the JS specification), and insensitive to things like GC behavior or CPU load from unrelated processes.

This transform must be inescapable. The guest code being metered must be transformed before it is evaluated. In addition, any `eval` calls used by that code must be changed to apply the transform first. Other pathways to evaluation, such as creating a new `Compartment` or importing a module, must be similarly instrumented.

Transformed code runs slower, so we want to disable the counters when possible, and avoid injecting code at all unless necessary. We do not need to transform trusted code like the kernel (and perhaps the static vats). The kernel needs to enable the global-object instrumentation just before it gives control to a vat, and disable it when the kernel gets control back again.
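As a toy illustration of what the injected output might look like for a simple `sum` function (the meter interface here is invented; the real transform and counters are more involved):

```js
// Stand-in for the real meter reachable via getMeter() in the global
// lexical scope.
let computrons = 10000;
let frames = 0;
const meter = {
  charge(n) { computrons -= n; if (computrons < 0) throw RangeError('CPU meter exhausted'); },
  enter() { frames += 1; if (frames > 200) throw RangeError('stack meter exhausted'); },
  exit() { frames -= 1; },
};
const getMeter = () => meter;

// After injection, every function entry and loop iteration charges the meter:
function sum(xs) {
  getMeter().enter(); // increment the stack-frame counter
  try {
    getMeter().charge(1); // per-call cost, like EVM gas per opcode
    let total = 0;
    for (const x of xs) {
      getMeter().charge(1); // per-iteration cost
      total += x;
    }
    return total;
  } finally {
    getMeter().exit(); // the only legitimate decrement
  }
}
```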
Compartments
The Compartment Proposal (currently part of SES, but not strictly bound to it) defines a `Compartment` as a collection of:

- `c.evaluate(code)`, plus internal `eval` calls
- `c.import(what)`, which can load a complete module graph into the Compartment

To enforce a transform on a Compartment, we must also wrap its own `Compartment` constructor with one that propagates the `transforms` option. We expect this to live in a library function, rather than being a feature of the `Compartment` API itself.

We need to use at least three Compartments. The first is the all-powerful "start compartment", whose global provides access to platform authorities like filesystem and network access. The kernel and all non-metered vats should be loaded into a separate less-powerful Compartment, to apply POLA (the kernel does not need arbitrary filesystem access). And all metered vats should go into a third Compartment, where the metering transform is applied.
It is probably easier to use a distinct Compartment for each vat. We don't strictly need a separate global object for each vat (they'll all be the same: frozen and powerless), but they might have different module-loader configurations, and some will have the metering transform enforced.
Compartments do not provide any sort of preemptive termination, but we might run each vat inside a separate Worker, and then create a Compartment inside the Worker for the vat's code to live in.
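A sketch of the constructor-wrapping library function mentioned above, assuming a Compartment constructor of the form `new Compartment(endowments, modules, options)` with a `transforms` option:

```js
// Make a Compartment that applies `transforms`, and whose own Compartment
// constructor re-applies them, so guest code cannot evaluate its way out.
function makeTransformEnforcingCompartment(endowments, transforms) {
  const wrapped = function Compartment(end = {}, _mods = {}, opts = {}) {
    // child Compartments inherit our transforms plus any they request
    return makeTransformEnforcingCompartment(end, [
      ...transforms,
      ...(opts.transforms || []),
    ]);
  };
  return new Compartment(
    { ...endowments, Compartment: wrapped },
    {},
    { transforms },
  );
}
```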
XS
The Moddable XS platform offers a low-footprint JS engine written in C. This lacks many of the high-performance JIT features found in V8 or SpiderMonkey, but has several compelling benefits for our system.
The first is safety: the code is much much smaller, and is more likely to be auditable than something like V8. Both are written in non-memory-safe languages, unfortunately, but XS is written in plain C, whereas V8 uses C++ aggressively. The lack of a JIT makes XS's behavior far more predictable.
We believe XS will implement the JS specification more deterministically. The spec has many areas where the exact behavior is left up to the implementor. For chain nodes, where exact consensus must be reached, we need to define the precise behavior of our platform. We expect to define this as "whatever XS does", and build a list of details as we identify ways in which XS diverges from other engines. It should be possible to write vat code such that it does not trigger this divergent behavior; this will become part of our style guides and linting tools. Chain nodes cannot rely upon this, of course, but we can make sure that local vat applications (non-XS-based, probably running on Node.js or in a browser) behave the same way under non-malicious circumstances.
XS is far more easily extendable than a larger engine like V8. We anticipate adding metering instrumentation into the XS codebase (to count function calls and allocations), rather than continuing our source-to-source transformation at the Compartment level. This should have far less overhead and be easier to enforce, without e.g. wrapping `eval` and `Compartment`.
XS already has most of the code necessary to serialize a Worker (specifically the underlying `XS` instance) to data, and we're working on the code to unserialize that data back into a running instance. With some touchups to a few objects, we think we can turn this into a new Worker, enabling an efficient save/restore persistence story.

With XS, we can control the communication pathways between Workers, and can probably suspend a Worker while it waits for a syscall to finish. This will allow the vat Worker to think it has a synchronous read-device syscall, while in fact the kernel side is making asynchronous calls to retrieve the necessary state.
XS defines modules much more explicitly than Node or a browser. It has `eval` (if enabled), but code cannot do arbitrary `import` statements and expect them to work. An XS application is built from a manifest that defines all the modules that can ever be referenced. Node.js lets you run `node main.js` and then the code can name other local files to import. For XS, you run a build step like `make application` that reads `main.js` and all the rest of your code, and compiles it all into a single executable. Then later you run that executable. This affects the way we build and launch our swingset applications, as well as requiring some careful integration with the Compartment module loader to accommodate the dynamic modules that will be installed into new vats.

Finally, XS is a C program, which means we might compile it down into WASM, and then execute it in a WASM instance. This is most interesting in a browser.
WASM
All modern browsers (as well as Node.js) offer WASM execution engines. Running a vat inside a WASM instance offers some intriguing properties:

- memory is confined to the `ArrayBuffer` that backs the instance, with a protocol between WASM instance and the host to ask for more, which may be useful for metering

However WASM instances run WASM bytecode, not JavaScript. To use this, we'll need to compile XS (written in C) down into WASM (we know this works already, and WASM is an officially supported platform for XS). In doing so, we can apply other customizations, like suspending the vat while it makes an apparently-synchronous syscall (which turns into a WASM-to-host import invocation, which is synchronous but which can be answered by a subsequent host-to-WASM export invocation, to unsuspend the vat). This can enable synchronous vats in a host that only offers asynchronous storage (i.e. `IndexedDB`).

The Supervisor
If we use Workers, the first code we will execute inside each Worker will be a Supervisor. This must:
- run `tame-metering` (if this vat is metered) to instrument the global objects for metering
- load `SES` and call `lockdown()` (if the engine isn't already SES)
- create a `Compartment` for the liveslots and vat code, so it doesn't get access to start-compartment globals
- wrap the `Compartment` object to enforce the metering transform on internal Compartments
- keep the `postMessage` object available to the start compartment
- set up the `SharedArrayBuffer` for synchronous syscalls, if possible
- wait for each `postMessage` event.

On each delivery, the supervisor must use `setImmediate` to monitor quiescence, then configure metering, then invoke the liveslots `dispatch` function. While this runs, it may invoke various syscalls, which must be delivered to the kernel (and, for synchronous `deviceRead` calls, the results must be returned). When `setImmediate` fires, the supervisor knows the vat code has lost agency, and cannot regain control until the next delivery is made. At this point, it notifies the kernel that the crank is complete, and goes back to sleep waiting for a new `postMessage` delivery.
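A condensed sketch of that per-delivery sequence, assuming Node.js `worker_threads`; `dispatch` and `meter` stand in for the liveslots dispatch function and the metering control set up earlier:

```js
const { parentPort } = require('worker_threads');

function runSupervisor(dispatch, meter) {
  parentPort.on('message', delivery => {
    const syscalls = [];
    // Scheduled before the vat runs: setImmediate fires only after the
    // delivery's turn and all of its promise callbacks have drained,
    // i.e. once the vat code has lost agency.
    setImmediate(() => {
      parentPort.postMessage({ type: 'crank-complete', syscalls });
    });
    meter.refill(); // configure metering for this crank
    // async syscalls are recorded and forwarded to the kernel; a
    // synchronous deviceRead would block here until the kernel answers
    dispatch(delivery, vso => {
      syscalls.push(vso);
      parentPort.postMessage({ type: 'syscall', vso });
    });
  });
}
```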