What is the Problem Being Solved?
I have a plan to let vats use a non-transcripted syscall.dropImport to signal both deterministic/deliberate drops and WeakRef-based ones. To implement it, liveslots will need to know when the user-level vat code has lost agency, which means it needs access to setImmediate (or more likely the waitUntilQuiescent wrapper). This will rearrange some of the responsibility for knowing that a crank has ended, but in a way that fits better with the various worker types.
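For concreteness, here is a minimal sketch of what such a quiescence wrapper could look like (an assumption about its shape, not the actual implementation):

```js
// Sketch: resolve only after the promise queue has drained. On Node.js,
// setImmediate callbacks run after all ready promise turns, so by the time
// this fires, user-level vat code has finished and "lost agency".
function waitUntilQuiescent() {
  return new Promise(resolve => {
    setImmediate(() => resolve());
  });
}
```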
The idea is that liveslots is responsible for managing the WeakRefs (in the slotToVal table) and the FinalizationRegistry used to detect deletions. Those notifications always come in their own turn, and liveslots needs to (eventually, at the right time) react to them by making dropImport syscalls.
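A hedged sketch of that bookkeeping (the shape, and names like droppedVrefs and registerImport, are illustrative, not the real liveslots code):

```js
// slotToVal maps vref -> WeakRef(presence), so liveslots can notice when
// the engine collects a presence without itself keeping that presence alive.
const slotToVal = new Map();
// vrefs whose presences were collected; FinalizationRegistry callbacks run
// in their own turns and accumulate here, to be drained at the right time.
const droppedVrefs = new Set();
const registry = new FinalizationRegistry(vref => {
  droppedVrefs.add(vref);
  slotToVal.delete(vref);
});

function registerImport(vref, presence) {
  slotToVal.set(vref, new WeakRef(presence));
  registry.register(presence, vref);
}
```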
My plan is for the crank cycle to look like the following (a sketch of the whole flow appears after the list):
1. liveslots gets control via dispatch.deliver or dispatch.notify (the future dispatch.dropExport is the same, but won't result in the user-level vat code getting control)
   - in this approach, dispatch.* functions return a Promise, and the supervisor relies upon it to know when the crank is done, rather than using setImmediate/waitUntilQuiescent itself
2. liveslots deserializes the arguments, updating slotToVal/valToSlot in the process for any new imports
3. liveslots invokes user-level vat code, by invoking a method (dispatch.deliver) or resolving one or more Promises (dispatch.notify)
   - while user code is running, more syscalls will be made, including syscall.dropImport for vats which have a way to voluntarily give up access to an import (e.g. the comms vat, maybe the referenced imports of explicitly-deleted virtual objects)
4. liveslots waits until user code has gone idle ("lost agency"), with setImmediate
5. (on some platforms) liveslots invokes a gc() function provided by its supervisor, to trigger an engine-level GC sweep
6. liveslots waits some more, to allow FinalizationRegistry callbacks to run. In my experiments on Node.js, this required two setImmediate cycles, and/or maybe a setTimeout(epsilon); I need to re-investigate
   - maybe gc() should return a Promise, and encapsulate any necessary stalls
7. liveslots checks the set of dereferenced imports accumulated by the FR callbacks, sorts them somehow (to improve determinism), and performs syscall.dropImport for each
8. liveslots resolves the dispatch.* return Promise, letting the supervisor know the crank is done
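Putting those steps together, the dispatch side might look roughly like this (a sketch under the assumptions above: runUserCode stands in for argument deserialization plus the user-code invocation, and gc, waitUntilQuiescent, droppedVrefs, and syscall are the pieces described earlier):

```js
async function deliver(vatDeliveryObject) {
  // steps 1-3: deserialize arguments and let user-level vat code run
  runUserCode(vatDeliveryObject);
  // step 4: wait until user code has lost agency (promise queue drained)
  await waitUntilQuiescent();
  // step 5: provoke an engine-level sweep with the supervisor-provided tool
  await gc();
  // step 6: give FinalizationRegistry callbacks their own turns to run
  await waitUntilQuiescent();
  await waitUntilQuiescent();
  // step 7: report drops deterministically: sort, then dropImport each vref
  for (const vref of [...droppedVrefs].sort()) {
    syscall.dropImport(vref);
  }
  droppedVrefs.clear();
  // step 8: returning resolves the dispatch.* Promise, ending the crank
}
```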
The bigger picture is a collection of the kernel, some VatManagers (of various types), their associated worker processes (perhaps local, perhaps in a child process), the Supervisors in those workers, the liveslots layer, and the user-level vat code. A delivery flows through them like this (a sketch of the round trip follows the list):

- the kernel pulls an item off the run-queue, figures out which vat it is destined for, creates a KernelDeliveryObject, translates it (through the kernel's c-lists for that vat) into a VatDeliveryObject, and hands it to the right VatManager (which returns a Promise for when the delivery is complete)
- the VatManager somehow conveys the VatDeliveryObject (which is pure serializable data) to the worker, which might mean serializing it over a message pipe, or handing it to a local function
- the worker somehow receives this VatDeliveryObject, deserializing it if necessary, and gives it to the supervisor
- the supervisor configures/resets the meters, somehow
- the supervisor enables the liveslots syscall object
- the supervisor invokes the liveslots dispatch.* method, which returns a Promise for when liveslots is done
- when the supervisor sees that Promise resolve, it disables the syscall object, consults the meters for underflow or remaining computrons, and sends the crank results back to the parent (which might need to be serialized by the worker, to send over the message pipe)
- when the VatManager receives the crank results from its worker, it resolves the Promise it gave to the kernel
- when the kernel sees that Promise resolve, it either commits the crank (for success) or rolls it back (for failure)
- the kernel loops back to the next run-queue item, or gives the host application the option of continuing or finishing a block
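A sketch of the manager/worker round trip (the pipe framing and the names toWorker/fromWorker are assumptions, standing in for whatever transport a given worker type uses):

```js
// VatManager side: convey the VatDeliveryObject to the worker and return a
// Promise for the crank results, which the kernel awaits before deciding
// to commit or roll back.
function makeVatManager(toWorker, fromWorker) {
  async function deliver(vatDeliveryObject) {
    // VatDeliveryObject is pure serializable data, so it can cross a pipe
    toWorker.send(JSON.stringify(vatDeliveryObject));
    // the worker replies with the crank results once the supervisor sees
    // the liveslots dispatch Promise resolve
    return JSON.parse(await fromWorker.nextMessage());
  }
  return { deliver };
}
```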
The authority/reliance allocation is:
- the VatManager can do anything the worker can do, plus crash the kernel
- the worker can do anything liveslots can do, plus crash the worker, violate metering, be nondeterministic
- liveslots can do anything the vat can do (send messages to any object exposed to the vat), plus be nondeterministic, plus provoke a GC sweep using whatever tool the supervisor gives it
- the user-level buildRootObject can use vatPowers to read/write the per-vat offline storage (enabling a non-ocap communication channel), maybe a few other minor powers
- user-level objects are limited by normal ocap discipline
For XS-based workers, the gc() tool will only provoke a sweep of the one engine. (Our current xsnap approach only has one engine per process, but the long-term picture will have multiple.) For Node.js we might have it provoke GC across the entire kernel-plus-local-workers process, or make it a no-op, depending upon what the performance consequences are. The issue is that frequent GC is less efficient than batching it, but rarely-visited vats might have dead objects that can't be reported until they get control again. User-level code will only lose reachability to objects in response to a delivery being made (including dropExport), which happens in a crank, so the best time to discover those drops is just afterwards (with an intervening gc() call to prod the engine into finding out). If we knew for sure that there was another delivery to this vat coming up, we could defer the gc() and amortize the cost, but we don't generally have a way to know that (maaaybe something in the kernel that keeps track of lonely vats and sends them a "hey, haven't talked in a while, I have no work for you, but do you maybe have any garbage from before that we should clean up" message... we could do this just before evicting them into a snapshot, but that wouldn't help with memory/object footprint before eviction).
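To make the per-platform difference concrete, here is a hedged sketch of how a supervisor might build the gc() tool it hands to liveslots (makeGCTool is a hypothetical helper; the no-op branch is the "leave it to the engine" option mentioned above):

```js
// Assumed helper: choose an engine-GC hook per worker type.
function makeGCTool(workerType) {
  if (workerType === 'node' && typeof globalThis.gc === 'function') {
    // globalThis.gc only exists when Node was started with --expose-gc;
    // note it sweeps the whole kernel-plus-local-workers process
    return async () => globalThis.gc();
  }
  // XS: a per-engine collection hook would go here; otherwise fall back
  // to a no-op and rely on the engine's own occasional collection
  return async () => {};
}
```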
Alternate Approaches
My earlier thinking in #1872 used a separate special dispatch.bringOutYourDead call, which returns a list of dereferenced vrefs. I think we need a syscall form of this, both for the comms vat (which knows exactly when an import is no longer referenced, which is just after it receives a dropImport from the last remote system, modulo local promise resolutions that might also maintain a reference), and for vats that can somehow deliberately drop imports (like a virtual object that is explicitly deleted, releasing any data it contained). Once we have a dropImport syscall for that purpose, it makes sense to have the bring-out-your-dead phase use it too, rather than a secondary pathway in the return value.
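For the comms-vat case, the deliberate drop might look something like this (a sketch; remoteRefCounts and locallyReferenced are hypothetical stand-ins for the comms vat's actual bookkeeping):

```js
// When the last remote system sends dropImport, and no local promise
// resolution still references the vref, release the import right away,
// deterministically, during the crank.
function onRemoteDropImport(vref) {
  const count = remoteRefCounts.get(vref) - 1;
  remoteRefCounts.set(vref, count);
  if (count === 0 && !locallyReferenced(vref)) {
    syscall.dropImport(vref); // the deliberate, non-WeakRef-based drop
  }
}
```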
The purpose of an explicit bringOutYourDead is to avoid hearing about dropImports from Vat A while we're actually running a crank on Vat B. One alternative would be for the kernel/supervisor/something to follow every delivery with a gc() sweep and a bringOutYourDead call, which would avoid needing to give either gc() or setImmediate() to liveslots. This could leave the prompt-vs-efficient tradeoff to something higher up, which might be better. But the manager/worker is what knows the vat's JS engine the best, and it would involve three phases instead of just one, which feels more complicated.
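In code, that alternative moves the sequencing above liveslots, roughly (method names assumed):

```js
// The kernel/manager drives three explicit phases per delivery, so
// liveslots never needs access to gc() or setImmediate() itself.
await manager.deliver(vatDeliveryObject); // phase 1: the ordinary crank
await manager.gc();                       // phase 2: engine-level GC sweep
await manager.bringOutYourDead();         // phase 3: vat reports dead vrefs
```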
#1872 also proposes a layering of notifications: vats could notify the kernel that they have some garbage to collect (without naming which vrefs) at any time, and the kernel reacts by scheduling a bringOutYourDead call at some point in the future. That gives the kernel control over the prompt-vs-efficient tradeoff, but doesn't provide a good story for when gc() should be provoked. In Node.js we can probably rely upon the automatic occasional gc() call to manage local heap space, but uncollected garbage also keeps objects alive in other vats, and on remote machines, so we might want to lean towards promptness over efficiency. In an XS worker, each worker has its own engine, so the kernel has less visibility into when it might be appropriate to provoke GC on each one.
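A sketch of that layering on the kernel side (the notification mechanism and batching policy here are assumptions, not what #1872 specifies):

```js
// Vats signal "I have garbage" without naming vrefs; the kernel batches
// the signals and schedules bringOutYourDead deliveries at its leisure,
// trading promptness for efficiency.
const vatsWithGarbage = new Set();

function onGarbageNotification(vatID) {
  vatsWithGarbage.add(vatID);
}

function scheduleCollections(runQueue) {
  for (const vatID of vatsWithGarbage) {
    runQueue.push({ type: 'bringOutYourDead', vatID });
  }
  vatsWithGarbage.clear();
}
```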
Description of the Design
Security Considerations
Test Plan