add controller.terminateVat() #8687
Comments
@Chris-Hibbert and @dckc pointed out that the mainnet bootstrap vat only has the zoe-level … So, I'm going to proceed with adding this …

One consideration for these sorts of external provocations (…

We didn't think to implement a broad-authority … Fortunately, #8928 reduces the size of the DB changes that happen during vat termination. The main changes are to add the vatID to … Alternatively, it might be cleaner to finally add a … Ok, based on that argument I think I'll stick with …
This new API allows the host application to terminate any vat for which it knows the VatID (which must be gleaned manually from logs or the database). This might be useful if the normal vat code is unable or unwilling to terminate the vat, or if you need to trigger termination at some specific point in time. Closes #8687.
What is the Problem Being Solved?
While working on ghost-replay tooling, I found a need to terminate a vat from outside the kernel. We currently have `controller.upgradeStaticVat()` to trigger vat upgrades, but nothing to trigger vat termination.

This might be useful for mainnet remediation of things like #8401, if our strategy is to just kill the price-feed vats that are participating in the majority of the cycles. OTOH, our remediation might live mostly in userspace, if the `adminNode`s for those vats are reachable from the core-eval environment, so we can `E(adminNode).terminateWithFailure()` each one "from the inside".

Description of the Design
Add `controller.terminateVat(vatID, reasonString)`.

To implement this properly, we must first convert vat termination into a queued event, like `update-vat`. Some notes are already present as comments:

* agoric-sdk/packages/SwingSet/src/kernel/vat-admin-hooks.js, lines 98 to 100 in 4f72580
* agoric-sdk/packages/SwingSet/src/kernel/kernel.js, line 260 in 4f72580
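To make the queued-event idea concrete, here is a toy model in JavaScript. It is a sketch under stated assumptions, not the kernel's implementation: `makeToyKernel`, `deliverEvent`, `endOfCrankTerminate`, the `'send'` event shape and the `'exitVat'` method name are all hypothetical. Only the `terminate-vat` event type, the idea of `processTerminateVat()`, and the `terminate` crank-result flag come from this issue. The model shows that the host-facing `terminateVat()` merely enqueues a request, that termination happens only when that event reaches the top of the run-queue, and that a vat asking to exit during its own crank is still terminated at the end of that same crank.

```javascript
// Toy model only: names loosely echo SwingSet's kernel, but this is NOT its code.
function makeToyKernel(vatIDs) {
  const liveVats = new Set(vatIDs);
  const runQueue = []; // FIFO: events are processed strictly in order
  const log = [];

  // Like the proposed processTerminateVat(): it deletes nothing itself, it only
  // returns crank results asking end-of-crank processing to terminate the vat.
  function processTerminateVat({ vatID, reason }) {
    return { terminate: { vatID, reason } };
  }

  // An ordinary delivery. A vat that exits during its own crank (the toy's
  // hypothetical stand-in for vatPowers.exitVat()) sets the terminate flag.
  function processSend({ vatID, method }) {
    log.push(`deliver ${method} to ${vatID}`);
    if (method === 'exitVat') {
      return { terminate: { vatID, reason: 'self-exit' } };
    }
    return {};
  }

  function deliverEvent(event) {
    switch (event.type) {
      case 'send':
        return processSend(event);
      case 'terminate-vat':
        return processTerminateVat(event);
      default:
        throw Error(`unknown run-queue event type ${event.type}`);
    }
  }

  // End-of-crank: the only place where a vat is actually removed.
  function endOfCrankTerminate(vatID, reason) {
    if (liveVats.delete(vatID)) log.push(`terminated ${vatID}: ${reason}`);
  }

  // One crank: pop the top event, deliver it, then honor crankResults.terminate.
  function step() {
    const event = runQueue.shift();
    if (!event) return false;
    if (!liveVats.has(event.vatID)) {
      // Simplification: the toy just records and skips events for dead vats.
      log.push(`skip ${event.type} for dead ${event.vatID}`);
      return true;
    }
    const crankResults = deliverEvent(event);
    if (crankResults.terminate) {
      const { vatID, reason } = crankResults.terminate;
      endOfCrankTerminate(vatID, reason);
    }
    return true;
  }

  return {
    // Host-facing call, analogous to the proposed controller.terminateVat():
    // it only enqueues a request; nothing changes until the event is processed.
    terminateVat(vatID, reason) {
      runQueue.push({ type: 'terminate-vat', vatID, reason });
    },
    send(vatID, method) {
      runQueue.push({ type: 'send', vatID, method });
    },
    run() {
      while (step()) {
        // keep cranking until the run-queue is empty
      }
    },
    isLive: vatID => liveVats.has(vatID),
    log,
  };
}

const kernel = makeToyKernel(['v1', 'v2']);
kernel.send('v1', 'foo');
kernel.terminateVat('v1', 'host request');
kernel.send('v1', 'bar');
kernel.send('v2', 'exitVat');
console.log(kernel.isLive('v1')); // true: terminateVat() has only enqueued a request
kernel.run();
console.log(kernel.isLive('v1'), kernel.isLive('v2')); // false false
console.log(kernel.log);
```

In this toy, `foo` is still delivered to `v1` because it was queued ahead of the termination request, while `bar` arrives after `v1` is gone. How the real kernel treats messages addressed to a dead vat is not modeled here.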
The basic plan is:

* define a new `RunQueueEventTerminateVat` run-queue event type
* add `processTerminateVat()`, just after `processUpgradeVat`
  * it should return `CrankResults` with the `.terminate` flag set (compare `processDeliveryMessage`)
* change `deliverRunQueueEvent` to add a case for `terminate-vat`, just after the clause for `changeVatOptions`
* change the `terminate` hook in vat-admin-hooks.js to enqueue a `terminate-vat` event (using `kernelKeeper.addToAcceptanceQueue()`, just like the neighboring `changeOptions` or `upgrade` hooks do it)

Timing Changes
This will change the timing of external vat termination slightly.
In the case of external termination, some parent vat does `E(adminNode).terminateWithFailure()`. This enqueues a message to vat-vat-admin. Later, when this message arrives, vat-vat-admin invokes device-vat-admin, which calls into vat-admin-hooks.js, which calls `void terminateVat()` and then returns right away. `terminateVat()` is async, so vat-vat-admin regains control promptly, before any state changes have actually happened, and it resolves the `terminateWithFailure()` result promise right away. On the next kernel turn, `terminateVat()` will reject all the vat's outstanding promises, delete its remaining state, and enqueue a notification to vat-vat-admin. On some future crank, that notification arrives at vat-vat-admin, which will resolve the adminNode's `done()` promise to anyone who might be watching.

There is an awkwardness in the old implementation: after `terminateVat()` enqueues the notification, it does `await vatWarehouse.stopWorker(vatID);`, which won't fire until after the `xsnap` worker process has been fully killed. We don't know how long this will take. We think it shouldn't cause a problem, even if it took several minutes, but it would be nice to be more deterministic, and not claim success until the worker is really dead. The code in vat-admin-hooks currently ignores this Promise. I'm not sure if we could/should change that to await it instead.

With this change, vat-admin-hooks.js will merely enqueue a request, and the termination won't actually happen until that request makes it to the top of the run-queue. Where previously the termination was delayed by one trip through the queue (the delivery of `E(adminNode).terminateWithFailure()`), it is now delayed by two trips (adding the wait for `terminate-vat` to reach the top). Likewise, previously the `done()` promise resolution was delayed by three trips (relative to the sending of `terminateWithFailure()`): delivery of `terminateWithFailure()` to vat-vat-admin, notification from `terminateVat()` to vat-vat-admin, and finally notification of the `done()` promise resolution to the subscribing vats. This change will increase that to four trips.

In the case of self-termination (`vatPowers.exitVat()`) or error-termination (bad syscall), the syscall will set a flag, the end-of-crank processing will see `crankResults.terminate`, and it will call `terminateVat`. This will remain immediate: the vat must not survive the crank ("death before confusion": if it requested self-termination, it was because some invariant has been violated, and it must not receive any further messages).

Alternate Approaches
We could consider leaving `E(adminNode).terminateWithFailure()` alone, and merely introduce the new run-queue event for the benefit of `controller.terminateVat()`. That would leave the worker-killed awkwardness, but would avoid changing the timing of `terminateWithFailure()`. I don't think that timing is important, and the awkwardness is awkward, so I'm inclined to do the full cleanup.

Security Considerations
None: this new method is only available from outside the kernel.
Scaling Considerations
Terminating a vat will clean up everything it was referencing (modulo our ongoing efforts to finish vat-data cleanup: we don't delete unreachable durable data yet). This may trigger a significant amount of GC work, which might take a while to play out. Keep this in mind when choosing to delete a vat with a lot of imports or exports.
Test Plan
Normal unit tests, following the pattern of `controller.upgradeStaticVat`.
Upgrade Considerations
The new kernel will have a new kind of run-queue message, but will also be able to process this message. This introduces a no-downgrade ratchet, albeit a short-range one: while a `terminate-vat` message is on the run-queue, downgrading the kernel will cause a failure when that message reaches the top. But we don't support kernel downgrades in general anyway; every change we make introduces another such ratchet.
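A toy illustration of why this ratchet is short-range, assuming a "kernel version" can be modeled as nothing more than the set of run-queue event types it understands. `makeVersionedKernel`, `OLD_KERNEL_TYPES` and `NEW_KERNEL_TYPES` are hypothetical, and `send` stands in for whatever other event types a real kernel handles. The downgraded kernel only fails once the queued `terminate-vat` event reaches the top; a queue without one drains normally.

```javascript
// Toy illustration, NOT SwingSet code: the event types are an illustrative
// subset, and each "kernel version" is just a set of known event types.
const OLD_KERNEL_TYPES = new Set(['send']);
const NEW_KERNEL_TYPES = new Set(['send', 'terminate-vat']);

function makeVersionedKernel(knownTypes) {
  return {
    // Process a persisted run-queue in order, as a restarted kernel would.
    // An unknown event type only causes trouble once it reaches the top.
    drain(savedQueueJSON) {
      const processed = [];
      for (const event of JSON.parse(savedQueueJSON)) {
        if (!knownTypes.has(event.type)) {
          throw Error(`unknown run-queue event type: ${event.type}`);
        }
        processed.push(event.type);
      }
      return processed;
    },
  };
}

// State written by the new kernel while a terminate-vat request is still queued.
const saved = JSON.stringify([
  { type: 'send', vatID: 'v1' },
  { type: 'terminate-vat', vatID: 'v9' },
]);

console.log(makeVersionedKernel(NEW_KERNEL_TYPES).drain(saved)); // [ 'send', 'terminate-vat' ]

try {
  makeVersionedKernel(OLD_KERNEL_TYPES).drain(saved); // downgraded kernel
} catch (err) {
  console.log(err.message); // unknown run-queue event type: terminate-vat
}

// Once the terminate-vat event has been consumed, the old kernel copes again.
const drained = JSON.stringify([{ type: 'send', vatID: 'v1' }]);
console.log(makeVersionedKernel(OLD_KERNEL_TYPES).drain(drained)); // [ 'send' ]
```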