-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exclude liveslots deserialization code from metering #3667
Conversation
c621af7
to
dfeff9e
Compare
const limit = globalThis.currentMeterLimit(); | ||
const before = globalThis.resetMeter(0, 0); | ||
meteringDisabled += 1; | ||
try { | ||
return await thunk(); | ||
} finally { | ||
globalThis.resetMeter(limit, before); | ||
meteringDisabled -= 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nervous whether this pattern actually works -- can you get away with scoping async operations like this, even if you use await
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked with @erights and @michaelfig , and the answer is "yes, but don't do it that way". The finally
is guaranteed to run, and in our case this pattern would be fine, but their recommendation was to use this instead:
async function runWithoutMeteringAsync(thunk) {
const limit = globalThis.currentMeterLimit();
const before = globalThis.resetMeter(0, 0);
meteringDisabled += 1;
return Promise.resolve()
.then(() => thunk())
.finally(() => {
globalThis.resetMeter(limit, before);
meteringDisabled -= 1;
});
}
The difference is that if thunk()
throws (synchronously), in my version (return await thunk()
) the finally
block will get executed in the same turn, but in theirs (using Promise.prototype.finally
) the finalizer always runs in its own turn.
I'll update it to use their recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked with @erights and @michaelfig , and the answer is "yes, but don't do it that way". The finally is guaranteed to run, and in our case this pattern would be fine, but their recommendation was to use this instead:
Just for the record, what did I say "yes" to? The fact that the finally may run in the same or another turn, as you correctly explain, for me means that code does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Yes" meant that finally { }
always runs. even if there's a return
or await
in the try { }
, which I think was the "does JS work the way you seem to think it works" question that @FUDCo was asking. I think we were both uncertain about whether the JS try/finally pattern continues to work after the introduction of async/await, but it does: the general semantics are the same, modulo the timing subtlety you pointed out.
In this case, my code was not depending upon having a turn boundary between the thunk running and the counter being decremented. runWithoutMeteringAsync
is not being granted to adversarial code (this is entirely within liveslots), so there's not an adversary to perform reentrancy. The thunk
is also friendly code (if it were adversarial, a trivial Promise.then would let it regain agency after the thunk's Promise had returned). I'm all in favor of using a pattern that's easier to review, and I've updated this PR to do so, but in this case it wouldn't have caused a bug to do it the return await thunk()
way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well OK then
// Set.delete() metering seems unaffected by presence/absence, but it | ||
// doesn't matter anyway because deadSet.add only happens when | ||
// finializers run, which happens deterministically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that this comment is not quite right. Finalizers run non-deterministically -- that's why we use deadSet
to protect ourselves. It's the processing of deadSet
that's run deterministically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "finalizers run deterministically" I mean that, given the way xsnap.c
uses XS, finalizer callbacks only run during the gcAndFinalize()
window. XS only polls the registries during fxEndJob()
, and we only call that after the promise queue is empty (
agoric-sdk/packages/xsnap/src/xsnap.c
Line 1108 in cc06a29
for (;;) { |
void fxRunLoop(txMachine* the)
{
c_timeval tv;
txNumber when;
txJob* job;
txJob** address;
for (;;) {
while (the->promiseJobs) {
while (the->promiseJobs) {
the->promiseJobs = 0;
fxRunPromiseJobs(the);
}
// give finalizers a chance to run after the promise queue is empty
fxEndJob(the);
// if that added to the promise queue, start again
}
// at this point the promise queue is empty
// .. the rest of the code checks for timer jobs
That's not mandatory: I think the JS spec allows engines to run the finalizer callbacks any time they want (as long as they happen in their own turn). The tricky part, for us, is metering. If the finalizer could happen at any arbitrary time, then it would get metered. The only thing a finalizer does is to examine the weakref and add non-revived vrefs to the deadset, which doesn't cause e.g. immediate syscalls, but even that would cause metering sensitivity to the state of the weakref:
function finalizeDroppedImport(vref) {
const wr = slotToVal.get(vref);
if (wr && !wr.deref()) {
// we're in the COLLECTED state, or FINALIZED after a re-introduction
deadSet.add(vref);
slotToVal.delete(vref);
// console.log(`-- adding ${vref} to deadSet`);
}
}
So our "metering is insensitive to GC timing" story already depends upon finalizers not running during the userspace phase.
If we were somehow not depending upon that, and finalizers could run at any old time, then a vref
might or might not be in the deadSet
when this convertValToSlot
needed to delete it. We delete it because this might be a re-introduction of the vref: imagine a dispatch.deliver()
crank which introduces o-4
, then drops it, then performs a device invocation whose return value re-introduces o-4
. If the finalizer could run in the middle of the crank, we'd have o-4
in our deadSet
when the D()
return value was unserialized and convertValToSlot('o-4')
creates a new Presence
, but o-4
is still in the deadSet
. (Now that I think about it, we don't strictly need to delete it from deadSet
.. processDeadSet
will check the weakref anyways. But I think you and I decided that it would be tidier to delete it as soon as we know it's been re-introduced).
So in that case, deadSet.delete('o-4')
would sometimes be deleting something, and sometimes not, depending upon this hypothetical non-deterministic finalizer callback having already run or not. If .delete
's metering usage were different in these two situations, then now the metering of convertValToSlot
(i.e. m.serialize()
) would become sensitive to GC state, which we don't want, because we want serialization to be metered, because (at least) it could invoke user-provided getters, which would allow user code to escape metering.
If that deadSet.delete(slot)
were replaced with if (deadSet.has(slot)) { deadSet.delete(slot); }
, that'd be obviously metering-affecting. But I wasn't sure about the built-in deadSet.delete()
.. does it follow differently-metered code paths internally? I did an experiment and convinced myself that, at least right now, it does not: deleting an existing key and deleting a non-existing key both use the same metering.
So we're protected in two ways. 1: deadSet.delete()
's metering doesn't depend upon the presence/absence of the key, and 2: the key won't get added in the middle of a crank (the metered region) because that only happens in the finalizer callback, and our xsnap.c
ensures that finalizers only happen at end-of-crank (when we call gcAndFinalize
).
I'll update the comment to make it clear that the reason our finalizers run deterministically is because we wrote xsnap.c
to make it that way, and that JS in general does not guarantee that in the slightest.
The `meterControl` tool will give liveslots a way to temporarily disable metering, so it can perform GC-sensitive operations without affecting the vat's metering results. Each supervisor provides their own version: all are dummy facades except for XS. refs #3458
We're trying to hedge against XS not necessarily performing "organic" (non-forced) GC at exactly the same across all members (validators) of a consensus machine, especially when some members have reloaded a vat from snapshot (i.e. restarted) recently and others have not. We rely upon finalizer callbacks only running at explicitly-deterministic times, but we must still guard against objects becoming collected spontaneously, which will cause a WeakRef to become "dead" (`wr.deref() === undefined`) at a random point in the middle of a turn. Any code which calls `wr.deref`, or is conditionally executed/skipped according to the results, is "GC-sensitive". This includes `convertSlotToVal`, and therefore `m.unserialize`. We cannot allow metering results to diverge between validators, because: * 1: it might make the difference between the crank completing successfully, and the vat being terminated for a per-crank metering fault * 2: it will change the large-scale Meter value, which is reported to userspace * 3: it might cause the runPolicy to finish the block earlier on one validator than on others all of which would cause a consensus failure. To prevent this, we run most of the "inbound" side of liveslots without metering. This includes the first turn of all `dispatch.*` methods, which runs entirely within liveslots: * `dispatch.deliver` performs argument deserialization in the first turn, then executes user code in the second and subsequent turns * `dispatch.notify` does the same * the GC deliveries (`dispatch.dropExport`, etc) only use one turn We also disable metering when deserializing the return value from a (synchronous) device call, and when retiring a promise ID (which touches `slotToVal`). Finally, we disable metering for all turns of the post-crank GC `finish()` call. This excludes all invocations of the finalizer callbacks, as well as all the `processDeadSet` code which is highly sensitive to the results. closes #3458
For some reason, most of the deserialization tests in test-marshal.js have been creating a full SwingSet controller and then throwing it away. This is an ancient leftover from the PlaygroundVat days, and removing it reduces the test's runtime from about 20 seconds to just 1.
The use of `await` inside a try/finally block would expose a subtle timing issue (if `thunk()` throws synchronously, the finalizer would run in the same turn as the thunk, rather than in a subsequent turn). Our recommended style avoids conditional `await` for just that reason, so this updates `runWithoutMeteringAsync` to use a better approach.
dfeff9e
to
64e4f2f
Compare
We're trying to hedge against XS not necessarily performing
"organic" (non-forced) GC at exactly the same across all members (validators)
of a consensus machine, especially when some members have reloaded a vat from
snapshot (i.e. restarted) recently and others have not. We rely upon
finalizer callbacks only running at explicitly-deterministic times, but we
must still guard against objects becoming collected spontaneously, which will
cause a WeakRef to become "dead" (
wr.deref() === undefined
) at a randompoint in the middle of a turn. Any code which calls
wr.deref
, or isconditionally executed/skipped according to the results, is "GC-sensitive".
This includes
convertSlotToVal
, and thereforem.unserialize
.We cannot allow metering results to diverge between validators, because:
and the vat being terminated for a per-crank metering fault
userspace
validator than on others
all of which would cause a consensus failure.
To prevent this, we run most of the "inbound" side of liveslots without
metering. This includes the first turn of all
dispatch.*
methods, whichruns entirely within liveslots:
dispatch.deliver
performs argument deserialization in the first turn,then executes user code in the second and subsequent turns
dispatch.notify
does the samedispatch.dropExport
, etc) only use one turnWe also disable metering when deserializing the return value from
a (synchronous) device call, and when retiring a promise ID (which touches
slotToVal
).Finally, we disable metering for all turns of the post-crank GC
finish()
call. This excludes all invocations of the finalizer callbacks, as well as
all the
processDeadSet
code which is highly sensitive to the results.closes #3458