Consider including GC calls in vat transcript #5564

mhofman · 2022-06-10T01:23:56Z

What is the Problem Being Solved?

I'm wondering if we should stop stripping the GC syscalls from the transcript. We did this when we wanted to tolerate different organic GC schedules between different vats (the "formal vs local GC" era). Given our need to have metering be identical, and our work to make GC consequences only observable during bringOutYourDead, I think at this point we could reasonably expect all GC activity to be in consensus, and thus it might no longer be necessary to ignore GC syscalls when replaying the transcript.

If we stopped ignoring them, this bug would probably have surfaced much faster: if the error caused a second GC syscall to not happen, we'd have gotten an AnacrophobiaError during replay.

Description of the Design

Include GC syscalls in the transcript and expect identical replay during BOYD.

Test Plan

TBD

The text was updated successfully, but these errors were encountered:

warner · 2022-06-15T16:17:50Z

I realized that an even better argument for this is our hope for deterministic heap snapshots. The contents of a heap snapshot are more sensitive to vat execution than the metering results, because we can (and do) disable metering during deserialization operations which might probe a WeakRef. But the state (full/empty) of that WeakRef depends upon whether GC has happened or not, and that state is obviously included in the heap snapshot. If we want all validators to get the same heap snapshot, we can't afford to allow them to perform GC at different times.

A related issue is transcript replay for debugging, where we take the original (xs-worker) transcript from the chain, and replay it locally under a Node.js worker (so you can point a real debugger at it). Our use of forced GC, coupled with our accumulation/suppression of finalizer results until an explicit bringOutYourDead, ought to result in the same observable GC activity (i.e. syscalls) on both the local worker and the original XS process. We're still seeing Node.js/V8 doing something weird (not finalizing or releasing some objects as fast when it should), which can cause that to break, but I think if we could find a fix for that, our local replay would match the xs-worker one.

Omitting/ignoring GC syscalls in the transcript used to be a sort of workaround for that bug: before we introduced vatstore-held refcounts, the only consequence of GC was a set of dropImports/retireImports/retireExports, and they always happened at the end of a crank, and none of them return a result. So if GC happened a during delivery 5 instead of delivery 3, then d3 would be missing some GC syscalls (but the filtered list was identical), and d5 would have some extra syscalls (but we could synthesize their (empty) results without needing to get them from a transcript, and again the filtered list was identical).

But once we introduced vatstore refcounts, that changed: now GC happening in delivery 3 means we get some refcount checks in d3, which does syscall.vatstoreGet, which demands a result. If that syscall gets delayed until delivery 5, we won't know what result to give it, since the d5 transcript doesn't have a record of the syscall.

So I think our "replay locally" (under Node) story depends upon finding and fixing the Node/V8 "GC sometimes happens too late" bug (#3240 , #5575, #5202), otherwise our local replays will fail as soon as they experience a late GC event. We can replay them locally under xs, of course, but the debugging experience leaves something to be desired.

So actions to take:

in src/kernel/vat-loader/transcript.js:
- in addSyscall(), remove the gcSyscalls.has check
- in simulateSyscall(), remove the gcSyscalls.has check
- remove the gcSyscalls definition altogether
probably update some tests

Closes #5564

mhofman added enhancement New feature or request SwingSet package: SwingSet labels Jun 10, 2022

mhofman assigned FUDCo Jun 10, 2022

This was referenced Jun 10, 2022

Transcript replay should panic on delivery failure #5565

Closed

swingset transcript.js returns bogus VatSyscallResults during replay for GC syscalls #5557

Closed

warner mentioned this issue Jun 15, 2022

remove gcEveryCrank and worker type xs-worker-no-gc #5600

Closed

FUDCo added a commit that referenced this issue Jun 23, 2022

feat: include GC calls in transcript

81544ec

Closes #5564

FUDCo mentioned this issue Jun 23, 2022

Include GC calls in transcript #5656

Merged

FUDCo added a commit that referenced this issue Jun 23, 2022

feat: include GC calls in transcript

90b297b

Closes #5564

FUDCo added a commit that referenced this issue Jun 23, 2022

feat: include GC calls in transcript

f124c76

Closes #5564

michaelfig pushed a commit that referenced this issue Jun 25, 2022

feat: include GC calls in transcript

17764ff

Closes #5564

mergify bot closed this as completed in #5656 Jun 25, 2022

Tartuffo added this to the Mainnet 1 milestone Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider including GC calls in vat transcript #5564

Consider including GC calls in vat transcript #5564

mhofman commented Jun 10, 2022

warner commented Jun 15, 2022

Consider including GC calls in vat transcript #5564

Consider including GC calls in vat transcript #5564

Comments

mhofman commented Jun 10, 2022

What is the Problem Being Solved?

Description of the Design

Test Plan

warner commented Jun 15, 2022