Consensus mode will depend upon GC being deterministic, but solo mode does
not. Solo mode requires GC be "sufficiently deterministic", which means a
finalizer may or may not run in any given crank.
To support this, we must not record the GC-related syscalls (dropImport,
retireImport, retireExport) in the transcript. When replaying a transcript,
we ignore these syscalls as well.
closes #3146
refs #2615
refs #2660
refs #2724
What is the Problem Being Solved?
When we're in non-consensus mode, we want to be tolerant of GC activity happening in a different order (during vat transcript replay) from one kernel restart to the next. In consensus mode, we require that each replica and each restart sees the same set of GC operations in any given crank, but I think we want to be more tolerant in an ag-solo, or in a local replay (under Node.js, for debugging) of a transcript recorded on a chain fullnode (under XS).
part of #3106
Description of the Design
The simplest way to support this is to not record GC-related syscalls in the transcript, and to ignore them when doing a replay (the simulated syscall is a no-op, instead of the usual "pop the next syscall record off the transcript and compare" test). This means GC-related syscalls cannot return data (good thing they don't), and cannot cause an "anachrophobia" error.
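Something like the following is what I have in mind for the record/replay split. It's only a sketch: makeRecorder, makeReplayer, the { type, ... } syscall shape, and the JSON.stringify comparison are all made up for illustration and are not the real kernel code.

```javascript
// Hypothetical sketch: none of these names come from the actual kernel.
const GC_SYSCALLS = new Set(['dropImport', 'retireImport', 'retireExport']);

function makeRecorder() {
  const transcript = [];
  return {
    transcript,
    // Called for every syscall the vat makes during live execution.
    record(syscall, result) {
      if (GC_SYSCALLS.has(syscall.type)) {
        return; // GC syscalls are never written to the transcript
      }
      transcript.push({ syscall, result });
    },
  };
}

function makeReplayer(transcript) {
  let position = 0;
  return {
    // Simulated syscall handler used while replaying a transcript.
    simulate(syscall) {
      if (GC_SYSCALLS.has(syscall.type)) {
        return undefined; // no-op: nothing to compare, nothing to return
      }
      // The usual "pop the next record and compare" test.
      // (Assumption: syscalls are plain objects built with a stable key order.)
      const expected = transcript[position];
      position += 1;
      if (
        !expected ||
        JSON.stringify(expected.syscall) !== JSON.stringify(syscall)
      ) {
        throw new Error(`anachrophobia: unexpected syscall #${position - 1}`);
      }
      return expected.result;
    },
    // True once every recorded (non-GC) syscall has been matched.
    finished: () => position === transcript.length,
  };
}
```

With this shape, a replay can emit dropImport/retireImport/retireExport at any point (or not at all) without disturbing the comparison of the remaining syscalls.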
There is one consideration to make: in non-consensus mode, when doing a replay, the new execution might observe a GC transition earlier than the previous execution would have. This means the vat might emit a syscall.dropImport during replay that was not included in the transcript, which is fine, but that dropImport isn't going to happen a second time. So we need to watch the series of dropImports that happen during replay and compare them against the clist. If, at the end of replay, we've seen a dropImport for something that's still marked as "reachable" in the clist, that means we've witnessed one of these "earlier than the historical record" dropImports, and we need to process it. If we ignore it, we'll be keeping some object pinned in memory, maybe forever.
I'm going to defer this second part for a while; I don't think it will cause any problems in the near term.
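For when the deferred part does get picked up, the end-of-replay check could look roughly like this. Again just a sketch under assumptions: makeReplayDropTracker is a made-up helper, and the clist is modeled as a plain Map from vref to { reachable }, which is not the real clist representation.

```javascript
// Hypothetical sketch: the clist shape and all names here are assumptions.
function makeReplayDropTracker() {
  const droppedDuringReplay = new Set();
  return {
    // Call from the replay-time handler for each syscall.dropImport.
    noteDropImport(vrefs) {
      for (const vref of vrefs) {
        droppedDuringReplay.add(vref);
      }
    },
    // Call once replay has finished. `clist` maps vref -> { reachable }.
    // Returns the vrefs that the vat dropped during replay but that the
    // clist still considers reachable, i.e. the "early" drops.
    findEarlyDrops(clist) {
      const early = [];
      for (const vref of droppedDuringReplay) {
        const entry = clist.get(vref);
        if (entry && entry.reachable) {
          early.push(vref);
        }
      }
      return early;
    },
  };
}
```

The kernel would then process each returned vref as if the dropImport had just arrived. This sketch does not handle an import that is dropped and then re-introduced later in the same replay; a real version would need to forget the earlier drop in that case.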
Test Plan
Unit tests.