Nondeterministic SwingSet historical inaccuracy in CI #4617

kriskowal · 2022-02-21T04:34:36Z

Describe the bug

At a5de8dd, the following SwingSet error occurred in CI but did not reappear after re-running the workflow.

  activityhash-vs-start › comms initialize is deterministic

  Rejected promise returned by test. Reason:

  Error {
    message: 'historical inaccuracy in replay of v1',
  }

The text was updated successfully, but these errors were encountered:

michaelfig · 2022-02-22T16:50:28Z

@warner will this mean nondeterminism between validators on the chain?

warner · 2022-02-22T18:26:54Z

note to self: the test was added to fix #3726, so the problem may resemble something from there

warner · 2022-02-22T19:22:36Z

One test failure looked like:

  ✔ activityhash-vs-start › restarting kernel does not change activityhash (47s)
anachrophobia strikes vat v1
delivery completed with 3 expected syscalls remaining
expected: {"0":"vatstoreGet","1":"vom.rc.o-51","length":2}
expected: {"0":"vatstoreGet","1":"vom.rc.o-52","length":2}
expected: {"0":"vatstoreGet","1":"vom.rc.o-53","length":2}
REJECTED from ava test: (Error#1)
Error#1: historical inaccuracy in replay of v1
  at Object.finishReplayDelivery (.../swingset-vat/src/kernel/vatManager/transcript.js:87:23)
  at replayOneDelivery (.../swingset-vat/src/kernel/vatManager/manager-helper.js:189:23)
  at async Object.replayTranscript (.../swingset-vat/src/kernel/vatManager/manager-helper.js:220:9)
  at async ensureVatOnline (.../swingset-vat/src/kernel/vatManager/vat-warehouse.js:134:5)
  at async Object.start (.../swingset-vat/src/kernel/vatManager/vat-warehouse.js:177:7)
  at async Object.start (.../swingset-vat/src/kernel/kernel.js:936:5)
  at async makeSwingsetController (packages/SwingSet/src/controller.js:290:3)
  at async packages/SwingSet/test/test-activityhash-vs-start.js:118:14

  ✖ activityhash-vs-start › comms initialize is deterministic Rejected promise returned by test

in https://github.com/Agoric/agoric-sdk/runs/5277256769?check_suite_focus=true

warner · 2022-02-22T19:24:45Z

Huh, that's not even comms (vom is the virtual object manager, which comes from liveslots, and comms doesn't use liveslots). The only other vat in that test is the bootstrap vat. That looks like the initialization of the VatData collection kinds.

warner · 2022-02-23T01:52:37Z

This seems to be another instance of #3240. If I run just yarn test test/activityhash-vs-start.js -m "comms initialize is deterministic" by itself in a loop, nothing fails despite an hour of runtime. If I run both test cases within that file (AVA runs them in parallel), I get a failure almost immediately.

The comms test creates the first swingset, lets the bootstrap run, records the state, runs a second message, records the activity hash, then terminates the swingset. Then it launches a second swingset from the recorded state, runs the same message as before, and compares the activity hash.

In both passes, the second message causes a setReceiver() to be sent to an object within the bootstrap vat, with an argument pointing to an object exported by the comms vat. setReceiver() ignores its argument, so this import should be eligible for collection right away.

In my failing case, a slog trace of the first pass shows that imported object being collected during the bringOutYourDead that followed, as expected:

{"type":"deliver","vatID":"v1","deliveryNum":8,"replay":false,"vd":["message","o+4",{"method":"setReceiver","args":{"body":"[{\"@qclass\":\"slot\",\"index\":0}]","slots":["o-55"]},"result":null}]}
{"type":"deliver","vatID":"v1","deliveryNum":9,"replay":false,"vd":["bringOutYourDead"]}
{"type":"syscall","vatID":"v1","deliveryNum":9,"syscallNum":0,"replay":false,"vsc":["vatstoreGet","vom.rc.o-55"]}
{"type":"syscall","vatID":"v1","deliveryNum":9,"syscallNum":1,"replay":false,"vsc":["dropImports",["o-55"]]}
{"type":"syscall","vatID":"v1","deliveryNum":9,"syscallNum":2,"replay":false,"vsc":["retireImports",["o-55"]]}

but in the second pass, it was not:

{"type":"deliver","vatID":"v1","deliveryNum":8,"replay":false,"vd":["message","o+4",{"method":"setReceiver","args":{"body":"[{\"@qclass\":\"slot\",\"index\":0}]","slots":["o-55"]},"result":null}]}
{"type":"deliver","vatID":"v1","deliveryNum":9,"replay":false,"vd":["bringOutYourDead"]}

I think that running both test functions in parallel, on V8, on Node 14, caused the finalizers to not fire during bringOutYourDead on the second pass, therefore the VOM refcount checkers weren't notified about the Presence going away, therefore they did not make the vatstoreGet syscalls to check the on-disk refcount, and liveslots did not emit syscall.dropImports. In my local reproduction, the misbehavior occurred during the live (non-replayed) delivery, so the divergence was detected by the activity hash (one live delivery did syscalls, the other live delivery did not). Most unit tests ignore the activity hash, but this test checks it explicitly, so it caught this form of divergence.

In CI, I think the misbehavior occurred during the replay of the earlier (bootstrap()) delivery, so it made different syscalls than the original (the live delivery made syscalls, the replay did not). That divergence gets caught by the transcript manager. We have a handful of unit tests that exercise replay (not a lot, but still more than the ones that check the activity hash), and this test is also one of those. So it was doubly sensitive.

The fix, until we figure out the root cause behind #3240, is to find all unit tests that are (sensitive to transcript divergence OR sensitive to activity hash) and change them all to either 1: use config.defaultManagerType = 'xsnap', or 2: change the tests to use test.serial.

For this case, I'm going to change test-activityhash-vs-start.js to use test.serial.

We don't understand the root cause, but our `gcAndFinalize()` doesn't always work when 1: run on Node.js (not xsnap) and 2: AVA allows tests to run in parallel. The problem happens somewhat more frequently on 14.x, at least in CI. These two tests exercise transcript replay (which needs GC to happen the same way for both the original delivery and the replay), and compare the activityHash (which, of course, is sensitive to all syscalls made, including GC syscalls). They sometimes failed in CI. We don't fully understand why gcAndFinalize doesn't work, but serializing the tests (with test.serial) seems to address the problem. refs #3240 closes #4617

kriskowal added the bug Something isn't working label Feb 21, 2022

kriskowal assigned warner Feb 21, 2022

michaelfig added the SwingSet package: SwingSet label Feb 22, 2022

warner mentioned this issue Feb 22, 2022

gcAndFinalize might be insufficient on Node.js in test-transcript.js #3240

Open

warner mentioned this issue Feb 23, 2022

test(swingset): use test.serial on GC-sensitive test #4644

Merged

mergify bot closed this as completed in #4644 Feb 23, 2022

Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nondeterministic SwingSet historical inaccuracy in CI #4617

Nondeterministic SwingSet historical inaccuracy in CI #4617

kriskowal commented Feb 21, 2022

michaelfig commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 23, 2022

Nondeterministic SwingSet historical inaccuracy in CI #4617

Nondeterministic SwingSet historical inaccuracy in CI #4617

Comments

kriskowal commented Feb 21, 2022

Describe the bug

michaelfig commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 22, 2022

warner commented Feb 23, 2022