intermittent CI failure in vat-admin terminate test #6692

Closed
warner opened this issue Dec 17, 2022 · 0 comments · Fixed by #7047
Labels
bug (Something isn't working), SwingSet (package: SwingSet), test, vaults_triage (DO NOT USE)
warner commented Dec 17, 2022

Describe the bug

@gibson042 reported a CI failure in one of the swingset tests that went away when he retried the test:

https://github.com/Agoric/agoric-sdk/actions/runs/3711195860/jobs/6291423608#step:5:1596

in the test-swingset4 (18.x) batch:

anachrophobia strikes vat v2 on delivery 9
delivery completed with 4 expected syscalls remaining
expected: {"0":"vatstoreGet","1":"vom.rc.o-50","length":2}
expected: {"0":"vatstoreGetAfter","1":"","2":"vom.ir.o-50|","length":3}
expected: {"0":"dropImports","1":{"0":"o-50","length":1},"length":2}
expected: {"0":"retireImports","1":{"0":"o-50","length":1},"length":2}
REJECTED from ava test: (Error#126)
Error#126: historical inaccuracy in replay of v2
  at Object.finishReplayDelivery (.../swingset-vat/src/kernel/vat-loader/transcript.js:81:13)
  at replayOneDelivery (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:188:19)
  at async Object.replayTranscript (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:219:1)
  at async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:129:1)
  at async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:177:1)
  at async Object.start (.../swingset-vat/src/kernel/kernel.js:1566:1)
  at async makeSwingsetController (packages/SwingSet/src/controller/controller.js:334:3)
  at async buildVatController (packages/SwingSet/src/controller/controller.js:561:22)
  at async packages/SwingSet/test/vat-admin/terminate/test-terminate.js:416:16

  ✘ [fail]: vat-admin › terminate › terminate › dispatches to the dead do not harm kernel Rejected promise returned by test

─

  vat-admin › terminate › terminate › dispatches to the dead do not harm kernel

  Rejected promise returned by test. Reason:

  Error {
    message: 'historical inaccuracy in replay of v2',
  }

  › Object.finishReplayDelivery (.../swingset-vat/src/kernel/vat-loader/transcript.js:81:13)
  › replayOneDelivery (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:188:19)
  › async Object.replayTranscript (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:219:1)
  › async ensureVatOnline (.../swingset-vat/src/kernel/vat-warehouse.js:129:1)
  › async Object.start (.../swingset-vat/src/kernel/vat-warehouse.js:177:1)
  › async Object.start (.../swingset-vat/src/kernel/kernel.js:1566:1)
  › async makeSwingsetController (packages/SwingSet/src/controller/controller.js:334:3)
  › async buildVatController (packages/SwingSet/src/controller/controller.js:561:22)
  › async packages/SwingSet/test/vat-admin/terminate/test-terminate.js:416:16

  ─

  1 test failed

The core part is the anachrophobia error:

anachrophobia strikes vat v2 on delivery 9
delivery completed with 4 expected syscalls remaining
expected: {"0":"vatstoreGet","1":"vom.rc.o-50","length":2}
expected: {"0":"vatstoreGetAfter","1":"","2":"vom.ir.o-50|","length":3}
expected: {"0":"dropImports","1":{"0":"o-50","length":1},"length":2}
expected: {"0":"retireImports","1":{"0":"o-50","length":1},"length":2}

This test (test-terminate.js) creates a vat, arranges for it to subscribe to a promise, kills it from the outside with E(adminNode).terminateWithFailure(err), and then resolves the promise, which enqueues a notify for the now-dead vat. It then restarts the whole kernel before the notify is delivered. The point is to check that the kernel doesn't panic when it tries to process a notify for a vat that no longer exists.

When the kernel is restarted, the workers must be restarted, which involves replaying the vat transcript. The kernel compares the syscalls made by the new worker (during replay) against the ones recorded in the transcript by the original execution.
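
To make the failure mode concrete, here is a minimal sketch of that comparison. It is not the code in transcript.js: makeReplayChecker and its methods are hypothetical names, and the real check may differ in detail. It only shows how a delivery that finishes with recorded syscalls left over ends up as a "historical inaccuracy" error.

```javascript
// Hypothetical sketch, NOT the actual SwingSet transcript.js implementation.
// expectedSyscalls is the list recorded for one delivery in the original run.
function makeReplayChecker(vatID, expectedSyscalls) {
  const remaining = [...expectedSyscalls];
  let failed = false;

  return {
    // Called for each syscall the replaying worker makes.
    checkSyscall(syscall) {
      const expected = remaining.shift();
      if (JSON.stringify(expected) !== JSON.stringify(syscall)) {
        failed = true;
      }
    },
    // Called when the replayed delivery finishes.
    finishReplayDelivery() {
      if (remaining.length > 0) {
        // The original run made syscalls that the replay never made.
        failed = true;
      }
      if (failed) {
        throw Error(`historical inaccuracy in replay of ${vatID}`);
      }
    },
    remainingCount: () => remaining.length,
  };
}

// The four syscalls from the CI log, recorded during the original delivery 9.
const recorded = [
  ['vatstoreGet', 'vom.rc.o-50'],
  ['vatstoreGetAfter', '', 'vom.ir.o-50|'],
  ['dropImports', ['o-50']],
  ['retireImports', ['o-50']],
];

// Assumed replay behavior: the worker makes none of those syscalls.
const checker = makeReplayChecker('v2', recorded);
let replayError;
try {
  checker.finishReplayDelivery();
} catch (err) {
  replayError = err;
}
console.log(replayError.message); // historical inaccuracy in replay of v2
console.log(`${checker.remainingCount()} expected syscalls remaining`);
```

In this sketch the replay side makes no syscalls at all, which matches the "4 expected syscalls remaining" line in the log.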

The CI failure suggests that the original execution observed a GC finalizer run on a Presence (o-50), which required it to check reference counts (vom.rc.o-50) and the recognizer table (all keys under vom.ir.o-50|${recognizer}). It must have found none, because it then did a dropImports and retireImports. The replay evidently did not see that finalizer run during delivery 9, so those four syscalls were never made.
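
The syscall sequence in the log can be reproduced with a small simulation. This is a sketch under assumptions, not liveslots code: makeFakeVat, onPresenceFinalized, and the Map-backed vatstore are all hypothetical, and the drop/retire logic is simplified to what the paragraph above describes.

```javascript
// Hypothetical sketch; names and logic are assumptions, not liveslots code.
function makeFakeVat(vatstore) {
  const syscallLog = [];
  const syscall = {
    vatstoreGet(key) {
      syscallLog.push(['vatstoreGet', key]);
      return vatstore.get(key);
    },
    // Returns the first [key, value] after priorKey whose key starts with prefix.
    vatstoreGetAfter(priorKey, prefix) {
      syscallLog.push(['vatstoreGetAfter', priorKey, prefix]);
      const keys = [...vatstore.keys()]
        .filter(k => k.startsWith(prefix) && k > priorKey)
        .sort();
      return keys.length ? [keys[0], vatstore.get(keys[0])] : undefined;
    },
    dropImports(vrefs) {
      syscallLog.push(['dropImports', vrefs]);
    },
    retireImports(vrefs) {
      syscallLog.push(['retireImports', vrefs]);
    },
  };

  // Stand-in for what happens once the engine finalizes a Presence.
  function onPresenceFinalized(vref) {
    const refCount = syscall.vatstoreGet(`vom.rc.${vref}`);
    if (refCount) {
      return; // virtual data still references it: nothing to drop
    }
    const recognizer = syscall.vatstoreGetAfter('', `vom.ir.${vref}|`);
    syscall.dropImports([vref]);
    if (!recognizer) {
      syscall.retireImports([vref]); // nothing can recognize it either
    }
  }

  return { onPresenceFinalized, syscallLog };
}

// Empty vatstore: no reference count and no recognizers for o-50.
const unreferenced = makeFakeVat(new Map());
unreferenced.onPresenceFinalized('o-50');
console.log(JSON.stringify(unreferenced.syscallLog));

// With a stored reference count, only the first lookup happens.
const referenced = makeFakeVat(new Map([['vom.rc.o-50', '1']]));
referenced.onPresenceFinalized('o-50');
console.log(JSON.stringify(referenced.syscallLog));
```

Whether these syscalls show up in a given delivery depends entirely on when the engine chooses to run the finalizer, which is why a transcript that includes them is fragile to replay.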

When I re-run this test locally, v2 is vat-admin.

I'm guessing this is another symptom of GC being inconsistent when running AVA and using a local vat worker (coresident within the Node.js process, not an xsnap child process). We moved test-terminate-replay.js out to a separate file (in commit 2f3d54b, refs #5266) in the hopes that it would avoid a similar intermittent test failure. Perhaps it is time to move the "dispatches to the dead do not harm kernel" test out to a separate file as well.
