"anachrophobia" replay error on restart #68

Closed
warner opened this issue Nov 4, 2019 · 1 comment · Fixed by #666
Labels
SwingSet package: SwingSet

Comments

@warner
Member

warner commented Nov 4, 2019

We've noticed occasional errors when restarting a swingset (generally a cosmic-swingset ag-solo node) in which some vat replays its transcript but makes different syscalls than it did the previous time. @michaelfig and I came up with a hypothesis that the "command device" (which listens for WebSocket connections) is becoming active too soon: an inbound connection is established (causing a message to be added to the run queue, for delivery to some handler vat) while the kernel is still in the middle of replaying the transcript. This causes a different set of messages to arrive at that vat than arrived in the previous run, which then triggers the anachrophobia error.
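To make the failure mode concrete, here is a minimal sketch of how a replay check of this kind might detect divergence. It is not the SwingSet implementation: `makeReplayChecker`, `AnachrophobiaError`, and the array-shaped syscalls are all hypothetical names and shapes chosen for illustration.

```javascript
// Hypothetical sketch: none of these names come from SwingSet itself.
class AnachrophobiaError extends Error {}

// transcriptSyscalls: the syscalls recorded during the previous run, in order.
function makeReplayChecker(transcriptSyscalls) {
  let index = 0;
  return {
    // Call this for each syscall the vat makes while replaying.
    checkSyscall(syscall) {
      const expected = JSON.stringify(transcriptSyscalls[index]);
      const got = JSON.stringify(syscall);
      if (expected !== got) {
        // The vat did something different from last time.
        throw new AnachrophobiaError(
          `replay diverged at syscall ${index}: expected ${expected}, got ${got}`,
        );
      }
      index += 1;
    },
    // True once every recorded syscall has been reproduced.
    isComplete() {
      return index === transcriptSyscalls.length;
    },
  };
}
```

If an extra message reaches the vat during replay, the vat's next syscall will most likely not match the recorded one, and the check throws.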

The fix will be to defer building the WebSocket listener until we've finished replaying the transcripts. I think that means waiting for the Promise returned by buildVatController() to resolve. I know there's a kernel.start() that returns a Promise that resolves after all transcripts have been replayed, but I'm not sure whether that's exposed as controller.start() or folded into buildVatController().
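The ordering described here could look roughly like the sketch below. Since it is not settled above which call returns the replay Promise, the sketch takes both steps as caller-supplied functions: `startAfterReplay`, `replayTranscripts`, and `openListener` are hypothetical names, not SwingSet APIs.

```javascript
// Hypothetical startup helper: replayTranscripts and openListener are
// stand-ins supplied by the caller, not real SwingSet APIs.
async function startAfterReplay({ replayTranscripts, openListener }) {
  // Wait for every vat to finish replaying its transcript.
  await replayTranscripts();
  // Only now start accepting inbound connections, so that new messages
  // cannot reach the run queue in the middle of a replay.
  return openListener();
}
```

The point of the design is that the listener does not exist at all until replay is done, rather than existing and trying to buffer early connections.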

We also need to change the way we react to this error: the entire kernel should be terminated. In the current code, an error is raised within the Vat Manager, but a caller higher up the stack is catching/logging/continuing, because the same pathway might be triggered by ordinary vat code throwing an exception. The Vat Manager needs to set a flag that says "abandon ship" and not try to limp along anyway.
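One way to picture the "abandon ship" flag is the sketch below: ordinary vat exceptions are logged and the loop continues, while a fatal error sets a flag that stops the loop and is rethrown. Everything here is hypothetical, including `makeKernel`, the `fatal` property used to mark kernel-level errors, and the shape of a delivery.

```javascript
// Hypothetical sketch of a run loop with an "abandon ship" flag.
// deliveries: array of { name, run } objects; run() may throw.
function makeKernel(deliveries) {
  let abandonShip = null; // holds the fatal error once one is seen
  const log = [];

  function deliver(delivery) {
    try {
      delivery.run();
      log.push(`ok:${delivery.name}`);
    } catch (err) {
      if (err.fatal) {
        // Kernel-level failure (e.g. anachrophobia): remember it.
        abandonShip = err;
      } else {
        // Ordinary vat exception: log it and keep going.
        log.push(`vat-error:${delivery.name}`);
      }
    }
  }

  function run() {
    for (const delivery of deliveries) {
      if (abandonShip) break; // do not limp along after a fatal error
      deliver(delivery);
    }
    if (abandonShip) throw abandonShip;
    return log;
  }

  return { run, getLog: () => log };
}
```

Separating the two cases lets the existing catch/log/continue path stay in place for vat code, while a replay mismatch still brings the whole kernel down.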

@warner warner transferred this issue from Agoric/SwingSet Dec 1, 2019
@warner warner added the SwingSet package: SwingSet label Dec 1, 2019
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
a remake of Agoric#68, just to streamline the merging
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
(rebased from original due to conflict)

closes Agoric#68
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
Describe handoff service and canvasStatePublisher for the Pixel demo …
@michaelfig
Member

I've fixed the ordering of the kernel restart (replay happens before connections are initialised). The next step is to add the abandonShip flag.

michaelfig added a commit that referenced this issue Mar 6, 2020
Closes #68

Now the SwingSet kernel will not proceed if there is anachrophobia.
The cases that cause this still need to be diagnosed, but at least
we won't have easily-overlooked failures.