"anachrophobia" replay error on restart #68
Labels
SwingSet
package: SwingSet
Comments
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
(rebased from original due to conflict) closes Agoric#68
dckc pushed a commit to dckc/agoric-sdk that referenced this issue Dec 5, 2019
Describe handoff service and canvasStatePublisher for the Pixel demo …
I've fixed the ordering of the kernel restart (replay happens before connections are initialised). Next comes making the |
michaelfig added a commit that referenced this issue Mar 6, 2020
Closes #68 Now the SwingSet kernel will not proceed if there is anachrophobia. The cases that cause this still need to be diagnosed, but at least we won't have easily-overlooked failures.
We've noticed occasional errors when restarting a swingset (generally a cosmic-swingset ag-solo node) in which some vat is trying to replay the transcript but is making different syscalls than it did the previous time. @michaelfig and I came up with a hypothesis that the "command device" (which is listening for WebSocket connections) is becoming active too soon: an inbound connection is established (causing a message to be added to the run queue, for delivery to some handler vat) while it's in the middle of replaying the transcript. This causes a different set of messages to arrive at that vat than did in the previous run, which then triggers the anachrophobia error.
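To make the failure mode concrete, here is a minimal sketch of how a replay check of this kind might work. It is not the SwingSet implementation: `makeReplayChecker`, the syscall tuple shapes, and the error text are all hypothetical names chosen for illustration. The idea is only that each syscall made during replay is compared against the recorded one, and any divergence is reported.

```javascript
// Hypothetical replay checker. These names are illustrative assumptions,
// not the SwingSet API.
function makeReplayChecker(recordedSyscalls) {
  let index = 0;
  return {
    // Called for each syscall the vat makes while replaying its transcript.
    check(syscall) {
      const expected = JSON.stringify(recordedSyscalls[index]);
      const got = JSON.stringify(syscall);
      if (got !== expected) {
        throw new Error(
          `anachrophobia at syscall ${index}: expected ${expected}, got ${got}`,
        );
      }
      index += 1;
    },
    // True once every recorded syscall has been matched.
    isDone() {
      return index === recordedSyscalls.length;
    },
  };
}

// Transcript recorded during the previous run (made-up contents).
const transcript = [
  ['send', 'target1', 'foo'],
  ['subscribe', 'promise2'],
];

const checker = makeReplayChecker(transcript);
checker.check(['send', 'target1', 'foo']); // matches the first record

let replayError = null;
try {
  // A message injected mid-replay makes the vat issue a different syscall.
  checker.check(['send', 'target9', 'inbound']);
} catch (err) {
  replayError = err;
}
console.log(replayError.message);
```

Under the hypothesis above, the second call stands in for the handler vat reacting to a WebSocket connection that arrived before replay had finished.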
The fix will be to defer building the WebSocket listener until we've finished replaying the transcripts. I think that means waiting for the Promise that comes back from `buildVatController()` to fire. I know there's a `kernel.start()` that returns the Promise that fires after all transcripts have been replayed, but now I'm not sure whether that's exposed as a `controller.start()` or if it's folded into `buildVatController()`.

We also need to change the way we react to this error: the entire kernel should be terminated. In the current code, an error is raised within the Vat Manager, but a higher caller is catching/logging/continuing because the same pathway might be triggered by ordinary vat code throwing an exception. The Vat Manager needs to set a flag that says "abandon ship" and not try to limp along anyways.
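The two changes proposed here could look roughly like the following sketch. Everything in it is an assumption for illustration: `makeKernel`, `panic`, `step`, and `boot` are hypothetical names, not the real controller or Vat Manager API, and the replay is simulated with a timer. It shows the listener being opened only after the replay Promise resolves, and a fatal flag that stops the kernel while ordinary vat exceptions are still caught and logged.

```javascript
// Hypothetical kernel wrapper. None of these names are the SwingSet API.
function makeKernel(replayTranscripts) {
  let fatal = false;
  return {
    // Resolves once every vat has replayed its transcript.
    async start() {
      await replayTranscripts();
    },
    // "Abandon ship": record that the kernel must not continue.
    panic(err) {
      fatal = true;
      console.error('kernel panic:', err.message);
    },
    // Run one delivery. Ordinary vat exceptions are logged and survived,
    // but once the fatal flag is set the kernel refuses to go on.
    step(deliver) {
      if (fatal) throw new Error('kernel terminated');
      try {
        deliver();
      } catch (err) {
        console.log('vat error (continuing):', err.message);
      }
      if (fatal) throw new Error('kernel terminated');
    },
  };
}

// Open the inbound listener only after replay has completed.
async function boot(kernel, openListener) {
  await kernel.start();
  openListener();
}

// Demo: record the order of events, with replay simulated by a short timer.
const events = [];
const kernel = makeKernel(async () => {
  await new Promise(resolve => setTimeout(resolve, 10));
  events.push('replay done');
});
const booted = boot(kernel, () => events.push('listener open'));
```

Checking the flag both before and after the delivery means a panic raised inside a vat's own try/catch path still halts the kernel rather than being swallowed by the caller.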