Improve restarter logic #717
One simple test for recovery after a single "crash" + one test of handling "infinitely" crashing kernels, i.e. that the restart limit is respected.
Thanks for spending time on this Vidar. The tests look really useful as well.
I'm happy to help with the provisioner-based changes if you like.
This is mainly due to the weakness of the previous method: `is_alive` only tests that the kernel process is alive; it does not indicate that the kernel has successfully completed startup. To solve this correctly, we would need to wait for a kernel info reply, but it is not necessarily appropriate to start a kernel client + channels in the restarter. Therefore, we use "has been alive continuously for X time" as a heuristic for a stable startup.
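For readers following along, the heuristic can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: the class `StableStartRestarter`, its parameters (`stable_start_time`, `clock`), and the `gave_up` flag are all hypothetical names chosen for the example, and the real `KernelRestarter` may differ in how it tracks state.

```python
import time


class StableStartRestarter:
    """Hypothetical sketch of a restarter; not the jupyter_client implementation."""

    def __init__(self, is_alive, restart, restart_limit=5,
                 stable_start_time=10.0, clock=time.monotonic):
        self.is_alive = is_alive            # callable: is the kernel process alive?
        self.restart = restart              # callable: restart the kernel
        self.restart_limit = restart_limit  # max consecutive restarts before giving up
        self.stable_start_time = stable_start_time  # the "X time" in the heuristic (assumed seconds)
        self.clock = clock
        self._restart_count = 0
        self._last_dead = clock()           # last time the kernel was seen dead
        self.gave_up = False

    def poll(self):
        """Call periodically; restarts a dead kernel, respecting the limit."""
        if self.gave_up:
            return
        now = self.clock()
        if not self.is_alive():
            self._last_dead = now
            if self._restart_count >= self.restart_limit:
                # Limit reached: stop restarting an "infinitely" crashing kernel.
                self.gave_up = True
                return
            self._restart_count += 1
            self.restart()
        elif now - self._last_dead >= self.stable_start_time:
            # Alive continuously for long enough: treat the start as stable
            # and forget earlier failed attempts.
            self._restart_count = 0
```

The point of the design is that the restart counter is only reset after the process has stayed alive for the whole window, so a kernel that dies shortly after each launch still runs into the restart limit.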
cba77a0 to 7ed3379
The ubuntu 3.6 test was hung up (after 5 hours) so I cancelled and restarted the workflow.
LGTM - thank you @vidartf. I think the test intermittencies are unrelated.
Resolves #715.
There is a small change in the behavior of `KernelRestarter.poll()` and the async variant (i.e. 2fabf92); it should be considered whether this change is appropriate, or whether a better alternative exists. The other parts should be uncontroversial.