SIGSEGV on startup of nomad client since 0.5.3 #2256
Comments
We hit the same thing after upgrading Nomad to 0.5.3, without a node drain:
after we did some cleanup (stopped the jobs that were placed on the upgraded node, and also made …
Hey, sorry this happened 👎. To recover, can you delete the client's data_dir and bring it back up?
We will make sure 0.5.4 allows an in-place upgrade path for those who would like to wait!
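A minimal sketch of that recovery path, assuming a systemd-managed agent named `nomad` and a `data_dir` of `/var/lib/nomad` (both are assumptions; use whatever your agent config actually sets):

```sh
# Assumptions: the agent runs under systemd as "nomad" and data_dir = /var/lib/nomad.
sudo systemctl stop nomad        # stop the crashing client
sudo rm -rf /var/lib/nomad/*     # wipe the client's data_dir (local client state is lost)
sudo systemctl start nomad       # client comes back with clean state and re-registers
```

Wiping the data_dir throws away the node's local allocation state, so expect the servers to reschedule whatever was running there.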
Alex, may I conclude that making …
Potentially not. The client keeps some state files in the data_dir that it tries to restore from. In 0.5.3 we introduced new fields in that state file, and it seems the upgrade isn't being handled properly. So I suggest you …
Repro'd in like 30s using Nomad 0.5.2 and 0.5.3 binaries with the example.nomad Redis job. Very embarrassed I let this slip in. Fix coming.
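A rough reconstruction of that repro, not the exact commands used — the binary names, config file, and single-node server+client setup are assumptions:

```sh
# Assumed agent.hcl enables both server and client and sets a persistent data_dir.
./nomad-0.5.2 agent -config agent.hcl &
./nomad-0.5.2 init                        # writes example.nomad (the Redis example job)
./nomad-0.5.2 run example.nomad           # place an allocation on the client
kill %1                                   # stop the 0.5.2 agent, leaving client state on disk
./nomad-0.5.3 agent -config agent.hcl     # 0.5.3 crashes while restoring that state
```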
Hey @schmichael, any idea when the fixed release will be up? EDIT: it's there now!
@schmichael we love you anyway!
@holtwilkins It would have been up sooner, but this was only the second time I've driven a release and was pretty slow at it. Thanks for your patience!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Output from `nomad version`:
Nomad v0.5.3
64-bit, tried both the LXC and non-LXC builds.
Operating system and Environment details
Host: Ubuntu Xenial, running in an LXC container.
Docker for Jobs:
Issue
After updating to 0.5.3, the Nomad agent crashes on startup when started in client mode.
There seems to be some correlation with the presence or absence of jobs on the node (I just upgraded without draining and ended up with a useless cluster). I've attached both types of crashes.
Currently I’m unable to get one of my nodes back up. :( The others for some reason are working again.
Let me know if you need any more intel or if you have any hints on how to resolve this…
Reproduction steps
Start the Nomad agent in client mode on a node that was upgraded to 0.5.3.
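For concreteness, a hedged version of that startup on the upgraded node (the config path is an example):

```sh
# Start the 0.5.3 agent in client mode against the data_dir left behind by the previous version.
nomad agent -config /etc/nomad.d/client.hcl
```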
Nomad Server logs (if appropriate)
n/a, server works fine.
Nomad Client logs (if appropriate)
No jobs, clean Docker:
Jobs present, still running inside Docker