network hook fails after client restart w/ non-Docker driver #9750
Hi @shishir-a412ed! So I think that error is bubbling up either from the network hook or from the task driver.
@tgross Thank you for the quick response. Your analysis is spot on! I added some logging.

The error is indeed coming from there. It looks like the nomad client has no problems creating the network namespace on the first run. When nomad + containerd-driver restarts, it tries to create the namespace again and fails.
I checked the process, and it's running as root. Also, where do you see in the containerd-driver that it's being deferred to the Linux default?
Ok, so good news and bad news. The good news is that I was able to reproduce the behavior with the following jobspec:

```hcl
job "execjob" {
  datacenters = ["dc1"]

  group "execgroup" {
    network {
      mode = "bridge"
      port "www" {
        to = "8000"
      }
    }

    task "exectask" {
      driver = "exec"
      config {
        command = "python"
        args    = ["-m", "SimpleHTTPServer"]
      }
    }
  }
}
```

Run the job, which works fine. Take a look at the permissions for that netns:
Restart the Nomad client, as root:
And this ends up causing a restart of the task.
I might be missing it, but there's no implementation of that method that I can find. I'm going to rename this bug, and we'll dig in further to figure out what's going on here.

Edit: interesting, it looks like way back in 0.10.0 I'd tried to solve for not recreating network namespaces: e17901d. I suspect either there's a bug there we missed or a regression since then.
Ok, I went through #6315 and it looks like I introduced a fix for Docker (see e17901d#diff-13af1c2034f8a861c687bbeea321da745d2490f0110857c3a805fb385bcf0804R50-R59) but the fix was missing what we needed for the default path. I think when we create the file, if we get an error, we should then check for the existence of the file (which means it was previously created), and return the existing namespace instead of failing.
I've opened #9757 with a patch for this.

That PR is merged and the fix will ship in Nomad 1.0.2.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.11.4+ent
Operating system and Environment details
Ubuntu 18.04.5 LTS (Bionic Beaver)
Issue
We are seeing this error when the containerd-driver restarts and tries to reattach to the existing allocation. Nomad is unable to attach to the existing allocation, throws this error, and starts a new allocation.
Reproduction steps

1. Run a job using the containerd-driver.
2. Restart nomad + containerd-driver.
3. `nomad job status <job>` should show a new allocation and a previously failed allocation.
4. `nomad alloc status <failed_alloc_id>` should show the above error message.

Logs