dns controller rs fails to run pod after upgrade from 1.5 to 1.6.2 #2594
I think this is a duplicate.

It starts off similar... two other tickets seem to be about creating the ConfigMap and redeploying Weave. I've done that, and the cluster validates. However, the dns-controller replica set can't deploy its pod due to taints (I think). By the way, `kops get ig` returns this:
Is there a taint I need to apply?
Thanks to @justinsb's deep diagnostic session, we found a solution. It looks like the 1.6 toleration got written, but then got overwritten by something (possibly an HA master still on 1.5). The dns-controller deployment will have a missing toleration section in the spec section (not the 1.5 annotation - that'll still be there). The solution was to edit the deployment and simply remove the line that has the dns-controller annotation. Once that's done, it self-heals, and dns-controller starts running as expected.
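To make the broken state concrete, here is a minimal sketch (the annotation key and values are illustrative, not copied from the cluster): the 1.5-era toleration annotation survives under metadata, while the 1.6 `tolerations` field is absent from the pod spec, which is why the scheduler leaves the pod unscheduled:

```yaml
# Illustrative sketch of the broken dns-controller deployment state.
# The 1.5-style toleration annotation is still present under metadata...
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/tolerations: '[{"key":"dedicated","value":"master"}]'
spec:
  template:
    spec: {}
    # ...but the 1.6 `tolerations` field is missing from the pod spec here.
    # After deleting the addon annotation line, the addon machinery reapplies
    # the 1.6 manifest, which restores the tolerations section and lets the
    # pod schedule onto a master.
```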
I am going to reopen this to see if we can recreate.
This issue is being tracked upstream: kubernetes/kubernetes#46073
Just ran into this as well! I rolled my public-topology cluster from 1.5.4 to 1.6.4 on Friday, and all went quite smoothly. Yesterday I noticed two of my three masters had regular pods scheduled on them and didn't have their taints properly set up. Stopping them caused the ASG to kick in with a new instance, which seemed fine. As a consequence, however, dns-controller got into the state described above. Removing the dns-controller add-on annotation indeed caused it to self-heal. Edit: spoke too soon, something fishy is still going on. dns-controller spun up a new replica set, but the dashboard has not picked up on it, and only shows the old, empty RS as "New Replica Set" on the deployment page...
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle rotten

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /close
I ran `kops upgrade cluster` with a new kops binary, and at first things didn't work as expected. I replaced the weave-net resources and created the kube-dns ConfigMap (based on other reported issues), and nearly everything works: the kube UI is available, but not all my apps are running. Digging deeper, I see that the dns-controller RS can't find nodes to run its single pod on. Running `kops edit ig [ig-name-for-a-master]` shows me this:
Running describe for a master node shows me the following:
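(The describe output is omitted above. For reference, a correctly tainted 1.6 master would carry something like the following in its node spec; this is a sketch of the standard 1.6 master taint, not the actual output from this cluster:)

```yaml
# Node spec fragment: the taint a Kubernetes 1.6 master normally carries.
# Pods without a matching toleration (like the broken dns-controller pod)
# will not be scheduled onto this node and stay Pending.
spec:
  taints:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
```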
The dns-controller RS has the following description:
I tried `kubectl replace` with the file https://raw.githubusercontent.com/kubernetes/kops/release-1.6/upup/models/cloudup/resources/addons/dns-controller.addons.k8s.io/k8s-1.6.yaml.template, but it's failing, stating: