kube-dns & kubernetes-dashboard pods crashing on Kubernetes 1.9 #2100
Dashboard depends on kube-dns, so kube-dns should be fixed first. Could you check the kube-dns status first? E.g.
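The code block from this comment was lost in extraction; a typical way to check kube-dns status would be something like the following (the pod name and label selector are illustrative, not from the original):

```shell
# List the kube-dns pods with their restart counts and node placement
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

# Inspect events for a crashing pod (name is an example)
kubectl describe pod -n kube-system kube-dns-v20-55498dbf49-rpzhb

# Check the logs of the kubedns container
kubectl logs -n kube-system kube-dns-v20-55498dbf49-rpzhb -c kubedns
```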
Here you are:
@odauby From the kube-dns logs: 10.0.0.1:443 i/o timeout. This is the kubernetes service used to reach kube-apiserver. Could you verify whether it is healthy? E.g.
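The example commands here were also lost; a connectivity check against the in-cluster apiserver endpoint might look like this (assuming curl is available where you run it):

```shell
# Probe the in-cluster apiserver VIP; -k skips TLS verification,
# since this is only a reachability test.
curl -k https://10.0.0.1:443/healthz

# Confirm the kubernetes Service and its endpoints exist
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default
```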
Here you are:
From the node itself:
I think I'm affected by the same issue, but I might be muddling this issue a bit because I appear to hit the problem only when using a non-default dockerEngineVersion. E.g. the following apimodel gives me a working 1-master, 1-node k8s 1.9.1 Ubuntu cluster on acs-engine 0.12.2 (and 0.12.1):
Whereas the following gives the exact same symptoms mentioned by odauby:
The apiserver appears fine in both cases, but does log instances of this:
Similarly:
Tested stopping the VMs and booting them in order - same result as above when nodes come back to life. Didn't expect this to have any effect, but worth a shot. Also attempted to deploy unspecified docker version (so 1.12.6) vs specified (still 17.05.*) with networkpolicy explicitly set to "none" instead of the implicit "azure" from above, yielding the same results:
@odauby are you getting this on docker 1.12, not just with 17.* releases? I was hoping to find that acs-engine 0.12.0 defaults to 17.* for k8s 1.9.1, which would explain the differences we observe in behaviour, but it appears that is not the case.
@feiskyer yep, that's it! I added ACCEPT, deleted the pods to make them restart faster, and now they're connecting properly, leaving me with a working cluster. Is this something I can fix via acs-engine while deploying, or will I have to keep doing this manually? I suppose defaulting to ACCEPT isn't the most ideal solution to the problem, but I'm guessing this is a k8s issue more than an acs-engine issue.
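The exact iptables change discussed here isn't shown in the thread; a sketch of the FORWARD-chain workaround, as I understand it, would be the following, run on each node (the specific rule form is an assumption):

```shell
# Allow forwarded pod traffic: either set the chain policy...
sudo iptables -P FORWARD ACCEPT
# ...or insert an explicit accept rule at the top of the chain.
sudo iptables -I FORWARD 1 -j ACCEPT

# Then delete the crashing pods so their replacements retry quickly
kubectl delete pod -n kube-system -l k8s-app=kube-dns
```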
@feiskyer Does not help for me: since I did not set any
@oivindoh The Kubernetes community has already added FORWARD rules in kube-proxy (kubernetes/kubernetes#52569). We need to figure out why it is not working on Azure.

@odauby Meanwhile, I couldn't repro the problem with docker v1.12. Did recreating the kube-dns pods help in your case? E.g.

```shell
# New pods will be created automatically after this.
kubectl delete pod kube-dns-v20-55498dbf49-rpzhb kube-dns-v20-55498dbf49-zpgz4
```
They get recreated and then crash-loop again.
I also Azure-reallocated the master and agent nodes, with no improvement.
@odauby Seems there is something wrong with Pod networking. As you confirmed above, the node itself could reach the apiserver.
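One way to confirm that it is Pod networking (rather than the node) that is broken is to probe the apiserver VIP from inside a throwaway pod; a sketch (the image and pod name are illustrative, and busybox's `nc` supports `-z` in most builds):

```shell
# TCP-level reachability probe from inside the pod network.
# Exit status indicates whether 10.0.0.1:443 was reachable.
kubectl run net-test --rm -it --image=busybox --restart=Never -- \
  nc -z -w 5 10.0.0.1 443

# DNS check from a pod (will fail while kube-dns is down; useful
# afterwards to confirm the fix)
kubectl run dns-test --rm -it --image=busybox --restart=Never -- \
  nslookup kubernetes.default
```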
@feiskyer no, I did not. This is a vanilla acs-engine deploy:
Where:
@odauby Could you upgrade to the latest acs-engine and try again?
Just noticed kubernetes/kubernetes#52569 only fixes the problem for NodePort services; we still need to enable FORWARD.
@feiskyer just tried with the latest acs-engine, no improvement.
api-model:
Deployment:
where:
Outcome:
Just tried with Kubernetes 1.8 by updating this line in the api-model:
Same CrashLoopBackOff issue on the kube-dns pods, so it does not seem to be Kubernetes 1.9 related IMHO.
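The api-model line being edited here presumably targets the orchestrator release field; a minimal fragment of the change (field names from the acs-engine vlabs api-model, values illustrative and truncated to the relevant profile) might look like:

```json
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8"
    }
  }
}
```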
@feiskyer just noticed that if I remove the Windows machine from my api-model, Could it be that for some odd reason the
Actually, I couldn't repro the problem with or without Windows nodes. I have verified the api-model from both #2100 (comment) and #2100 (comment). @odauby have you joined the Kubernetes Slack? If so, I can help check what's wrong tomorrow.
For anyone looking, #2174 fixes the issue if you use networkPolicy azure. |
Yes. If you want to use networkPolicy azure, please use #2174. It should be merged soon.
Yes, indeed, the service principal didn't have the "Contributor" role on the resource group.
Thank you @jackfrancis, I used v0.13.0 and all pods are running fine, but we still have the internal Windows DNS problem: Windows containers can't resolve hostnames and can't access the Internet. I've seen some quick fixes:
Have these changes already been merged to master? Is there a new release that will include them?
@magnock Why do you consider using microsoft/windowsservercore:1709_KB4074588 a quick fix? |
Is this a request for help?:
NO
Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE
What version of acs-engine?:
Version: v0.12.0
GitCommit: 1d33229
GitTreeState: clean
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.9
What happened:
When deploying a hybrid (Linux/Windows) Kubernetes cluster, the kube-dns and kubernetes-dashboard pods land in CrashLoopBackOff state.
This prevents me from using the Kubernetes dashboard, and service discovery within the cluster is impossible.
What you expected to happen:
kube-dns and kubernetes-dashboard to be in Running state.
How to reproduce it (as minimally and precisely as possible):
acs-engine template:
Anything else we need to know: