kube-apiserver bootstrap is unstable #49
Comments
@schu on which infrastructure are you setting up your cluster? I haven't seen this behaviour on OpenStack.
@afritzler on AWS.
I have also applied this patch to kubify to get more logs and keep finished containers for debugging:
Thanks @schu! So in the end, the problem is that the `kubernetes` service in the `default` namespace has the following session affinity timeout:
Here is the corresponding default in the code: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/types.go#L2970
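For anyone checking their own cluster, the affinity settings can be read straight off the service; a minimal sketch (field paths from the core/v1 Service API):

```sh
# Print the affinity type and timeout of the default/kubernetes service
kubectl get svc kubernetes -n default \
  -o jsonpath='{.spec.sessionAffinity} {.spec.sessionAffinityConfig.clientIP.timeoutSeconds}'
# On an affected cluster this prints: ClientIP 10800
```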
This issue is a known one and has been fixed with kubernetes/kubernetes#56690. I created two backport PRs for the 1.9 and 1.10 release branches in kubernetes.
@afritzler thanks 👍
Well, thank you guys for the findings!
OK, the cherry-pick for 1.10 has been merged now. The ones for 1.9 and 1.8 are coming up as well. I will close this issue.
landscape-setup-template users frequently hit an error during cluster setup or end up with an unhealthy cluster where only 2 out of 3 kube-apiserver pods are running. Currently, we know of the following symptoms:
1. Cluster setup fails early due to etcd operator errors (deploy_kubify.sh fails to deploy etcd #48):
2. Cluster is unhealthy due to kube-controller-manager continuously throwing errors (the pod stays running, though):
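A rough sketch of how to observe symptom 2 from the outside (pod names and grep patterns here are assumptions; they vary with the kubify deployment):

```sh
# Check how many kube-apiserver pods are actually running on the masters
kubectl -n kube-system get pods -o wide | grep kube-apiserver

# Tail the controller-manager logs for the connection errors quoted below
kubectl -n kube-system logs \
  "$(kubectl -n kube-system get pods -o name | grep kube-controller-manager | head -n 1)" \
  | tail -n 20
```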
The fact that the error (`dial tcp 10.241.0.1:443: getsockopt: connection refused`) is encountered for all requests looks like a routing error at first: 2 out of 3 apiserver instances are running and reachable after all, and we expect requests to the service IP to be distributed among the set of available pods (i.e., shouldn't 2 out of 3 requests succeed?).

This is most likely due to the (default) `sessionAffinity` setting for the `default/kubernetes` service:
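With `ClientIP` affinity and the default timeout, the relevant part of the spec looks roughly like this (illustrative excerpt; the cluster IP is taken from the error above):

```sh
kubectl get svc kubernetes -o yaml
# Relevant excerpt (illustrative):
#   spec:
#     clusterIP: 10.241.0.1
#     sessionAffinity: ClientIP
#     sessionAffinityConfig:
#       clientIP:
#         timeoutSeconds: 10800
```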
When a request from a source IP was routed to a `KUBE-SEP` chain once, it will be routed there for the next 3 hours (10800 seconds). E.g., if the leading controller-manager pod happens to be routed to the faulty node (without kube-apiserver running), all requests will end up there until the timeout is reached. The iptables rules for that look like:
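As a sketch of what kube-proxy programs for `ClientIP` affinity (it uses the iptables `recent` match; the chain hashes below are placeholders):

```sh
iptables-save | grep -E 'KUBE-(SVC|SEP)'
# Illustrative shape of the rules (hashes are placeholders):
#   -A KUBE-SVC-XXXXXXXXXXXXXXXX -m recent --name KUBE-SEP-YYYYYYYYYYYYYYYY \
#       --rcheck --seconds 10800 --reap -j KUBE-SEP-YYYYYYYYYYYYYYYY
#   -A KUBE-SEP-YYYYYYYYYYYYYYYY -m recent --name KUBE-SEP-YYYYYYYYYYYYYYYY --set \
#       -j DNAT --to-destination <node-ip>:443
```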
By removing the `sessionAffinity` setting from the `default/kubernetes` service (e.g. with `kubectl edit svc kubernetes`), the problem can be fixed for symptom 2 (as described above): controller-manager will eventually hit a healthy apiserver instance and be able to go on with its tasks. The missing kube-apiserver pod will be rescheduled shortly after.
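For a non-interactive variant of the same workaround (a sketch; this only papers over symptom 2, and the setting may be restored by the bootstrap tooling):

```sh
# Turn off ClientIP affinity on the default/kubernetes service
kubectl patch svc kubernetes -n default -p '{"spec": {"sessionAffinity": "None"}}'
```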
Noteworthy is that on the faulty master node where kube-apiserver is not running, the checkpoint is also missing (otherwise the pod should be running again shortly after it stopped): `find /etc/kubernetes/ -iname '*api*'` returns nothing. The checkpointer logs show the following:
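To pull those logs on the node, something along these lines should work (assuming a Docker runtime and bootkube's pod-checkpointer; both names are assumptions for this setup):

```sh
# On the faulty master node: find the checkpointer container and dump its logs
docker ps -a --format '{{.ID}}\t{{.Names}}' | grep -i checkpoint
docker logs <container-id>
```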
Current status:
I don't know yet why this happens, but the root cause seems to be a problem during kube-apiserver bootstrapping. I'll add more info as I find it.
Any ideas? :)