While launching new Kubernetes clusters, we occasionally hit a state where leader election for kube-scheduler and kube-controller-manager deadlocks. After investigating, we believe the source of this problem is data inconsistency between etcd members, akin to "split brain".
The following shows the k8s leader election key inside the etcd db. Notice the renewTime value, which is inconsistent on only one of the three members of the quorum -- the etcd leader.
$ etcdctl get /registry/services/endpoints/kube-system/kube-scheduler --endpoints (done per node)
The leader node, vm-etcd-aaacrowtherkube81-1 in this case, has the value "renewTime":"2020-03-31T22:15:05Z", which is stuck in the past. For context, the primary kube-scheduler process will update this etcd key every 15 seconds to prove that it is still healthy and alive. While the value of this renewTime string gets updated every 15 seconds on both etcd-0 and etcd-2 nodes, on etcd-1 the value appears to be stuck in the past (an hour ago in this case, which is when this cluster was launched).
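In case it helps, this is roughly the per-node query we run (the endpoints below are placeholders for our members' client URLs; --consistency="s" makes each member answer from its own local data):

# Compare the election key as stored on each member (example endpoints).
for ep in https://etcd-0:2379 https://etcd-1:2379 https://etcd-2:2379; do
  echo "== $ep =="
  etcdctl get /registry/services/endpoints/kube-system/kube-scheduler \
    --endpoints="$ep" --consistency="s"
done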
Apparent Problem: Even though a commit is made to etcd, the leader node fails to actually update its local value, while the other two etcd nodes do update theirs.
Reproducibility: Unfortunately this is very difficult to reproduce. We only notice this error in our CI environment, which launches a test cluster and runs automated tests against it on every new PR. Maybe 1 out of every 20 cluster launches exhibits this failure. We were finally able to reproduce it on a developer's test cluster, hence opening this ticket with details now. We will keep this cluster around for at least a week to aid in debugging, so feel free to request debugging info.
Context:

Full endpoint status output:
$ etcdctl endpoint status --write-out="json"

Inconsistent revision count:
$ etcdctl endpoint status --write-out="json" | jq -r '.[].Status.header.revision'

Environment:
Etcd version: v3.4.3, though this has been noticed intermittently for months on v3.3.15
Logs from each member:
vm-etcd-aaacrowtherkube81-0.log
vm-etcd-aaacrowtherkube81-1.log
vm-etcd-aaacrowtherkube81-2.log
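For readability, we also label each member's reported revision from the endpoint status above rather than printing bare numbers (a small sketch; the endpoints are placeholders for our client URLs):

# Print each member's applied revision next to its endpoint (example endpoints).
for ep in https://etcd-0:2379 https://etcd-1:2379 https://etcd-2:2379; do
  rev=$(etcdctl endpoint status --endpoints="$ep" --write-out="json" \
        | jq -r '.[0].Status.header.revision')
  echo "$ep revision=$rev"
done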
There are three known issues (#11651, #11689, #11613) which can cause data inconsistency. Do you have auth enabled? Has your etcd cluster version been consistent the whole time? Did you upgrade your etcd cluster? Did you run the defrag command? If not, you can compare which keys are missing from each node. The etcd log has very little useful information when data is inconsistent, so you would have to swap in a special debug build.
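For the key comparison, something along these lines should work (the endpoints are placeholders for your members' client URLs; --consistency="s" returns each member's local view):

# Dump the key names each member holds locally and diff them pairwise.
i=0
for ep in https://etcd-0:2379 https://etcd-1:2379 https://etcd-2:2379; do
  etcdctl get "" --prefix --keys-only --consistency="s" --endpoints="$ep" | sort > "keys-$i.txt"
  i=$((i+1))
done
diff keys-0.txt keys-1.txt   # repeat for the other member pairs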
How do you reproduce this inconsistency? Can you provide detailed information about the operations involved? As you said, it is really difficult to reproduce, but we can try running a chaos monkey in the test environment to trigger it. We have to find the first command that caused the inconsistency. Thanks. @jcrowthe
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.