One of three nodes didn't delete object which had been deleted by kubernetes #9251
@learnMachining, thanks for the details. The last node has a larger DB, so I believe you are hitting #8009 (ref: #7116). I will try to reproduce this; could you post the logfile for that node?
@hexfusion Thanks for the reply. The log of that node: [log attachment omitted]
I have read issues #8009 and #7116, and I don't think the problem is the same. To find more information, I created an object with etcdctl and got it back from each of the three nodes respectively, and that worked normally. Then I deleted the object with etcdctl and queried all three nodes again; this time no node returned a value, so all of them had deleted the object. What's more, I inspected the logs of the three nodes and found something strange: [log excerpts omitted]
From the above, I guess the problem is related to kubernetes. Maybe this is normal behavior for an etcd cluster backing kubernetes? But how could kubernetes delete an object from only two of the three nodes? I am very confused.
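The per-endpoint check described in this comment can be sketched roughly as follows. The endpoint addresses come from the cluster configuration in this issue; the client port 2379, the plain-HTTP scheme, and the key name are assumptions for illustration:

```shell
# Write a test key through one member (assumes a reachable v3 cluster).
ETCDCTL_API=3 etcdctl --endpoints=http://192.168.139.50:2379 put /test-key test-value

# Read it back from each member individually; a consistent cluster
# returns the same value from every endpoint.
for ep in 192.168.139.50 192.168.139.51 192.168.139.52; do
  echo "== ${ep} =="
  ETCDCTL_API=3 etcdctl --endpoints=http://${ep}:2379 get /test-key
done

# Delete it and repeat the per-endpoint read;
# every member should now return no value at all.
ETCDCTL_API=3 etcdctl --endpoints=http://192.168.139.50:2379 del /test-key
```

If one endpoint still returns the deleted key, that member has diverged from the quorum, which matches the symptom reported here.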
Now my etcd cluster runs normally. I stopped all nodes and backed up /var/lib/etcd on the inconsistent node, then let that node sync again. After syncing, I redid the create-ns test above, and this time all three nodes deleted the namespace created by kubectl. All three nodes also compact every five minutes. It seems normal now. I don't actually know what happened, but it has worked well so far.
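One common way to rebuild a diverged member, sketched under assumptions (a systemd-managed etcd, default paths, and the peer port 2380; the member name and ID are placeholders). On a real cluster, follow the official runtime-reconfiguration documentation rather than this sketch:

```shell
# On the inconsistent member: stop etcd and set the suspect data dir aside.
systemctl stop etcd
mv /var/lib/etcd /var/lib/etcd.bak

# From a healthy member: find the diverged member's ID, remove it,
# then re-add it so it will receive a fresh snapshot from the leader.
ETCDCTL_API=3 etcdctl member list
ETCDCTL_API=3 etcdctl member remove <MEMBER_ID>
ETCDCTL_API=3 etcdctl member add node3 --peer-urls=http://192.168.139.52:2380

# Finally, restart etcd on the rebuilt node with
# --initial-cluster-state=existing so it joins and syncs from the cluster.
```

Simply deleting the data dir and restarting (as the author appears to have done) can also work for a member rejoining the same cluster, but member remove/add is the documented procedure.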
@learnMachining If you look at the header of the get response, the revision on the 3rd node is quite different, which we need to figure out. How did you operate this etcd cluster? Any maintenance work?
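The revision gyuho refers to lives in the response header of every v3 read. It can be compared across members like this (endpoint addresses from this issue; port and scheme are assumed defaults):

```shell
# JSON output includes "header":{"revision":...} from the member serving the read.
for ep in 192.168.139.50 192.168.139.51 192.168.139.52; do
  echo "== ${ep} =="
  ETCDCTL_API=3 etcdctl --endpoints=http://${ep}:2379 get /test-key -w json
done

# endpoint status shows each member's db size, raft term/index, and
# revision side by side, which makes a diverged member easy to spot.
ETCDCTL_API=3 etcdctl \
  --endpoints=http://192.168.139.50:2379,http://192.168.139.51:2379,http://192.168.139.52:2379 \
  endpoint status -w table
```

In a healthy cluster the revisions differ at most transiently; a member whose revision lags far behind (or runs ahead) is the one to investigate.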
@gyuho I did nothing to this cluster; everything is at defaults. K8s uses etcd v3; calico and flannel use etcd v2. After installation, etcd, k8s, calico, and flannel each worked in their own way, and I found the inconsistency yesterday.
@learnMachining Do you have all the server logs from the beginning (from revision 1) that you can share?
@gyuho I'm afraid not; I have cleared the logs a few times. But this is my own experimental environment, so I can give you all the logs I still have. I hope they are helpful.
Also the db files from all 3 nodes?
@gyuho I backed up the db file of that node, but I'm sorry to say I have since lost the backup data. Please close this issue; I will reopen it if I encounter similar problems or find more useful information.
@learnMachining Yeah, there seem to be too many components involved... but the revision mismatch should not happen. Please keep all the logs, or just book-keep every operation you perform on the etcd cluster, in case this happens again. And use the latest etcd if you can. Let's close this for now and reopen it if it happens again. Thanks.
I have a three-node etcd cluster that serves as the data store of my kubernetes cluster, but today I found an inconsistency in it. The state of my etcd cluster is: [status output omitted]
I fetched all keys from the three nodes respectively. Node 192.168.139.50 and node 192.168.139.51 are exactly the same; however, node 192.168.139.52 still has some objects that I had already deleted with kubectl.
To figure out the inconsistency, I ran
kubectl create ns etcd-test
and then fetched the object from the three nodes; the command and results are: [output omitted]
Later I deleted this namespace using
kubectl delete ns etcd-test
and ran the same get command again; the results are: [output omitted]
It is obvious that node 192.168.139.52 didn't delete the object at all. I know there is a quorum mechanism in an etcd cluster, but I am not sure whether this behavior is expected.
What's more, according to https://coreos.com/etcd/docs/latest/faq.html, all requests to etcd followers are forwarded to the leader, so why did I get different results from different endpoints?
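Since kubernetes stores namespace objects in etcd under /registry/namespaces/<name> (assuming the default kube-apiserver --etcd-prefix of /registry), the per-endpoint comparison can be run directly on the key kubectl writes. Endpoint addresses are from this issue; port and scheme are assumed defaults:

```shell
# Read the key behind "kubectl get ns etcd-test" from each member separately.
for ep in 192.168.139.50 192.168.139.51 192.168.139.52; do
  echo "== ${ep} =="
  ETCDCTL_API=3 etcdctl --endpoints=http://${ep}:2379 \
    get /registry/namespaces/etcd-test
done
```

After "kubectl delete ns etcd-test", a consistent cluster returns nothing from all three endpoints; any endpoint that still prints the object is the diverged member.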
[configuration]
node 192.168.139.50
node 192.168.139.51
node 192.168.139.52
[version]
etcd
kubernetes