etcdserver: request timed out #11809
Comments
It seems that your disk is very slow. Can you provide all etcd metrics? You can refer to the following article to improve performance. @alita91
Here are the etcd server metrics:
I ran the benchmark tool on etcd following the instructions from https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/performance.md. Here are the results:
It seems that your disk was very slow for a while, which may be related to the image loading you mentioned. Does this etcdserver request timeout error keep appearing?
The etcdserver request timeout error appears every time I follow the process I described. Once the services are installed, I no longer see such errors. Upgrading to SSD would certainly solve it, but a question still comes to mind: why can't etcdserver work well on slower disks? I am thinking of a case where etcdserver notices that it is not able to write to disk and keeps the data in memory until it can write again. I guess something like '--snapshot-count' could be used, but I am not sure how it would work in my case. Also, why are tools like kubectl or helm unable to recover from such errors? To me this looks like a temporary problem, at least in my case. Thank you for your input, I really appreciate your support.
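On the client side, one workaround for a transient failure like this is to retry with backoff. The following is only a minimal sketch of that idea, not something kubectl or helm do themselves: `retry_on_etcd_timeout`, its parameters, and the use of `RuntimeError` as the error type are all hypothetical names chosen for illustration, and `operation` stands for whatever call surfaces the error message.

```python
import time

# Error text taken from the issue title; everything else here is hypothetical.
TRANSIENT_MARKER = "etcdserver: request timed out"


def retry_on_etcd_timeout(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on the etcd timeout error.

    Assumes the failure is raised as a RuntimeError whose message contains
    the etcd error text. Any other error, or the last failed attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except RuntimeError as err:
            is_transient = TRANSIENT_MARKER in str(err)
            if not is_transient or attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Retrying only hides the symptom. If the disk stays slow for longer than the total backoff window, the call still fails.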
etcdserver has to write the WAL (write-ahead log) to disk, and the backend has to commit data to disk periodically. You can learn more here: https://static.sched.com/hosted_files/kccncosschn19eng/ea/KubeCon%20China%202019_%20Raft%20in%20etcd.pdf
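To check whether these two disk paths are the bottleneck, the histograms `etcd_disk_wal_fsync_duration_seconds` and `etcd_disk_backend_commit_duration_seconds` from the `/metrics` endpoint are the usual place to look. Below is a rough sketch that reads such a histogram from Prometheus text output and reports the bucket bound covering a given fraction of observations. The helper names and the sample numbers are made up for illustration and are not data from this issue.

```python
import math


def parse_buckets(metrics_text, metric):
    """Return sorted (upper_bound, cumulative_count) pairs for one histogram."""
    prefix = metric + "_bucket{"
    buckets = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        labels, value = line[len(prefix):].rsplit("}", 1)
        for label in labels.split(","):
            key, _, raw = label.partition("=")
            if key.strip() == "le":
                # float() also accepts the "+Inf" bound used by Prometheus.
                buckets.append((float(raw.strip().strip('"')), float(value)))
    return sorted(buckets)


def quantile_upper_bound(buckets, q):
    """Smallest bucket bound whose cumulative count covers fraction q."""
    if not buckets or buckets[-1][1] == 0:
        return None
    target = q * buckets[-1][1]
    for bound, count in buckets:
        if count >= target:
            return bound
    return buckets[-1][0]


# Made-up sample in Prometheus text format, for illustration only.
SAMPLE = """\
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 10
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 60
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 90
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 100
"""

wal = parse_buckets(SAMPLE, "etcd_disk_wal_fsync_duration_seconds")
print("median fsync bucket bound (s):", quantile_upper_bound(wal, 0.5))
```

As far as I recall, the etcd documentation suggests that the 99th percentile of WAL fsync duration should stay below roughly 10 ms. A result far above that points at the disk rather than at etcd itself.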
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Hello,
etcd service log: Nov 11 12:50:18 etcd[1234]: got unexpected response error (etcdserver: request timed out)
Patroni.yml: bootstrap:
Disk Type: SSD
Hello team,
I recently created a multi-node on-premise k8s cluster, and after some time I decided to install a couple of services on it. During service installation with kubectl or helm, I started to hit "etcdserver: request timed out" errors.
After hitting this issue, I created another k8s cluster on a single node (1 master, 1 worker) to rule out a possible network problem, and I hit the same issue there.
Interestingly, with older versions of k8s, etcd, and helm/kubectl, I did not hit this issue.
It is important to mention that all the Docker images are first loaded into the local Docker registry and then pushed to the Docker registry that I have inside the cluster. All of those operations are executed on the same node.
After some time, I studied the etcdserver source code a little and noticed two parameters that are used in the timeout formula: "heartbeat interval" and "election-timeout".
Right now, etcdserver reports a timeout after 7.5 seconds, and I was able to correlate this time with the value calculated from the timeout formula.
Then I increased the timeout to 15 seconds, but now I get the timeout error after 15 seconds.
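For reference, the timeout calculation can be sketched as below. This rests on my reading of the etcd server config, where the request timeout appears to be a fixed 5 seconds plus twice the election timeout, with the election timeout counted in whole heartbeat ticks. Treat the formula as an assumption and check it against the etcd version in use. The flag values in the example are not taken from this cluster. They are one combination that happens to reproduce the 7.5 and 15 second figures.

```python
from datetime import timedelta

# Assumption: fixed allowance for queue waiting, computation and disk I/O delay.
BASE_ALLOWANCE = timedelta(seconds=5)


def request_timeout(heartbeat_ms, election_ms):
    """Estimate the etcdserver request timeout from the two tuning values."""
    # The election timeout is counted in whole heartbeat ticks.
    election_ticks = election_ms // heartbeat_ms
    election_timeout = timedelta(milliseconds=election_ticks * heartbeat_ms)
    # Two election timeouts are added to cover a possible leader election.
    return BASE_ALLOWANCE + 2 * election_timeout


# Hypothetical values: heartbeat interval 250 ms, election timeout 1250 ms.
print(request_timeout(250, 1250).total_seconds())  # 7.5
# Raising only the election timeout to 5000 ms moves the limit to 15 s.
print(request_timeout(250, 5000).total_seconds())  # 15.0
```

This also shows why raising the timeout did not help: the larger value only delays the moment the error is reported, while the slow disk write underneath is unchanged.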
Here are some logs that I receive on a single k8s node:
Is there any newer version that will mitigate this type of issue?
Best regards,
Alex