openshift: tracker for etcd timeout issues #2918
Comments
https://circleci.com/gh/Azure/acs-engine/25995
@jim-minter I'm not sure how ETCD_ELECTION_TIMEOUT would help in this situation. Isn't it for leader election issues (coupled with heartbeat interval)? It seems like we need to wait for a better ready state or have retry mechanisms in place. Since this looks like it's in Ansible I'm thinking ready state in the extensions is maybe a better route. Maybe we can poach |
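For illustration, the "retry mechanisms" idea could look roughly like the sketch below. This is only a generic pattern, not what acs-engine or the Ansible playbooks actually do: `wait_until_ready` and its parameters are hypothetical names, and `probe` stands in for whatever readiness check ends up being used (for example, a request against the etcd health endpoint).

```python
import time


def wait_until_ready(probe, attempts=10, delay=1.0, backoff=2.0,
                     max_delay=30.0, sleep=time.sleep):
    """Poll probe() until it returns True, backing off between attempts.

    Hypothetical helper: probe is any zero-argument callable that returns
    True once the service is ready. Connection errors and timeouts
    (OSError and its subclasses) are treated as "not ready yet".
    Returns True if the probe succeeded, False if all attempts failed.
    """
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # transient failure: fall through and retry
        if attempt < attempts - 1:
            sleep(delay)
            # Exponential backoff, capped so a long outage does not
            # produce unbounded waits.
            delay = min(delay * backoff, max_delay)
    return False
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Note that this only masks transient timeouts; it does not address the underlying cause.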
See https://github.com/coreos/etcd/blob/master/Documentation/faq.md#why-does-etcd-lose-its-leader-from-disk-latency-spikes and https://coreos.com/etcd/docs/latest/tuning.html for background. I'm going to experiment with ETCD_ELECTION_TIMEOUT as a starting point and see if that helps. |
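As a rough aid when experimenting with these values, here is a small sanity check for a heartbeat interval / election timeout pair. The thresholds are assumptions based on my reading of the etcd tuning guide (election timeout at least 5x the heartbeat interval, at least 10x the peer round-trip time, and no more than 50000 ms), so verify them against the docs for the etcd version in use. `check_etcd_timing` is a hypothetical helper, not part of etcd or acs-engine.

```python
def check_etcd_timing(heartbeat_ms, election_ms, rtt_ms=None):
    """Return a list of warnings for an etcd timing configuration.

    heartbeat_ms and election_ms correspond to ETCD_HEARTBEAT_INTERVAL
    and ETCD_ELECTION_TIMEOUT (both in milliseconds). rtt_ms is an
    optional measured round-trip time between members.
    The thresholds below are assumptions taken from the tuning guide.
    """
    problems = []
    if election_ms < 5 * heartbeat_ms:
        problems.append(
            "election timeout should be at least 5x the heartbeat interval")
    if election_ms > 50000:
        problems.append("election timeout exceeds the 50000 ms upper limit")
    if rtt_ms is not None and election_ms < 10 * rtt_ms:
        problems.append(
            "election timeout should be at least 10x the peer round-trip time")
    return problems


# Example: the etcd defaults (100 ms heartbeat, 1000 ms election timeout).
print(check_etcd_timing(100, 1000))
```

Raising the election timeout makes the cluster more tolerant of latency spikes, at the cost of slower detection of a genuinely failed leader.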
You suspect disk latency? Let's see what happens when you set it, but it seems we're going to want to try and figure out the actual cause. |
openshift/origin#16248 is relevant but sadly not conclusive. |
/kind flake |
Hrm, we should probably switch to SSDs anyway: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/hardware.md#disks |