flake in cmd: etcdserver: request timed out #16248
Seeing this on a ton of builds, it looks like something is wrong with etcd:
Other calls later fail after exactly 7 seconds, which implies some sort of timeout on either end. I think we have an etcd server or client bug. This may be related to the issues seen on us-west-1 (where certain calls time out after a while when made from the API server, but not when made directly to etcd). It's possible that this has something to do with the client library getting stuck: perhaps the particular error here causes it to fail and stay stuck? We should check the upstream etcd versions after 3.2.1. Raising priority.
Also note:
@deads2k in unified master mode, leader election failures shouldn't be terminating the process, which is likely making this problem worse. The new "split controller" logic should not be on when leader election is not on.
There are just a few places in the etcd code that return "request timed out". Most of them seem to be related to leader election and watch (but this is not a watch problem). In the v3 server these are basically:
@smarterclayton my candidates: Other candidate: Apart from these two, I don't see anything related to this issue in the release-3.2 branch...
@smarterclayton also for the 7s timeout:
So is leader election really broken? Can we tweak ElectionTicks? (Let's experiment with it here: #16429)
@deads2k said this could be the tmpfs issue as the I wonder if we should add I changed #16429 to hardcode the /tmp mount from the host to prove this.
@stevekuznetsov closing this, as the issue was the tmpfs and it was fixed in the ci-cd repo.
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/15796/test_pull_request_origin_cmd/2378/
/kind test-flake