e2e nodes don't come online due to etcd timeouts #1359

jsturtevant · 2021-05-03T23:09:11Z

/kind bug
/kind flake

What steps did you take and what happened:
The e2e tests fail with:

/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:457
Timed out after 1200.001s.
Expected
    <int>: 0
to equal
    <int>: 1

kubelet logs on Linux control plane nodes have:

May 03 17:10:02.345717 capz-e2e-xxj97f-control-plane-vlsww kubelet[2703]: E0503 17:10:02.345642    2703 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection
May 03 17:10:21.918168 capz-e2e-xxj97f-control-plane-vlsww kubelet[2703]: E0503 17:10:21.918123    2703 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection
May 03 17:11:04.907224 capz-e2e-xxj97f-control-plane-vlsww kubelet[2703]: E0503 17:11:04.907175    2703 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection
May 03 17:12:02.125808 capz-e2e-xxj97f-control-plane-vlsww kubelet[2703]: E0503 17:12:02.125651    2703 upgradeaware.go:373] Error proxying data from client to backend: readfrom tcp 127.0.0.1:58776->127.0.0.1:40521: read tcp 10.0.0.4:10250->10.0.0.5:38492: read: connection reset by peer

Linux worker nodes:

May 03 17:19:05.129231 capz-e2e-xxj97f-md-0-7xbxt kubelet[1960]: E0503 17:19:05.129187    1960 kubelet_node_status.go:470] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"NetworkUnavailable\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2021-05-03T17:18:58Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2021-05-03T17:18:58Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2021-05-03T17:18:58Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2021-05-03T17:18:58Z\",\"type\":\"Ready\"}]}}" for node "capz-e2e-xxj97f-md-0-7xbxt": etcdserver: request timed out

Windows cloudbase-init logs:

error execution phase kubelet-start: cannot get Node "capz-e2e-4vgcq": etcdserver: leader changed\nTo see the stack trace of this error execute with --v=5 or higher\n' execute_user_data_script
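
For triage, a rough way to see which nodes hit the etcd errors quoted above is to count the two signatures per host across the collected logs. This is only a sketch: the function name `count_etcd_errors` and the prefix regex are my own assumptions, not part of the e2e tooling, and the regex only understands the journald-style kubelet prefix shown in the excerpts. Lines without that prefix, such as the cloudbase-init output, are grouped under "unknown".

```python
import re
from collections import Counter

# Signatures copied from the log excerpts in this issue; extend as needed.
ETCD_SIGNATURES = (
    "etcdserver: request timed out",
    "etcdserver: leader changed",
)

# Assumed journald-style prefix: "May 03 17:19:05.129231 <host> kubelet[1960]: ..."
HOST_RE = re.compile(r"^\w{3} \d{2} [\d:.]+ (\S+) kubelet\[\d+\]:")


def count_etcd_errors(lines):
    """Return a Counter keyed by (host, signature) for lines with etcd errors."""
    hits = Counter()
    for line in lines:
        match = HOST_RE.match(line)
        # Lines without the kubelet prefix are grouped under "unknown".
        host = match.group(1) if match else "unknown"
        for signature in ETCD_SIGNATURES:
            if signature in line:
                hits[(host, signature)] += 1
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python count_etcd_errors.py < kubelet.log
    for (host, signature), count in sorted(count_etcd_errors(sys.stdin).items()):
        print(f"{host}\t{signature}\t{count}")
```

A high count concentrated on control plane hosts would point at etcd itself (disk or network latency) rather than at individual workers, but that is an inference to check against the full logs, not something this script can confirm.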

What did you expect to happen:

Anything else you would like to add:
Looks similar to #832.

logs: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/1351/pull-cluster-api-provider-azure-e2e-windows/1389262675552243712

Environment:

cluster-api-provider-azure version: main branch
Kubernetes version: (use kubectl version): v1.19.7
OS (e.g. from /etc/os-release):


k8s-triage-robot · 2021-08-01T23:38:57Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-triage-robot · 2021-09-01T20:43:15Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2021-10-01T21:02:22Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2021-10-01T21:02:30Z

@k8s-triage-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test. labels May 3, 2021

jsturtevant mentioned this issue May 3, 2021 in Enable Windows e2e logging #1351 (merged)

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 1, 2021

k8s-ci-robot closed this as completed Oct 1, 2021
