
capz-conformance is failing to start the control plane occasionally #1370

Closed

jsturtevant opened this issue May 7, 2021 · 6 comments

Labels: kind/bug, kind/flake, lifecycle/rotten

Comments

@jsturtevant (Contributor) commented on May 7, 2021:

/kind bug
/kind flake

What steps did you take and what happened:

The capz-conformance job (https://testgrid.k8s.io/provider-azure-master-signal#capz-conformance) is occasionally failing to start the control plane:

INFO: Creating the workload cluster with name "capz-conf-lxkyn6" using the "conformance-ci-artifacts" template (Kubernetes v1.22.0-alpha.1.178+d5691f754f1812, 1 control-plane machines, 2 worker machines)
INFO: Getting the cluster template yaml
INFO: clusterctl config cluster capz-conf-lxkyn6 --infrastructure (default) --kubernetes-version v1.22.0-alpha.1.178+d5691f754f1812 --control-plane-machine-count 1 --worker-machine-count 2 --flavor conformance-ci-artifacts
INFO: Applying the cluster template yaml to the cluster
INFO: Waiting for the cluster infrastructure to be provisioned
STEP: Waiting for cluster to enter the provisioned phase
INFO: Waiting for control plane to be initialized
INFO: Waiting for the first control plane machine managed by capz-conf-lxkyn6/capz-conf-lxkyn6-control-plane to be provisioned
STEP: Waiting for one control plane node to exist
[AfterEach] Conformance Tests
  /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/conformance_test.go:139
STEP: Unable to dump workload cluster logs as the cluster is nil

What did you expect to happen:

Anything else you would like to add:

A few examples:

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-master/1390439872413569024
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-master/1390303576898670592
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-master/1389668383770808320

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
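
When the run fails this early the workload cluster is nil, so the usual log dump is skipped. One way to see why provisioning stalled is to dump the AzureMachine conditions left behind in the test namespace. A rough sketch along those lines (the group/version and the helper below are assumptions for illustration, not code from the repo):

```go
package debug

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// dumpAzureMachineConditions lists the AzureMachine objects in the test
// namespace and prints their status conditions, which is where a provisioning
// failure (quota, image, VM create error, ...) usually shows up when the
// workload cluster never gets far enough for a log dump.
// The group/version is an assumption; match it to the CAPZ release in use.
func dumpAzureMachineConditions(ctx context.Context, c client.Client, namespace string) error {
	list := &unstructured.UnstructuredList{}
	list.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "infrastructure.cluster.x-k8s.io",
		Version: "v1alpha4",
		Kind:    "AzureMachineList",
	})
	if err := c.List(ctx, list, client.InNamespace(namespace)); err != nil {
		return err
	}
	for _, item := range list.Items {
		conditions, _, _ := unstructured.NestedSlice(item.Object, "status", "conditions")
		fmt.Printf("%s: %v\n", item.GetName(), conditions)
	}
	return nil
}
```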
@k8s-ci-robot added the kind/bug and kind/flake labels on May 7, 2021
@chewong (Member) commented on Jun 7, 2021:

#1419 will add the ability to upload machine boot logs even if provisioning failed. That should give us more insight into why it is failing.
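
For reference, a rough sketch of the kind of boot-diagnostics lookup that would help here, assuming the track-1 Azure SDK for Go; the actual mechanism used by #1419 may well differ, and the API-version import path is an assumption:

```go
package debug

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2020-06-01/compute" // API-version path is an assumption
	"github.com/Azure/go-autorest/autorest/azure/auth"
)

// printBootDiagnosticsURIs fetches a VM's instance view and prints the blob
// URIs for its serial console log and console screenshot, which is the kind of
// data a boot-log upload would collect for a machine that never provisioned.
func printBootDiagnosticsURIs(ctx context.Context, subscriptionID, resourceGroup, vmName string) error {
	authorizer, err := auth.NewAuthorizerFromEnvironment()
	if err != nil {
		return err
	}
	vmClient := compute.NewVirtualMachinesClient(subscriptionID)
	vmClient.Authorizer = authorizer

	view, err := vmClient.InstanceView(ctx, resourceGroup, vmName)
	if err != nil {
		return err
	}
	if view.BootDiagnostics == nil {
		fmt.Println("boot diagnostics not enabled for", vmName)
		return nil
	}
	if view.BootDiagnostics.SerialConsoleLogBlobURI != nil {
		fmt.Println("serial console log:", *view.BootDiagnostics.SerialConsoleLogBlobURI)
	}
	if view.BootDiagnostics.ConsoleScreenshotBlobURI != nil {
		fmt.Println("console screenshot:", *view.BootDiagnostics.ConsoleScreenshotBlobURI)
	}
	return nil
}
```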

@jsturtevant (Contributor, issue author) commented:

We are still seeing this error. Here are some more logs showing where it is happening:

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-1-21/1404544474662572032
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/capz-conformance-master/1404456323445166080

STEP: Waiting for one control plane node to exist
[AfterEach] Conformance Tests
  /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/conformance_test.go:172
STEP: Unable to dump workload cluster logs as the cluster is nil
STEP: Dumping all the Cluster API resources in the "capz-conf-gxhzo3" namespace
STEP: Deleting all clusters in the capz-conf-gxhzo3 namespace
STEP: Deleting cluster capz-conf-gxhzo3
INFO: Waiting for the Cluster capz-conf-gxhzo3/capz-conf-gxhzo3 to be deleted
STEP: Waiting for cluster capz-conf-gxhzo3 to be deleted
STEP: Deleting namespace used for hosting the "conformance-tests" test spec
INFO: Deleting namespace capz-conf-gxhzo3
STEP: Redacting sensitive information from logs

• Failure [1664.426 seconds]
Conformance Tests
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/conformance_test.go:45
  conformance-tests [Measurement]
  /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/conformance_test.go:79

  Timed out after 1200.000s.
  Expected
      <bool>: false
  to be true

  /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:145

  Full Stack Trace
  sigs.k8s.io/cluster-api/test/framework.WaitForOneKubeadmControlPlaneMachineToExist(0x23b2940, 0xc0000640c0, 0x7f0ad038aeb8, 0xc001395180, 0xc00050f1e0, 0xc000fb8000, 0xc000428a00, 0x2, 0x2)
  	/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:145 +0x4c5
  sigs.k8s.io/cluster-api/test/framework.DiscoveryAndWaitForControlPlaneInitialized(0x23b2940, 0xc0000640c0, 0x7f0ad038aeb8, 0xc001395180, 0xc00050f1e0, 0xc000428a00, 0x2, 0x2, 0x0)
  	/home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:233 +0x5a5
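
For reference, the check that is timing out here boils down to polling until one control-plane Machine for the workload cluster has a NodeRef. A rough sketch of that wait, assuming the standard CAPI labels and a v1alpha4 import path (an approximation, not the framework's actual code):

```go
package debug

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4" // API version is an assumption; adjust to the release in use
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForOneControlPlaneMachine approximates what
// framework.WaitForOneKubeadmControlPlaneMachineToExist is waiting on: it
// returns once any control-plane Machine for the cluster has registered a
// node (Status.NodeRef set), or errors after the timeout (1200s in the run above).
func waitForOneControlPlaneMachine(ctx context.Context, c client.Client, namespace, clusterName string, timeout time.Duration) error {
	return wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		machines := &clusterv1.MachineList{}
		if err := c.List(ctx, machines,
			client.InNamespace(namespace),
			client.MatchingLabels{"cluster.x-k8s.io/cluster-name": clusterName},
		); err != nil {
			return false, err
		}
		for _, m := range machines.Items {
			if _, isControlPlane := m.Labels["cluster.x-k8s.io/control-plane"]; !isControlPlane {
				continue
			}
			if m.Status.NodeRef != nil {
				return true, nil // at least one control plane node exists
			}
		}
		return false, nil // keep polling until the timeout expires
	})
}
```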

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 12, 2021
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 12, 2021
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
