💚 cluster should have healthy time synchronization #988

mboersma · 2020-10-09T19:05:55Z

What this PR does / why we need it:

Tests each workload cluster node created in e2e tests to see if it has healthy time synchronization, as defined by the output of these commands on the host:

$ systemctl is-active chronyd
$ chronyc tracking
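
For illustration, here is a minimal sketch of how the output of those two commands might be judged. It is not the code in this PR: `timeSyncHealthy` is a hypothetical helper, and treating a `Leap status` of `Normal` as "healthy" is an assumption about what the check should accept.

```go
package main

import (
	"fmt"
	"strings"
)

// timeSyncHealthy is a hypothetical helper (not taken from this PR).
// isActiveOut is the output of `systemctl is-active chronyd` and
// trackingOut is the output of `chronyc tracking`.
// Assumption: a node is healthy when chronyd is "active" and chronyc
// reports "Leap status : Normal".
func timeSyncHealthy(isActiveOut, trackingOut string) error {
	if state := strings.TrimSpace(isActiveOut); state != "active" {
		return fmt.Errorf("chronyd is %q, want \"active\"", state)
	}
	for _, line := range strings.Split(trackingOut, "\n") {
		// Each line of `chronyc tracking` has the form "Key : Value".
		key, value, found := strings.Cut(line, ":")
		if !found || strings.TrimSpace(key) != "Leap status" {
			continue
		}
		if status := strings.TrimSpace(value); status != "Normal" {
			return fmt.Errorf("leap status is %q, want \"Normal\"", status)
		}
		return nil
	}
	return fmt.Errorf("no \"Leap status\" line in chronyc tracking output")
}

func main() {
	// Sample output written by hand for this sketch, not captured from a node.
	tracking := "Stratum         : 3\nLeap status     : Normal\n"
	if err := timeSyncHealthy("active\n", tracking); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}
```

In an e2e test the two strings would come from running the commands over SSH on each node.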

Which issue(s) this PR fixes:

Fixes #705

Special notes for your reviewer:

This reuses the SSH proxy code from #976. ~~I'll refactor that one to put those funcs in the same place in helpers.go, but the two PRs will probably conflict such that one will need to be rebased, just FYI.~~

TODOs:

squashed commits
includes documentation
adds unit tests

Release note:

💚 cluster should have healthy time synchronization

mboersma · 2020-10-09T21:31:11Z

Looks good:

...
STEP: verifying EnableAcceleratedNetworking for the primary NIC of each VM
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-82jxd
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-9qnq9
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-dr2dm
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-md-0-68c44c47fc-kdcsk
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-md-0-68c44c47fc-rp97d
STEP: Dumping all the Cluster API resources in the "create-workload-cluster-k4sytl" namespace
...

CecileRobertMichon · 2020-10-13T16:18:52Z

/kind other

k8s-ci-robot · 2020-10-13T16:18:53Z

@CecileRobertMichon: The label(s) kind/other cannot be applied, because the repository doesn't have them

In response to this:

/kind other

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mboersma · 2020-10-13T22:04:36Z

I'll rebase this after #976 merges. I also have a change to run all the SSH sessions for the whole cluster concurrently, rather than by node, which should make this marginally faster.

nader-ziada · 2020-10-14T12:30:40Z

/test pull-cluster-api-provider-azure-e2e

CecileRobertMichon · 2020-10-15T21:29:47Z

test/e2e/helpers.go

@@ -284,15 +285,31 @@ func logCheckpoint(specTimes map[string]time.Time) {
 	}
 }

+// getMachinesInCluster returns a list of all machines in the given cluster.
+// This is copied from CAPI's test/framework/cluster_proxy.go.
+func getMachinesInCluster(ctx context.Context, c framework.Lister, namespace, name string) (*clusterv1.MachineList, error) {


What about machine pools? Do we want to add that here too, @devigned?

devigned · 2020-10-15

Probably. There should now be similar funcs for MachinePools.

mboersma · 2020-10-28

I added support here for MachinePools and created a helper function to collect all the SSH info for the nodes in a cluster, whether they are control plane or agent, VM or VMSS.

One thing I'm curious about is that we anticipated needing port 50001 for VMSS instances, but in practice it seems to be good old port 22 everywhere...

CecileRobertMichon · 2020-10-28

Maybe port 50001 was an AKS Engine thing? Although I would expect the VMSS node pools to go through port 22 for the first control plane SSH, and only control plane VMSS nodes (not supported currently) to require a different port.

mboersma · 2020-10-28T17:19:32Z

This test passed on the VMSS cluster as well as the VM-based clusters:

Workload cluster creation Creating a VMSS cluster 
  with a single control plane node and an AzureMachinePool with 2 nodes
  /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:234
INFO: "with a single control plane node and an AzureMachinePool with 2 nodes" started at Wed, 28 Oct 2020 
...
INFO: Waiting for the machine deployments to be provisioned
INFO: Waiting for the machine pools to be provisioned
STEP: waiting for the machine pool workload nodes to exist
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-control-plane-5fbtc
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-mp-0000002
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-mp-0000003
...

CecileRobertMichon · 2020-10-28T17:26:26Z

/lgtm

nader-ziada · 2020-10-28T18:03:06Z

/lgtm
/approve

k8s-ci-robot · 2020-10-28T18:03:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nader-ziada

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [nader-ziada]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the release-note and cncf-cla: yes labels Oct 9, 2020

k8s-ci-robot requested review from CecileRobertMichon and cpanato October 9, 2020 19:06

k8s-ci-robot added the area/provider/azure, sig/cluster-lifecycle, and size/L labels Oct 9, 2020

mboersma mentioned this pull request Oct 12, 2020 in "💚 collect workload cluster logs in e2e runs" #976 (merged)

CecileRobertMichon reviewed Oct 12, 2020: test/e2e/azure_test.go (resolved)

nader-ziada reviewed Oct 13, 2020: go.mod (outdated, resolved), go.mod (outdated, resolved), test/e2e/helpers.go (outdated, resolved), test/e2e/azure_test.go (resolved)

mboersma force-pushed the healthy-time-sink branch 2 times, most recently from 80e9791 to d6ece1d Compare October 13, 2020 23:12

mboersma commented Oct 13, 2020: test/e2e/helpers.go (outdated, resolved)

CecileRobertMichon reviewed Oct 14, 2020: test/e2e/helpers.go (outdated, resolved)

mboersma force-pushed the healthy-time-sink branch from c24704c to a1276e0 Compare October 14, 2020 15:51

CecileRobertMichon reviewed Oct 14, 2020: test/e2e/helpers.go (outdated, resolved)

CecileRobertMichon reviewed Oct 14, 2020: test/e2e/azure_timesync.go (outdated, resolved)

mboersma force-pushed the healthy-time-sink branch from a1276e0 to 5206aaf Compare October 14, 2020 21:39

CecileRobertMichon reviewed Oct 15, 2020: test/e2e/azure_timesync.go (resolved)

CecileRobertMichon reviewed Oct 15, 2020: test/e2e/azure_timesync.go (resolved)

CecileRobertMichon reviewed Oct 15, 2020

k8s-ci-robot added the needs-rebase label Oct 20, 2020

💚 cluster should have healthy time synchronization

14a716d

mboersma force-pushed the healthy-time-sink branch from 5206aaf to 14a716d Compare October 28, 2020 15:55

k8s-ci-robot removed the needs-rebase label Oct 28, 2020

k8s-ci-robot assigned CecileRobertMichon Oct 28, 2020

k8s-ci-robot added the lgtm label Oct 28, 2020

k8s-ci-robot assigned nader-ziada Oct 28, 2020

k8s-ci-robot added the approved label Oct 28, 2020

k8s-ci-robot merged commit 254746e into kubernetes-sigs:master Oct 28, 2020

k8s-ci-robot added this to the v0.4.10 milestone Oct 28, 2020

mboersma deleted the healthy-time-sink branch October 28, 2020 18:05
