Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

💚 cluster should have healthy time synchronization #988

Merged
merged 1 commit into from
Oct 28, 2020

Conversation

mboersma
Copy link
Contributor

@mboersma mboersma commented Oct 9, 2020

What this PR does / why we need it:

Tests each workload cluster node created in e2e tests to see if it has healthy time synchronization, as defined by the output of these commands on the host:

$ systemctl is-active chronyd
$ chronyc tracking

Which issue(s) this PR fixes:

Fixes #705

Special notes for your reviewer:

This reuses the SSH proxy code from #976. I'll refactor that one to put those funcs in the same place in helpers.go, but the two PRs will probably conflict such that one will need to be rebased, just FYI.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

💚 cluster should have healthy time synchronization

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 9, 2020
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 9, 2020
@mboersma
Copy link
Contributor Author

mboersma commented Oct 9, 2020

Looks good:

...
STEP: verifying EnableAcceleratedNetworking for the primary NIC of each VM
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-82jxd
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-9qnq9
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-control-plane-dr2dm
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-md-0-68c44c47fc-kdcsk
STEP: checking that time synchronization is healthy on capz-e2e-xcr8ld-md-0-68c44c47fc-rp97d
STEP: Dumping all the Cluster API resources in the "create-workload-cluster-k4sytl" namespace
...

go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
test/e2e/helpers.go Outdated Show resolved Hide resolved
test/e2e/azure_test.go Show resolved Hide resolved
@CecileRobertMichon
Copy link
Contributor

/kind other

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: The label(s) kind/other cannot be applied, because the repository doesn't have them

In response to this:

/kind other

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mboersma
Copy link
Contributor Author

I'll rebase this after #976 merges. I also have a change to run all the SSH sessions for the whole cluster concurrently, rather than by node, which should make this marginally faster.

@mboersma mboersma force-pushed the healthy-time-sink branch 2 times, most recently from 80e9791 to d6ece1d Compare October 13, 2020 23:12
test/e2e/helpers.go Outdated Show resolved Hide resolved
@nader-ziada
Copy link
Contributor

/test pull-cluster-api-provider-azure-e2e

@@ -284,15 +285,31 @@ func logCheckpoint(specTimes map[string]time.Time) {
}
}

// getMachinesInCluster returns a list of all machines in the given cluster.
// This is copied from CAPI's test/framework/cluster_proxy.go.
func getMachinesInCluster(ctx context.Context, c framework.Lister, namespace, name string) (*clusterv1.MachineList, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about machine pools? Do we want to add that here too @devigned ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. There should now be similar funcs for MachinePools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added support here for MachinePools and created a helper function to collect all the SSH info for the nodes in a cluster, whether they are control plane or agent, VM or VMSS.

One thing I'm curious about is that we anticipated needing port 50001 for VMSS instances, but in practice it seems to be good old port 22 everywhere...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe port 50001 was an AKS Engine thing? Although I would expect the VMSS node pools to go through port 22 for the first control plane ssh but only control plane VMSS nodes (not supported currently) to require a different port.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2020
@mboersma
Copy link
Contributor Author

This test passed on the VMSS cluster as well as the VM-based clusters:

Workload cluster creation Creating a VMSS cluster 
  with a single control plane node and an AzureMachinePool with 2 nodes
  /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:234
INFO: "with a single control plane node and an AzureMachinePool with 2 nodes" started at Wed, 28 Oct 2020 
...
INFO: Waiting for the machine deployments to be provisioned
INFO: Waiting for the machine pools to be provisioned
STEP: waiting for the machine pool workload nodes to exist
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-control-plane-5fbtc
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-mp-0000002
STEP: checking that time synchronization is healthy on capz-e2e-va72nj-mp-0000003
...

@CecileRobertMichon
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2020
@nader-ziada
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nader-ziada

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 28, 2020
@k8s-ci-robot k8s-ci-robot merged commit 254746e into kubernetes-sigs:master Oct 28, 2020
@k8s-ci-robot k8s-ci-robot added this to the v0.4.10 milestone Oct 28, 2020
@mboersma mboersma deleted the healthy-time-sink branch October 28, 2020 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[e2e] cluster should have healthy time synchronization
5 participants