
generalize helm install during E2E testing #2264

Merged: 1 commit merged into kubernetes-sigs:main on May 24, 2022

Conversation

@jackfrancis (Contributor) commented Apr 25, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR follows up from #2209

Here we generalize some of the "install helm chart" foo so that it can be composed in different ways to accommodate different test scenarios.
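As a purely illustrative sketch of the idea (the names below, e.g. HelmInstallInput and InstallHelmChart, are hypothetical and not necessarily what this PR introduces), the chart-specific bits become input so each test scenario can compose its own install:

package e2e

import (
	"context"
	"fmt"
	"os/exec"
)

// HelmInstallInput is a hypothetical, generalized "install a helm chart" input
// that individual E2E scenarios can fill in with their own chart details.
type HelmInstallInput struct {
	KubeconfigPath string            // kubeconfig of the workload cluster
	RepoURL        string            // chart repository URL
	ChartName      string            // chart to install
	ReleaseName    string            // helm release name
	Namespace      string            // target namespace
	Values         map[string]string // --set style overrides
}

// InstallHelmChart shells out to the helm CLI with the supplied parameters,
// so each test only declares the chart it needs rather than duplicating the
// install plumbing.
func InstallHelmChart(ctx context.Context, input HelmInstallInput) error {
	args := []string{
		"install", input.ReleaseName, input.ChartName,
		"--repo", input.RepoURL,
		"--namespace", input.Namespace, "--create-namespace",
		"--kubeconfig", input.KubeconfigPath,
	}
	for k, v := range input.Values {
		args = append(args, "--set", fmt.Sprintf("%s=%s", k, v))
	}
	out, err := exec.CommandContext(ctx, "helm", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("helm install failed: %s: %w", string(out), err)
	}
	return nil
}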

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 25, 2022
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 25, 2022
@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional
/test pull-cluster-api-provider-azure-e2e-exp

@CecileRobertMichon (Contributor) left a comment

@jsturtevant @marosset I wonder if we could use this for metrics-server and other addons we're currently installing via ClusterResourceSet for windows e2e tests?

@jsturtevant (Contributor)

Looks like metrics-server has a chart. Some of the Windows-specific things like kube-proxy/calico don't have charts.

@jackfrancis (Contributor Author)

How much customization are we doing for metrics-server, kube-proxy, and calico? Depending upon the amount of "capz optimizations" we're performing in those yaml specs, it may be appropriate to host our own "capz-tested" charts for such components in this repo. And there's little work involved in maintaining a canonical chart repo the way we do here:

https://github.com/kubernetes-sigs/cloud-provider-azure/tree/master/helm

@jsturtevant (Contributor)

We can likely use the metrics-server chart. For Windows kube-proxy/calico, the yaml is from https://github.com/kubernetes-sigs/sig-windows-tools/tree/master/hostprocess. That would be the place where we could add those charts. The long-term goal is for sig-windows to not host the calico and kube-proxy images, so I am hesitant to put much effort into creating charts. If capz is moving away from CRS completely in the shorter term, then I guess we might need to, as it is still a ways out before we can get those images to proper owner repositories due to changes needed in the HostProcess containers implementation.

}, input.WaitForControlPlaneIntervals...).Should(Succeed())
}

func WaitForRunningNodes(ctx context.Context, input clusterctl.ApplyClusterTemplateAndWaitInput) {
Contributor

same here and for running pods below. As much as possible we should share e2e logic with the other providers in the CAPI framework instead of implementing our own logic.

Contributor Author

I was able to rely upon pre-existing cluster-api conveniences for part of this, but the "ensure n pods are Running" check is something I had to implement here.
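For context, a rough sketch of what that kind of check can look like (illustrative only; the helper in this PR may be shaped differently):

package e2e

import corev1 "k8s.io/api/core/v1"

// podListHasNumPods (sketch) returns a condition that is satisfied once a pod
// list contains exactly n pods and every one of them is in the Running phase.
func podListHasNumPods(n int) func(pods *corev1.PodList) bool {
	return func(pods *corev1.PodList) bool {
		if len(pods.Items) != n {
			return false
		}
		for _, pod := range pods.Items {
			if pod.Status.Phase != corev1.PodRunning {
				return false
			}
		}
		return true
	}
}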

@jackfrancis force-pushed the helm-general branch 2 times, most recently from 97224aa to b5b5528 on April 27, 2022 at 22:52
@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional
/test pull-cluster-api-provider-azure-e2e-exp

@jackfrancis (Contributor Author)

@CecileRobertMichon this should be ready for a review round

LabelSelector: cloudNodeManagerPodLabel,
Namespace: "kube-system",
},
Condition: podListHasNumPods(int(to.Int64(input.ConfigCluster.ControlPlaneMachineCount) + to.Int64(input.ConfigCluster.WorkerMachineCount))),
Contributor

does this count also need to include Windows worker nodes?

Contributor Author

I think what we're saying is that we need to multiply the input.ConfigCluster.WorkerMachineCount by the number of pools in the cluster.

The current OOT template only has a single node pool, which is why this is working.

I don't think we have a way to determine the number of pools from the input to this func, but because we enter into this after all the control plane nodes and machines are online with node references, we should be able to count the number of nodes in the cluster, right?

Contributor

Windows is its own variable, so we should be able to at least add that to the count (even if it's zero for now, that will make it more future-proof)

LabelSelector: cloudNodeManagerPodLabel,
Namespace: "kube-system",
},
Condition: podListHasNumPods(len(machineList.Items)),
Contributor Author

@CecileRobertMichon I think counting the machines in the cluster is probably the best approach for this
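For reference, counting the cluster's Machines with the controller-runtime client might look roughly like this (illustrative; not necessarily the exact code in this PR):

package e2e

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// countClusterMachines (sketch) lists the CAPI Machines that belong to the
// named cluster and returns how many there are, which can then be used as the
// expected number of cloud-node-manager pods.
func countClusterMachines(ctx context.Context, c client.Client, namespace, clusterName string) (int, error) {
	machineList := &clusterv1.MachineList{}
	if err := c.List(ctx, machineList,
		client.InNamespace(namespace),
		client.MatchingLabels{"cluster.x-k8s.io/cluster-name": clusterName},
	); err != nil {
		return 0, err
	}
	return len(machineList.Items), nil
}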

Contributor

what about machine pools? Should we count nodes instead?

Contributor Author

We should probably count both Machines and MachinePoolMachines, right? (a cluster can exist with both types, I assume)

Contributor

IMO doing it by number of nodes is more future-proof. There's no guarantee CAPI won't add a new type of Machine in the future.

Contributor Author

The problem with using nodes is that they are not the authoritative unit for calculating "expected number" outcomes. For example, a node may be offline, not yet joined, or in some other state that makes it less definitive for gathering a cluster count than the number of Machines (plus the number of MachinePool machines).

So, if we use nodes then we'd probably want to change the pod list criteria to >= to account for the fact that a new node may have joined after we calculated num nodes, but before we gathered a list of cloud-node-manager pods.

It would be nice to establish a resilient (hopefully simple to understand and maintain) pattern here, as I do think being able to both validate the helm install and validate expected outcomes of that helm install is valuable.

Contributor

I see. In that case, what are your thoughts on adding helper functions to the CAPI test framework that return the number of expected control plane nodes and the number of expected worker nodes? Control plane nodes can be the total number of replicas of the control plane object (or 0 if the control plane does not implement replicas, see https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html?highlight=spec#required-spec-fields-for-implementations-using-replicas). Worker nodes can be the total of Machine replicas + MachinePool replicas.

It's a bit more resilient to abstract away what a worker machine can be and make that assumption in one central place than to have to update every place in our CAPZ tests that makes that assumption if a new type of Machine is added later.

Also worth checking whether something like this already exists in CAPI; it's very possible it does.
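A sketch of the kind of helper being suggested (names and placement here are hypothetical):

package e2e

// expectedControlPlaneNodes (sketch) returns the control plane's replica
// count, or 0 if the control plane object does not implement .spec.replicas.
func expectedControlPlaneNodes(controlPlaneReplicas *int32) int {
	if controlPlaneReplicas == nil {
		return 0
	}
	return int(*controlPlaneReplicas)
}

// expectedWorkerNodes (sketch) sums MachineDeployment and MachinePool
// replicas so callers don't have to know which kinds of worker machine exist.
func expectedWorkerNodes(machineDeploymentReplicas, machinePoolReplicas []int32) int {
	total := 0
	for _, r := range machineDeploymentReplicas {
		total += int(r)
	}
	for _, r := range machinePoolReplicas {
		total += int(r)
	}
	return total
}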

Contributor Author

I checked, and CAPI E2E doesn't have a convenience for this (there are somewhat similar things, with lots of buried Expect() funcs that are in the process of being cleaned up, but nothing for this purpose explicitly). How about this as a way of getting this merged ASAP:

  1. We verify the existence of the cloud-node-manager daemonset
  2. We wait until all of its replicas are Ready

@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional

@netlify bot commented May 20, 2022

👷 Deploy request for kubernetes-sigs-cluster-api-provider-azure pending review.

Visit the deploys page to approve it

🔨 Latest commit 23666ad

}

// WaitForDaemonset retries during E2E until a daemonset's pods are all Running
func WaitForDaemonset(ctx context.Context, input clusterctl.ApplyClusterTemplateAndWaitInput, cl client.Client, name, namespace string) {
Contributor Author

@CecileRobertMichon this is the alternative solution to validate the cloud-node-manager pods. Rather than comparing against number of machines, let's just make sure the daemonset pods are all running. There are several reasons out of our control why the number of nodes running these pods may not be easily predictable. Rather than try to deal with all of those edge cases, we can simply rely upon the daemonset API to pick the appropriate schedulable nodes, and ensure that the pods reach a Running state.
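For readers following along, the core of that approach can be sketched roughly like this (illustrative; the actual WaitForDaemonset in this PR may differ in details such as inputs and intervals):

package e2e

import (
	"context"
	"time"

	. "github.com/onsi/gomega"
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForDaemonSetReady (sketch) polls the named DaemonSet until every desired
// pod is scheduled and ready, letting the DaemonSet controller decide which
// nodes are schedulable instead of counting machines ourselves.
func waitForDaemonSetReady(ctx context.Context, cl client.Client, name, namespace string) {
	Eventually(func(g Gomega) {
		ds := &appsv1.DaemonSet{}
		g.Expect(cl.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, ds)).To(Succeed())
		g.Expect(ds.Status.DesiredNumberScheduled).To(BeNumerically(">", 0))
		g.Expect(ds.Status.NumberReady).To(Equal(ds.Status.DesiredNumberScheduled))
	}, 20*time.Minute, 30*time.Second).Should(Succeed())
}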

@jackfrancis (Contributor Author)

/test pull-cluster-api-provider-azure-e2e-optional

@CecileRobertMichon (Contributor) left a comment

Thanks @jackfrancis, looks great

/lgtm
/assign @Jont828

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 20, 2022
@CecileRobertMichon (Contributor)

@jackfrancis we should now be able to switch over Calico as well. I believe we just need a bit of kustomization in the values.yaml to make it work for Azure (mainly VXLAN encapsulation)

https://kubernetes.slack.com/archives/CEX9HENG7/p1652473459598119

@jackfrancis (Contributor Author)

/retest

1 similar comment
@jackfrancis (Contributor Author)

/retest

@Jont828 (Contributor) commented May 24, 2022

This looks great! Was useful for getting us off the ground with HelmChartProxy as well!

/lgtm

@CecileRobertMichon (Contributor)

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2022
@jackfrancis (Contributor Author)

/retest

@k8s-ci-robot k8s-ci-robot merged commit 7d5e541 into kubernetes-sigs:main May 24, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone May 24, 2022
@jackfrancis jackfrancis deleted the helm-general branch December 9, 2022 22:47