E2E: verify daemonset pods after machines #2950
Conversation
Force-pushed from cc1c839 to 986d711
/assign @CecileRobertMichon @marosset @jsturtevant
Force-pushed from 986d711 to 76e94a6
@CecileRobertMichon do you prefer this updated use of the
@marosset this doesn't really address your valid observation. I'm not entirely certain why we wrap these test cases in a func and then invoke the input vars "just in time", rather than just passing them into the func as values. I assume there is some async value mutation that we need to properly track and that these func closures enable. Maybe @nojnhuh has more context?
yes 100%. Although I'm not sure why we need to pass in a func as a param; commented on that. @marosset has a good point, there is a lot of duplication. Perhaps we could refactor our uses of the clusterctl
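For readers following along, here is a minimal sketch of the closure pattern under discussion; the type and function names are hypothetical, not CAPZ's actual helpers. Passing a func() Input instead of an Input value defers evaluation to call time, which matters when fields are populated asynchronously during earlier setup steps.

```go
package e2e

import "context"

// daemonsetCheckInput is a hypothetical input struct; in the real suite the
// input would carry things like a cluster proxy and a namespace.
type daemonsetCheckInput struct {
	Namespace string
}

// checkDaemonsets takes a getter func rather than a value: the input is
// evaluated "just in time" when the helper runs, so fields mutated by
// earlier async setup steps are read at their current values rather than
// captured when the spec was declared.
func checkDaemonsets(ctx context.Context, inputGetter func() daemonsetCheckInput) {
	input := inputGetter() // deferred evaluation happens here
	_ = input.Namespace
	// ... query the cluster for the expected daemonsets ...
}
```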
Now that we're actually testing for the presence of the calico-node daemonset, it's failing three of the E2E test scenarios 🤯
@jackfrancis looking at the test output, it's not even finding the daemonset, not failing to wait for pods (that part was passing before). I think that's because you are using the bootstrap cluster proxy to look for the daemonsets instead of the workload cluster where calico is actually installed: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2950/files#diff-c95013f3c197cb1edeb15be6787e403afd41244efd8ee819d5cefd24deca32dcR79
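As a rough illustration of the fix direction (a sketch assuming the upstream cluster-api test framework's ClusterProxy interface; the helper name and the calico namespace are assumptions), the lookup should go through the workload cluster's client rather than the bootstrap cluster's:

```go
package e2e

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/cluster-api/test/framework"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listWorkloadDaemonsets is a hypothetical helper: it resolves the workload
// cluster (where calico is actually installed) from the bootstrap cluster
// proxy, then lists daemonsets there instead of on the bootstrap cluster.
func listWorkloadDaemonsets(ctx context.Context, bootstrap framework.ClusterProxy, namespace, clusterName string) (*appsv1.DaemonSetList, error) {
	workload := bootstrap.GetWorkloadCluster(ctx, namespace, clusterName)

	var daemonsets appsv1.DaemonSetList
	// "calico-system" is an assumption about where calico runs in this setup.
	if err := workload.GetClient().List(ctx, &daemonsets, client.InNamespace("calico-system")); err != nil {
		return nil, err
	}
	return &daemonsets, nil
}
```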
Force-pushed from 76e94a6 to 3848b87
@jsturtevant @marosset see @CecileRobertMichon's comment above; I'm attempting to address that. tl;dr: 🤦, not 🤯.
Force-pushed from 64398c7 to 6e1f197
It definitely seems like our VMSS cluster template is failing to launch Windows csi-node-driver pods. See: @marosset @jsturtevant any ideas?
Force-pushed from 6e1f197 to fc39439
That seems like the same error I was trying to fix in #2947
Force-pushed from fc39439 to 474e237
@CecileRobertMichon indeed: #2992
Force-pushed from 474e237 to dd5de35
/retest (cluster delete timeout flake)
@CecileRobertMichon @marosset this is passing tests and is ready for another review round
This lgtm and is equivalent to previous functionality (but better, because we're actually waiting for machines now). One thought: have you considered listing all daemonsets and waiting for all of them to be available, instead of hardcoding a select few we care about?
If we want to make sure the daemonsets for calico exist, we could potentially leave the existing code waiting for them to be available in the install Calico func, and then do a general "wait for all daemonsets" check after node provisioning.
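A minimal sketch of that generalized approach, assuming a controller-runtime client pointed at the workload cluster; the helper name and the poll interval/timeout values are hypothetical:

```go
package e2e

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForAllDaemonsets is a hypothetical helper: instead of hardcoding a
// select few daemonsets, it lists every DaemonSet in the cluster and polls
// until each one reports all of its desired pods available.
func waitForAllDaemonsets(ctx context.Context, c client.Client) error {
	return wait.PollImmediateWithContext(ctx, 10*time.Second, 10*time.Minute, func(ctx context.Context) (bool, error) {
		var daemonsets appsv1.DaemonSetList
		if err := c.List(ctx, &daemonsets); err != nil {
			return false, err
		}
		for _, ds := range daemonsets.Items {
			// Keep waiting while any daemonset has fewer available pods
			// than its desired (scheduled) count.
			if ds.Status.NumberAvailable < ds.Status.DesiredNumberScheduled {
				return false, nil
			}
		}
		return true, nil
	})
}
```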
@CecileRobertMichon this generalized approach appears to work fine; you can see the outcomes if you search for "Waiting for all DaemonSet Pods to be Running" in the E2E output.
/test pull-cluster-api-provider-azure-e2e-optional
This looks good. Squash?
@mboersma yeah, just wanna get one more eyeball on this approach so I can revert if necessary
lgtm
Force-pushed from 1174fe1 to 0dcd63c
/retest
/lgtm
LGTM label has been added. Git tree hash: 98c019c3a8441432643ad24f8f5031f4f3fd7669
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: CecileRobertMichon. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest
/retest
/retest
/retest
what's up with all the flakes 👀 anything related to this change?
I think that's it!
@CecileRobertMichon sorry, not related to this change, I've seen every one of those flakes on other PRs :(
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
This PR moves the "validate expected daemonset pods" check, which currently runs after "verify control plane is ready"; at that point in the cluster creation flow, machines are not actually online, and so daemonset pods are never going to be scheduled.
Instead, we move daemonset pod validation after ApplyClusterTemplateAndWait, which waits for the control plane and worker machines. At this point in the E2E flow we will have all of our expected machines verified as Ready, at which point we can check for the expected daemonset pods.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2933
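A hedged sketch of the reordered flow described above, assuming the upstream clusterctl test helpers; the surrounding function is hypothetical, not the PR's actual code:

```go
package e2e

import (
	"context"

	"sigs.k8s.io/cluster-api/test/framework/clusterctl"
)

// createClusterThenVerifyDaemonsets is a hypothetical outline of the new
// ordering: daemonset validation runs only after ApplyClusterTemplateAndWait
// returns, i.e. after the control plane and all worker machines are Ready
// and daemonset pods can actually be scheduled.
func createClusterThenVerifyDaemonsets(ctx context.Context, input clusterctl.ApplyClusterTemplateAndWaitInput) {
	result := &clusterctl.ApplyClusterTemplateAndWaitResult{}

	// Waits for the control plane and the expected worker machines.
	clusterctl.ApplyClusterTemplateAndWait(ctx, input, result)

	// Daemonset pod validation goes here, now that nodes exist.
}
```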
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: