wait for provisioning state in AKS e2e tests #4255

Merged
merged 1 commit into from
Nov 14, 2023

@nojnhuh nojnhuh commented Nov 11, 2023

What type of PR is this?
/kind flake

What this PR does / why we need it:

Many of the e2e test specs for managed clusters make some update to a CAPZ resource and then wait for that change to be reflected in Azure. Until now, this wait hasn't taken into account the provisioningState of the resource in Azure, which changes from "Updating" to "Succeeded" when the operation is done. Anecdotally, I tend to see updates show up in the Azure API a noticeable amount of time before the operation finishes. My hypothesis is that the current behavior lets the tests progress too quickly, causing issues like #3955 and #4069 (comment).

This PR adds checks that the provisioning state of the AKS resources is Succeeded, alongside the existing checks that the other relevant API fields have been updated. Together these ensure the cluster is in a successful state and not actively being updated before the test moves on.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3955

Special notes for your reviewer:

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@@ -65,6 +66,7 @@ func AKSUpgradeSpec(ctx context.Context, inputGetter func() AKSUpgradeSpecInput)
	Eventually(func(g Gomega) {
		resp, err := managedClustersClient.Get(ctx, infraControlPlane.Spec.ResourceGroupName, infraControlPlane.Name, nil)
		g.Expect(err).NotTo(HaveOccurred())
		g.Expect(resp.Properties.ProvisioningState).To(Equal(ptr.To("Succeeded")))
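
A side note on the `Equal(ptr.To("Succeeded"))` assertion in the hunk above: Gomega documents `Equal` as using `reflect.DeepEqual`, which follows pointers and compares the pointed-to values, so a freshly allocated `*string` can match the one in the response. The stdlib-only sketch below illustrates that behavior under the assumption that `ProvisioningState` is a `*string`; `strPtr` and `stateMatches` are hypothetical helpers for this example, not part of Gomega or `k8s.io/utils/ptr`.

```go
package main

import (
	"fmt"
	"reflect"
)

// strPtr is a hypothetical helper standing in for ptr.To("...").
func strPtr(s string) *string { return &s }

// stateMatches compares a *string against a wanted value the way a
// reflect.DeepEqual-based matcher would: by pointed-to value, not address.
func stateMatches(actual *string, want string) bool {
	return reflect.DeepEqual(actual, strPtr(want))
}

func main() {
	got := strPtr("Succeeded")

	// Plain == on pointers compares addresses, so two separate
	// allocations are never equal.
	fmt.Println("pointer identity:", got == strPtr("Succeeded"))

	// DeepEqual dereferences both sides and compares the strings.
	fmt.Println("deep equal:", stateMatches(got, "Succeeded"))

	// A nil state (field not populated) does not match and does not panic.
	fmt.Println("nil state:", stateMatches(nil, "Succeeded"))
}
```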
Contributor Author

FYI @willie-yao I think this overlaps with a change you added to #4155.

Contributor

Looks like my change is accomplishing the same thing although the provisioning state is coming from the managed cluster itself rather than the response properties. Is the behavior any different?

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 11, 2023

codecov bot commented Nov 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8a09a42) 58.16% compared to head (201101f) 58.16%.
Report is 16 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4255   +/-   ##
=======================================
  Coverage   58.16%   58.16%           
=======================================
  Files         187      187           
  Lines       19351    19351           
=======================================
  Hits        11256    11256           
  Misses       7457     7457           
  Partials      638      638           

nojnhuh commented Nov 12, 2023

/test pull-cluster-api-provider-azure-conformance-custom-builds
/test pull-cluster-api-provider-azure-e2e-aks

nojnhuh commented Nov 12, 2023

/test pull-cluster-api-provider-azure-e2e-aks

Flake hunting 🔎

nojnhuh commented Nov 12, 2023

/test pull-cluster-api-provider-azure-e2e-aks

nojnhuh commented Nov 12, 2023

/test pull-cluster-api-provider-azure-e2e-aks

@CecileRobertMichon CecileRobertMichon left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2023
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 59448553444bc6247ef8041b3ef87fcc40eab376

@willie-yao

/lgtm

@mboersma mboersma left a comment

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mboersma

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2023
@mboersma

/cherry-pick release-1.11
/cherry-pick release-1.10

@k8s-infra-cherrypick-robot

@mboersma: once the present PR merges, I will cherry-pick it on top of release-1.11 in a new PR and assign it to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot merged commit fc261f0 into kubernetes-sigs:main Nov 14, 2023
28 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Nov 14, 2023
@k8s-infra-cherrypick-robot

@mboersma: #4255 failed to apply on top of branch "release-1.11":

Applying: wait for provisioning state in AKS e2e tests
Using index info to reconstruct a base tree...
M	test/e2e/aks_autoscaler.go
M	test/e2e/aks_azure_cluster_autoscaler.go
M	test/e2e/aks_node_labels.go
M	test/e2e/aks_node_taints.go
M	test/e2e/aks_tags.go
M	test/e2e/aks_upgrade.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/aks_upgrade.go
CONFLICT (content): Merge conflict in test/e2e/aks_upgrade.go
Auto-merging test/e2e/aks_tags.go
CONFLICT (content): Merge conflict in test/e2e/aks_tags.go
Auto-merging test/e2e/aks_node_taints.go
CONFLICT (content): Merge conflict in test/e2e/aks_node_taints.go
Auto-merging test/e2e/aks_node_labels.go
Auto-merging test/e2e/aks_azure_cluster_autoscaler.go
Auto-merging test/e2e/aks_autoscaler.go
CONFLICT (content): Merge conflict in test/e2e/aks_autoscaler.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 wait for provisioning state in AKS e2e tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@k8s-infra-cherrypick-robot

@mboersma: #4255 failed to apply on top of branch "release-1.10":

Applying: wait for provisioning state in AKS e2e tests
Using index info to reconstruct a base tree...
M	test/e2e/aks_autoscaler.go
M	test/e2e/aks_azure_cluster_autoscaler.go
M	test/e2e/aks_node_labels.go
M	test/e2e/aks_node_taints.go
M	test/e2e/aks_tags.go
M	test/e2e/aks_upgrade.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/aks_upgrade.go
CONFLICT (content): Merge conflict in test/e2e/aks_upgrade.go
Auto-merging test/e2e/aks_tags.go
CONFLICT (content): Merge conflict in test/e2e/aks_tags.go
Auto-merging test/e2e/aks_node_taints.go
CONFLICT (content): Merge conflict in test/e2e/aks_node_taints.go
Auto-merging test/e2e/aks_node_labels.go
Auto-merging test/e2e/aks_azure_cluster_autoscaler.go
CONFLICT (content): Merge conflict in test/e2e/aks_azure_cluster_autoscaler.go
Auto-merging test/e2e/aks_autoscaler.go
CONFLICT (content): Merge conflict in test/e2e/aks_autoscaler.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 wait for provisioning state in AKS e2e tests
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@nojnhuh nojnhuh deleted the aks-e2e-provstate branch November 22, 2023 02:12
Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/flake (Categorizes issue or PR as related to a flaky test.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)

Successfully merging this pull request may close these issues.

AKS e2e test flakes in the tags tests with a "not found" error
6 participants