🌱 ClusterClass & test/framework: consider replicas for control plane readiness #7914


sbueringer
Member

@sbueringer sbueringer commented Jan 12, 2023

What this PR does / why we need it:
This is a follow-up to #7833. While triaging further, we found that:

  • the topology controller was not waiting until KCP was entirely stable before triggering a rollout of the control plane
  • the test framework wasn't waiting until KCP was entirely stable
    • in the upgrade test, this meant the upgrade was triggered while KCP had not yet stabilized after the initial create

Thx @fabriziopandini for collaborating on debugging this stuff! :)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Follow-up to: #7833

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 12, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 12, 2023
@sbueringer sbueringer changed the title ClusterClass & test/framework: consider replicas for control plane readiness 🌱 ClusterClass & test/framework: consider replicas for control plane readiness Jan 12, 2023
@sbueringer sbueringer force-pushed the pr-cc-improve-kcp-readiness-checks branch from 4a4dfc9 to 1d1f58a Compare January 12, 2023 18:32
@sbueringer
Member Author

/test pull-cluster-api-e2e-full-main

	updatedReplicas != *desiredReplicas ||
	readyReplicas != *desiredReplicas ||
	unavailableReplicas > 0 {
	return false, nil
}

We do this because:

  • KCP is only stable if all of these conditions are fulfilled
  • By waiting for the same conditions in the test code and in the topology controller, we can ensure that setting etcdImageTag and the version in Cluster.spec.topology (in the cluster upgrade test) doesn't trigger a double rollout.
    This is because, when both the version and the variable are set, the topology controller rolls them out at the same time instead of holding back the version because KCP is not yet stable.
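
The conditions in the excerpt above can be read as a single predicate. The following is a minimal, self-contained sketch of that predicate, using only the three comparisons visible in the diff. The `replicaCounts` type and `isStable` function are hypothetical names for illustration, and treating a nil desired count as "not stable" is an assumption of this sketch rather than something the PR states.

```go
package main

import "fmt"

// replicaCounts mirrors the replica counters referenced in the check above.
// Hypothetical type for illustration; it is not the KCP API type.
type replicaCounts struct {
	Updated     int32
	Ready       int32
	Unavailable int32
}

// isStable reports whether every desired replica is updated and ready
// and no replica is unavailable.
func isStable(desired *int32, c replicaCounts) bool {
	if desired == nil {
		// Assumption of this sketch: an unset desired count is never stable.
		return false
	}
	if c.Updated != *desired ||
		c.Ready != *desired ||
		c.Unavailable > 0 {
		return false
	}
	return true
}

func main() {
	desired := int32(3)
	// Mid-rollout: one replica is not yet updated or ready.
	fmt.Println(isStable(&desired, replicaCounts{Updated: 2, Ready: 2, Unavailable: 1})) // false
	// Fully rolled out.
	fmt.Println(isStable(&desired, replicaCounts{Updated: 3, Ready: 3, Unavailable: 0})) // true
}
```

Sharing one predicate like this between the test code and the controller is what keeps the two in agreement about when a new rollout may start.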

@sbueringer sbueringer force-pushed the pr-cc-improve-kcp-readiness-checks branch from 1d1f58a to 213ae0c Compare January 12, 2023 19:15
@sbueringer
Member Author

cc @ykakarap

@sbueringer sbueringer force-pushed the pr-cc-improve-kcp-readiness-checks branch from 213ae0c to b8729c3 Compare January 12, 2023 19:19
@sbueringer
Member Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Member Author

/test ?

@k8s-ci-robot
Contributor

@sbueringer: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main
  • /test pull-cluster-api-e2e-full-main
  • /test pull-cluster-api-e2e-informing-ipv6-main
  • /test pull-cluster-api-e2e-informing-main
  • /test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-informing-ipv6-main
  • pull-cluster-api-e2e-informing-main
  • pull-cluster-api-e2e-main
  • pull-cluster-api-test-main
  • pull-cluster-api-test-mink8s-main
  • pull-cluster-api-verify-main

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main

@sbueringer
Member Author

Unrelated flake
/retest

@sbueringer
Member Author

/retest

@sbueringer
Member Author

One more run to get more data, but I'll take a closer look at what's going on.

/retest

@sbueringer sbueringer force-pushed the pr-cc-improve-kcp-readiness-checks branch 2 times, most recently from c34901a to fbe6513 Compare January 13, 2023 13:18
@sbueringer
Member Author

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main
/test pull-cluster-api-e2e-full-main

@sbueringer
Member Author

/retest

@sbueringer
Member Author

/retest
Need more data

@sbueringer
Member Author

/retest

unrelated flake

@sbueringer sbueringer force-pushed the pr-cc-improve-kcp-readiness-checks branch from fbe6513 to 6c5eb9c Compare January 13, 2023 16:09
@sbueringer
Member Author

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main
/test pull-cluster-api-e2e-full-main

(Pushed a few minor improvements to the e2e tests, but they should be unrelated to the failures.)

@sbueringer
Member Author

The test is green. Let's see how stable it is.

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main
/test pull-cluster-api-e2e-full-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main
/test pull-cluster-api-e2e-full-main

@sbueringer
Member Author

/test pull-cluster-api-e2e-workload-upgrade-1-26-latest-main
/test pull-cluster-api-e2e-full-main

@sbueringer
Member Author

/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.3


@sbueringer
Member Author

/cherry-pick release-1.2

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.2


@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 16, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b9f3ea1d18534ba3432a363294f68780fa34de0f

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2023
@k8s-ci-robot k8s-ci-robot merged commit 6a82514 into kubernetes-sigs:main Jan 16, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jan 16, 2023
@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #7923

In response to this:

/cherry-pick release-1.3


@k8s-infra-cherrypick-robot

@sbueringer: #7914 failed to apply on top of branch "release-1.2":

Applying: ClusterClass: IsScaling now considers unavailableReplicas
Applying: test/framework: WaitForControlPlaneToBeReady now considers replicas
Using index info to reconstruct a base tree...
M	internal/contract/controlplane.go
M	internal/controllers/topology/cluster/desired_state_test.go
M	internal/controllers/topology/cluster/reconcile_state_test.go
M	test/e2e/clusterctl_upgrade.go
M	test/e2e/self_hosted.go
M	test/framework/controlplane_helpers.go
Falling back to patching base and 3-way merge...
Auto-merging test/framework/controlplane_helpers.go
Auto-merging test/e2e/self_hosted.go
CONFLICT (content): Merge conflict in test/e2e/self_hosted.go
Auto-merging test/e2e/clusterctl_upgrade.go
Auto-merging internal/controllers/topology/cluster/reconcile_state_test.go
Auto-merging internal/controllers/topology/cluster/desired_state_test.go
Auto-merging internal/contract/controlplane.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 test/framework: WaitForControlPlaneToBeReady now considers replicas
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.2

