🌱 Add timeout to check if KCP object exists #5889

Merged
merged 1 commit into from
Jan 3, 2022

namnx228
Contributor

In the e2e tests, when deploying a workload cluster, the test fails if the KCP object does not exist yet right after the KCP template is applied. This is a problem when the test runs on a high-latency system, where the KCP object needs more time to be created.
This PR moves the KCP object existence check into an Eventually assertion, so the test retries getting the KCP object if the first attempts fail.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 27, 2021
@k8s-ci-robot
Contributor

Hi @namnx228. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 27, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 27, 2021
@namnx228
Contributor Author

/assign @sbueringer
/cc @fabriziopandini

@namnx228 namnx228 force-pushed the add-kcp-timeout-e2e-nam branch 2 times, most recently from 2877e8f to c911a4f Compare December 27, 2021 15:08
@@ -228,19 +271,8 @@ func DiscoveryAndWaitForControlPlaneInitialized(ctx context.Context, input Disco
Expect(input.Lister).ToNot(BeNil(), "Invalid argument. input.Lister can't be nil when calling DiscoveryAndWaitForControlPlaneInitialized")
Expect(input.Cluster).ToNot(BeNil(), "Invalid argument. input.Cluster can't be nil when calling DiscoveryAndWaitForControlPlaneInitialized")

controlPlane := GetKubeadmControlPlaneByCluster(ctx, GetKubeadmControlPlaneByClusterInput{
Member

after applying the template of KCP, if the KCP object doesn't exist yet, then the test will fail. It can be a bit problematic when the test runs in a system with high latency, and KCP object needs more time to be created.

If I got this right, this is just a problem of a stale cache or a slow API server.
In that case, I would just fix it by adding an Eventually inside GetKubeadmControlPlaneByCluster that waits for 2-3 seconds. Also worth noting: running tests in environments with resource limits that impact API server/etcd performance is insightful on the one hand, but on the other hand it introduces a lot of variability that makes test automation complex.

Member

@sbueringer sbueringer Jan 5, 2022

@fabriziopandini I think it's not necessarily a stale cache in the client used in the e2e test.

AFAIK the ClusterLabel is set asynchronously by the cluster controller, so it could be that the controller hasn't reconciled the control plane yet, and thus GetKubeadmControlPlaneByCluster isn't able to get the KCP based on the cluster label.

(I'm aware that this PR is already merged and it's fine as is for me too, was just looking at it because of the justification for the cherry-pick PRs :))
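
To make the point about the asynchronously set label concrete, here is a small stdlib-only sketch. It is an illustration under stated assumptions, not the framework's code: `object`, `listByCluster`, `reconcile`, and `describe` are hypothetical, and the label key is assumed to be the Cluster API cluster-name label.

```go
package main

import "fmt"

// clusterLabel is assumed to match the Cluster API cluster-name label key.
const clusterLabel = "cluster.x-k8s.io/cluster-name"

// object is a hypothetical, minimal stand-in for a Kubernetes object.
type object struct {
	Name   string
	Labels map[string]string
}

// listByCluster returns the objects whose cluster label equals clusterName.
func listByCluster(objs []*object, clusterName string) []*object {
	var out []*object
	for _, o := range objs {
		// Reading from a nil map is safe in Go and yields the zero value.
		if o.Labels[clusterLabel] == clusterName {
			out = append(out, o)
		}
	}
	return out
}

// reconcile simulates a controller adding the cluster label after creation.
func reconcile(o *object, clusterName string) {
	if o.Labels == nil {
		o.Labels = map[string]string{}
	}
	o.Labels[clusterLabel] = clusterName
}

// describe reports how many objects a label-based lookup currently sees.
func describe(objs []*object, clusterName string) string {
	return fmt.Sprintf("%d match(es) for %s", len(listByCluster(objs, clusterName)), clusterName)
}
```

In this sketch the object exists from the start, yet a label-based lookup finds nothing until `reconcile` has run, which is why a single immediate lookup can fail even without any cache staleness.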

}

// WaitForOneKubeadmControlPlane will wait until the KCP object is initialized and all control plane machines have node refs.
func WaitForOneKubeadmControlPlane(ctx context.Context, input WaitForOneKubeadmControlPlaneInput, intervals ...interface{}) *controlplanev1.KubeadmControlPlane {
Member

WaitForOneKubeadmControlPlane is kind of confusing to me, given that there is always only one KCP per cluster; also, the func description states that we are waiting for all control plane machines to have node refs, but we are not checking the count against KCP.replicas.

Given the problem statement in the PR description, I would suggest a simpler solution described in the previous comment.
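
As a rough illustration of what checking the count against the replica count could look like, here is a stdlib-only sketch. The `machine` type and both helper functions are hypothetical and are not part of the test framework or of this PR.

```go
package main

// machine is a hypothetical, minimal stand-in for a control plane Machine.
// An empty NodeRef means the machine has not been linked to a node yet.
type machine struct {
	Name    string
	NodeRef string
}

// countWithNodeRef returns how many machines already have a node reference.
func countWithNodeRef(machines []machine) int {
	n := 0
	for _, m := range machines {
		if m.NodeRef != "" {
			n++
		}
	}
	return n
}

// allReplicasHaveNodeRefs reports whether the number of machines with a node
// reference matches the desired replica count.
func allReplicasHaveNodeRefs(replicas int, machines []machine) bool {
	return countWithNodeRef(machines) == replicas
}
```

A wait helper whose description promises "all control plane machines have node refs" would need a comparison along these lines inside its polling loop.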

@namnx228 namnx228 force-pushed the add-kcp-timeout-e2e-nam branch from c911a4f to aceb966 Compare January 3, 2022 07:53
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 3, 2022
@namnx228
Contributor Author

namnx228 commented Jan 3, 2022

Hi @fabriziopandini, thanks for your suggestion. Having a simpler fix for this issue is a good idea.

@fabriziopandini
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 3, 2022
@fabriziopandini
Member

/test pull-cluster-api-e2e-full-main

@namnx228 namnx228 force-pushed the add-kcp-timeout-e2e-nam branch from aceb966 to c67b056 Compare January 3, 2022 10:02
@namnx228
Contributor Author

namnx228 commented Jan 3, 2022

/test pull-cluster-api-e2e-full-main

@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 3, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 3, 2022
@k8s-ci-robot k8s-ci-robot merged commit 181d890 into kubernetes-sigs:main Jan 3, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.1 milestone Jan 3, 2022
@namnx228
Contributor Author

namnx228 commented Jan 5, 2022

/cherry-pick release-0.4

@k8s-infra-cherrypick-robot

@namnx228: only kubernetes-sigs org members may request cherry picks. You can still do the cherry-pick manually.

In response to this:

/cherry-pick release-0.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
5 participants