🌱 Add timeout to check if KCP object exists #5889
Hi @namnx228. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @sbueringer
Force-pushed: 2877e8f to c911a4f
@@ -228,19 +271,8 @@ func DiscoveryAndWaitForControlPlaneInitialized(ctx context.Context, input Disco
Expect(input.Lister).ToNot(BeNil(), "Invalid argument. input.Lister can't be nil when calling DiscoveryAndWaitForControlPlaneInitialized")
Expect(input.Cluster).ToNot(BeNil(), "Invalid argument. input.Cluster can't be nil when calling DiscoveryAndWaitForControlPlaneInitialized")

controlPlane := GetKubeadmControlPlaneByCluster(ctx, GetKubeadmControlPlaneByClusterInput{
after applying the template of KCP, if the KCP object doesn't exist yet, then the test will fail. It can be a bit problematic when the test runs in a system with high latency, and KCP object needs more time to be created.
If I got this right, this is just a problem of a stale cache or a slow API server.
In that case, I would fix it by adding an Eventually inside GetKubeadmControlPlaneByCluster that waits for 2-3 seconds. Also worth noting: running tests in environments whose resource limits degrade API server/etcd performance is insightful on the one hand, but on the other hand it introduces a lot of variables that make test automation complex.
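To make the suggestion concrete, here is a minimal, standard-library-only sketch of the retry pattern that an Eventually wrapper provides: poll a lookup until it succeeds or a short timeout expires. It is not the framework's code. The `kcp` type, `getter`, `getWithRetry`, `flakyGetter`, and `demoRetry` are hypothetical names used only for illustration, and the real fix would use Gomega's Eventually around the actual client call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound is a stand-in for the "not found" result returned while the
// KCP object is not yet visible on the API server.
var errNotFound = errors.New("KubeadmControlPlane not found")

// kcp is a hypothetical placeholder for controlplanev1.KubeadmControlPlane.
type kcp struct{ Name string }

// getter is a hypothetical lookup; in the real framework this role would be
// played by the client call inside GetKubeadmControlPlaneByCluster.
type getter func(ctx context.Context) (*kcp, error)

// getWithRetry calls get immediately and then once per interval until it
// succeeds or timeout elapses, which is the behavior Eventually gives a test.
func getWithRetry(ctx context.Context, get getter, timeout, interval time.Duration) (*kcp, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		obj, err := get(ctx)
		if err == nil {
			return obj, nil
		}
		select {
		case <-ctx.Done():
			// Report the last lookup error so the failure stays diagnosable.
			return nil, fmt.Errorf("gave up after %s: %w", timeout, err)
		case <-ticker.C:
			// Interval elapsed; try again.
		}
	}
}

// flakyGetter simulates an object that becomes visible on the given attempt.
func flakyGetter(visibleOnAttempt int) getter {
	attempts := 0
	return func(context.Context) (*kcp, error) {
		attempts++
		if attempts < visibleOnAttempt {
			return nil, errNotFound
		}
		return &kcp{Name: "demo-control-plane"}, nil
	}
}

// demoRetry runs one lookup and reports the object name and whether it was found.
func demoRetry(visibleOnAttempt, timeoutMs, intervalMs int) (string, bool) {
	obj, err := getWithRetry(context.Background(), flakyGetter(visibleOnAttempt),
		time.Duration(timeoutMs)*time.Millisecond, time.Duration(intervalMs)*time.Millisecond)
	if err != nil {
		return "", false
	}
	return obj.Name, true
}
```

Calling `demoRetry(3, 3000, 100)` models an object that shows up on the third lookup within a 3-second budget. A lookup that never succeeds returns an error once the timeout expires instead of failing on the first miss.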
@fabriziopandini I think it's not necessarily a stale client-side cache in the Client used by the e2e test.
AFAIK the ClusterLabel is set asynchronously by the cluster controller, so it could be that the controller hasn't reconciled the control plane yet, and thus GetKubeadmControlPlaneByCluster can't find the KCP by its cluster label.
(I'm aware that this PR is already merged, and it's fine as is for me too. I was just looking at it because of the justification for the cherry-pick PRs :))
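The race described above can be illustrated with a small in-memory sketch, assuming (as the comment states) that the lookup filters by the cluster label. The `object` type and the helper functions are hypothetical. The label key is the Cluster API cluster-name label.

```go
package main

// clusterLabel is the Cluster API label key that ties an object to its Cluster.
const clusterLabel = "cluster.x-k8s.io/cluster-name"

// object is a hypothetical, minimal stand-in for a KubeadmControlPlane.
type object struct {
	Name   string
	Labels map[string]string
}

// listByClusterLabel returns only the objects whose cluster label matches
// clusterName, the way a label-selector list would.
func listByClusterLabel(objs []*object, clusterName string) []*object {
	var out []*object
	for _, o := range objs {
		if o.Labels[clusterLabel] == clusterName {
			out = append(out, o)
		}
	}
	return out
}

// reconcileLabel simulates the controller adding the label some time after
// the object has been created.
func reconcileLabel(o *object, clusterName string) {
	if o.Labels == nil {
		o.Labels = map[string]string{}
	}
	o.Labels[clusterLabel] = clusterName
}

// countBeforeAndAfter lists the same store before and after reconciliation.
func countBeforeAndAfter() (before, after int) {
	kcp := &object{Name: "demo-control-plane"} // created, but not labeled yet
	store := []*object{kcp}

	before = len(listByClusterLabel(store, "demo"))
	reconcileLabel(kcp, "demo")
	after = len(listByClusterLabel(store, "demo"))
	return before, after
}
```

In this sketch the object exists the whole time, yet the label-based list finds nothing until the simulated reconcile runs. That is why a retry helps even when the client cache is fresh.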
}

// WaitForOneKubeadmControlPlane will wait until the KCP object is initialized and all control plane machines have node refs.
func WaitForOneKubeadmControlPlane(ctx context.Context, input WaitForOneKubeadmControlPlaneInput, intervals ...interface{}) *controlplanev1.KubeadmControlPlane {
The name WaitForOneKubeadmControlPlane is somewhat confusing to me, given that there is always exactly one KCP per cluster. Also, the func description states that we are waiting for all control plane machines to have node refs, but we are not checking the machine count against KCP.replicas.
Given the problem statement in the PR description, I would suggest the simpler solution described in the previous comment.
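To show what checking the count against KCP.replicas would add, here is a hedged sketch with hypothetical types. `machine.HasNodeRef` stands in for a Machine whose status carries a node ref, and `replicas` stands in for the desired KCP replica count. It is not code from this PR.

```go
package main

// machine is a hypothetical stand-in for a control plane Machine.
type machine struct {
	Name       string
	HasNodeRef bool // stand-in for "Status.NodeRef is set"
}

// everyExistingMachineHasNodeRef is the weaker check: it passes as soon as
// the machines that exist so far all have node refs.
func everyExistingMachineHasNodeRef(machines []machine) bool {
	for _, m := range machines {
		if !m.HasNodeRef {
			return false
		}
	}
	return true
}

// allReplicasHaveNodeRefs is the stricter check: the number of machines
// with a node ref must equal the desired replica count.
func allReplicasHaveNodeRefs(machines []machine, replicas int) bool {
	ready := 0
	for _, m := range machines {
		if m.HasNodeRef {
			ready++
		}
	}
	return ready == replicas
}
```

With three desired replicas and only one machine created so far, the weaker check already passes while the stricter one does not. That is the gap between the func description and its behavior that the comment points out.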
Force-pushed: c911a4f to aceb966
Hi @fabriziopandini, thanks for your suggestion. A simpler fix for this issue is a good idea.
/ok-to-test
/test pull-cluster-api-e2e-full-main
Force-pushed: aceb966 to c67b056
/test pull-cluster-api-e2e-full-main
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: fabriziopandini. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/cherry-pick release-0.4
@namnx228: only kubernetes-sigs org members may request cherry picks. You can still do the cherry-pick manually.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
In the e2e tests, when deploying a workload cluster, the test fails if the KCP object does not exist yet right after the KCP template is applied. This is problematic when the test runs on a high-latency system, where the KCP object needs more time to be created.
This PR moves the KCP object existence check into an Eventually assertion, so the test retries fetching the KCP object if the first attempts fail.