✨ Add UseExperimentalRetryJoin to KubeadmConfig #2763
I'll be adding tests to the test framework and CAPD to simulate connection failures (à la Jepsen) in another PR. @vincepri, you wanted to put this behind a feature flag, right?
Force-pushed from 351ae4b to cc51348.
@randomvariable thanks for this PR!
Hope this change/all the logging added will help to surface the underlying problem and to implement a proper fix in kubeadm.
Force-pushed from 74b16cc to b07c80b.
Force-pushed from 7acb417 to 2bb11da.
bootstrap/kubeadm/internal/cloudinit/kubeadm-bootstrap-script.sh
Tested with several failure scenarios, including etcd failures on join, and it's WAY more robust than before. 100% success rate vs. 20% success rate with the previous implementation when etcd is overloaded. 🎉 /lgtm
Force-pushed from 2bb11da to a6f865d.
Force-pushed from a6f865d to 419a9e2.
/lgtm
/assign @vincepri
Reviewing now
/milestone v0.3.3
BTW, we had a meeting about the problem and I proposed:
ideally...
/retitle ✨ Add UseExperimentalRetryJoin to KubeadmConfig
Signed-off-by: Naadir Jeewa <[email protected]>
Force-pushed from b443225 to dcd3d51.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: randomvariable, vincepri
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: Naadir Jeewa [email protected]
What this PR does / why we need it:
Resolve flaky control plane joins by adding a bash script that retries the kubeadm control plane join phases. This matters particularly for CAPV: HAProxy always starts new backends as ready, pending health checks (if anyone knows how to change that, I'm all ears), whereas AWS ELB does the exact opposite and hasn't demonstrated these issues before.
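For readers unfamiliar with the approach, here is a minimal sketch of the retry idea in bash. It is not the script from this PR: the function name `retry_command`, its arguments, and the attempt and delay values are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; retry_command and its parameters are
# hypothetical names, not taken from kubeadm-bootstrap-script.sh.

# Run a command until it succeeds or the attempt budget is exhausted.
#   $1    maximum number of attempts
#   $2    delay in seconds between attempts
#   $3... the command to run, with its arguments
retry_command() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay_seconds}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "${delay_seconds}"
  done
}

# Example (assumed invocation; the config path is a placeholder):
#   retry_command 5 15 kubeadm join phase control-plane-prepare all \
#     --config /path/to/kubeadm-join-config.yaml
```

Retrying each join phase separately, rather than the whole `kubeadm join`, would mean a transient failure in a late phase does not force the earlier phases to be redone. Whether the PR's script is structured exactly this way is not shown in this conversation.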
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #