attempt to work around kube-vip rbac issues #299
Conversation
Note: As this is a draft PR, no triggers from the PR body will be handled. If you'd like to trigger them while the PR is a draft, please add them as a PR comment.
Force-pushed from 0f92732 to 13f6a40 (Compare)
/run cluster-test-suites
cluster-test-suites
📋 View full results in Tekton Dashboard. Rerun trigger: … Tip: To only re-run the failed test suites, you can provide a …
LGTM!
I don't think there's something like `yq` installed in our images, so `sed` is probably the best we can do. Could we then at least make sure to only replace this path in the `hostPath` volume section? Maybe just including `path:` in the `sed` would already make it a bit safer. 🙂
Yep, I can't use `yq` (I checked). This `sed` is safe because there is only one occurrence of …
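For illustration, a minimal sketch of the kind of `sed` being discussed here, written as a `postKubeadmCommands` entry; the manifest location, kubeconfig paths and exact expression are assumptions for illustration, not the literal change in this PR:

```yaml
# Hypothetical sketch: rewrite only the hostPath "path:" line of the kube-vip
# static pod manifest, so nothing else in the file is touched by accident.
postKubeadmCommands:
  - "sed -i 's|path: /etc/kubernetes/super-admin.conf|path: /etc/kubernetes/admin.conf|g' /etc/kubernetes/manifests/kube-vip.yaml"
```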
/run cluster-test-suites
Force-pushed from 5167b26 to 065cb71 (Compare)
There were differences in the rendered Helm template, please check! Output: …
cluster-test-suites
📋 View full results in Tekton Dashboard. Rerun trigger: … Tip: To only re-run the failed test suites, you can provide a …
Alright! Shoot if I can help you with debugging the tests.
LGTMIIW
/run cluster-test-suites TARGET_SUITES=./providers/capv/standard
I'm curious if tests fail because of the recently released CAPV v28.0.1...
cluster-test-suites
📋 View full results in Tekton Dashboard. Rerun trigger: … Tip: To only re-run the failed test suites, you can provide a …
/run cluster-test-suites TARGET_SUITES=./providers/capv/standard E2E_WC_KEEP=true
That doesn't work from CI
/run cluster-test-suites TARGET_SUITES=./providers/capv/standard
Right, this PR isn't testable using CI because it's trying to deploy a cluster using … but this is misleading, because the file is actually being created by the kube-vip pod manifest configuration. Because of this, the file is actually empty, which causes kube-vip to crash: …
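For context, the "file gets created but is empty" behaviour described above matches how a hostPath volume of type FileOrCreate works; a rough sketch of the kind of volume declaration involved (the actual kube-vip manifest in this chart may differ):

```yaml
# Illustrative excerpt: if nothing exists at the given host path when the
# kubelet starts the static pod, a FileOrCreate hostPath volume creates an
# empty file there, which kube-vip then fails to use as a kubeconfig.
volumes:
  - name: kubeconfig
    hostPath:
      path: /etc/kubernetes/admin.conf
      type: FileOrCreate
```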
cluster-test-suites
📋 View full results in Tekton Dashboard. Rerun trigger: … Tip: To only re-run the failed test suites, you can provide a …
But how did the upgrade test work then? Did it even replace the nodes, or did it just go on without? 😬
You can easily test this PR from the open CAPV v29.0.0 PR. 🙂
I would hazard a guess that it's because the new node is joined to an existing node, and so the chicken-and-egg problem doesn't exist. The problem this PR addresses is cluster bootstrapping when the first node in a new cluster comes up: kube-vip needs to talk to the API server in order to coordinate leader election to bring up the VIP, but because the admin.conf kubeconfig it uses has no privileges until the API server is up, it can't, and so the VIP never gets advertised.

In an upgrade situation the cluster is already bootstrapped, so the behaviour introduced by this PR will be more like the second situation, because the new node won't even have the super-admin.conf kubeconfig.
Tests passed after I abused the v29 release PR: giantswarm/releases#1459 (comment)
Nice! So we can theoretically merge this, get another patch release and continue releasing CAPV v29.0.0 with it, right?
Yep, I'm going to merge this in and release it so as to unblock the release.
Towards https://github.com/giantswarm/giantswarm/issues/31986
Currently, kube-vip uses the admin.conf kubeconfig in order to interact with the API. As of Kubernetes 1.29, the behaviour of this kubeconfig has changed for security reasons.

What happens now is that the admin.conf kubeconfig has no ClusterRoleBinding (and hence no privileges) until the API server is started. This, however, leaves us with a chicken-and-egg issue: kube-vip uses this kubeconfig to talk to the API, but because the admin kubeconfig has the kube-vip floating IP as the API address (and kube-vip needs to start in order to advertise it), it is unable to start, so the API IP never actually gets advertised and cluster bootstrapping fails.
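For reference, the missing privileges come from the fact that, since kubeadm v1.29, the admin.conf identity is only bound to cluster-admin through a ClusterRoleBinding that kubeadm creates once the API server is reachable. A rough sketch of that binding, with names as used by upstream kubeadm rather than anything defined in this repo:

```yaml
# Approximation of the binding kubeadm creates after the API server is up;
# until this object exists, the admin.conf credential can do nothing.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeadm:cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: kubeadm:cluster-admins
```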
The new behaviour does place a super-admin.conf kubeconfig on the first CP node to be bootstrapped, and this kubeconfig has cluster-admin permissions (system:masters); however, it is not advised to use this kubeconfig with any workloads (and it does not exist on the 2nd and 3rd CP nodes anyway).

The (somewhat hacky) solution in this PR: we initially create the kube-vip static pod manifest with the super-admin kubeconfig, which allows bootstrapping of the first CP node to complete, and then we use the post-kubeadm commands to revert the pod manifest to use the admin.conf kubeconfig (a rough sketch follows the list below). This method has different outcomes depending on which CP node it runs on:

- first CP node: …
- other CP nodes: …
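To make the two-phase approach concrete, here is a minimal sketch in KubeadmControlPlane terms. The file contents, paths and commands are illustrative assumptions, not the literal diff in this PR:

```yaml
# Illustrative sketch: bootstrap kube-vip against super-admin.conf, then flip
# the static pod manifest back to admin.conf after kubeadm has finished.
kubeadmConfigSpec:
  files:
    - path: /etc/kubernetes/manifests/kube-vip.yaml
      owner: root:root
      permissions: "0644"
      content: |
        # kube-vip static pod, initially mounting /etc/kubernetes/super-admin.conf
        # as its kubeconfig (full manifest elided here)
  postKubeadmCommands:
    # On the first CP node this points kube-vip back at admin.conf, which by now
    # has its cluster-admins binding; on the other CP nodes super-admin.conf never
    # existed, so the outcome differs (the "other CP nodes" case above).
    - "sed -i 's|path: /etc/kubernetes/super-admin.conf|path: /etc/kubernetes/admin.conf|g' /etc/kubernetes/manifests/kube-vip.yaml"
```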
Yes, this is extremely hacky; however, there currently doesn't seem to be a better option until kube-vip comes up with a better idea.
see kube-vip/kube-vip#684 for discussion
This change successfully created a cluster manually: