KubeadmControlPlane stuck rolling out changes to apiServer extraArgs #4583
Comments
Those static pod manifests are created by kubeadm, but without a kubeadm log (usually visible in cloud-init) it's basically impossible to debug what went wrong in your case ;)
/triage support
@vincepri: The label(s) […]
In response to this:
> /triage support
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wcurry One thing that jumps out is that the Cluster API version is a bit behind, and we've had multiple bug fixes to KubeadmControlPlane since v0.3.13. Would you be able to update first?
I failed to get a repro today on a dev cluster. I did dig into the journald logs on the cluster with the failed roll before it was torn down, and nothing stood out. I saw the util.py logs and the files being created by cloud-init, but I don't remember seeing anything obvious. I definitely didn't see anything in the journal logs for kubeadm aside from the creation of a yaml and a chmod. There was no join failure. Working on updating CAPI this week. I'll see if I can trigger the bug in my downtime.
@wcurry Just a hint. In case of an error, check the following logs:
less /var/log/cloud-init-output.log
journalctl -u cloud-init --since "10 hours ago"
If there is nothing there, just take a look at:
journalctl --since "10 hours ago"
If you see that kubeadm times out waiting for the static Pods to come up, my best guess is to look at the kubelet / containerd unit logs to check whether the containers were started at all.
P.S. It could also be helpful to configure a higher kubeadm verbosity; there is a flag for that.
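A minimal sketch of what raising that verbosity could look like here, assuming the v1alpha3 KubeadmControlPlane API (where the KubeadmConfigSpec verbosity field overrides kubeadm's --v flag); the object name is hypothetical and other required fields are omitted:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: example-control-plane   # hypothetical name
spec:
  kubeadmConfigSpec:
    # increase kubeadm log verbosity during init/join so failures show more detail
    verbosity: 5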
@sbueringer Kind of unrelated to this issue, but the above might be good troubleshooting steps to put in our book ^
@vincepri Yes. I might have some more. We're running about 200 test installations (cluster create + updates) every night internally. They are only partially using Cluster API (with CAPO) right now, but we've been using kubeadm there for 1-2 years. As we're trying to achieve a very high success rate, we have lots of experience debugging those kinds of things. I'll collect the troubleshooting hints which are relevant for CAPI and open a PR so we can discuss it in a bit more detail.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What steps did you take and what happened:
Deployed a 1 CP, 1 worker cluster. Realized my OIDC config was set for the wrong environment. Redeployed the workload cluster with the updated apiServer extraArgs. The new machine was created, but got stuck provisioning. I did not see any static pods in /etc/kubernetes/manifests.
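For illustration, a hedged sketch of the kind of change rolled out, assuming the v1alpha3 KubeadmControlPlane API; the OIDC flag names and values below are hypothetical placeholders, not the ones used in this cluster, and other fields are omitted:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          # placeholder values; the real flags were environment-specific
          oidc-issuer-url: https://issuer.example.com
          oidc-client-id: kubernetes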
What did you expect to happen:
CP roll finishes successfully.
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 18.04
I no longer have access to this cluster.
I noticed that the non-goals for CAPI include the following: […] Is this not supported behavior?
I noticed the kubeadm-join-config.yaml does not have a section for apiServer. I went to a 3 CP node cluster I have and found that the same was true there, but the apiserver was running on the kubeadm join'd nodes. It's not clear to me how those static pods get created.
kubeadm-join-config.yaml from provisioning CP node:
/kind bug
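On the question above of how the apiserver static pods get created on joined control-plane nodes: a hedged sketch, assuming standard kubeadm behavior, of the kubeadm-config ConfigMap in kube-system from which kubeadm join --control-plane reads the shared ClusterConfiguration (including apiServer extraArgs) before writing the static Pod manifests; the value shown is a placeholder:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadm-config
  namespace: kube-system
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        oidc-issuer-url: https://issuer.example.com   # placeholder
    # ...remaining cluster-wide settings trimmed for brevity
It can be inspected with kubectl -n kube-system get configmap kubeadm-config -o yaml, which is consistent with the join config itself not needing an apiServer section.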