make the kubeadm bootstrap token TTL configurable #2770
I don't think this is necessarily something that we would want to do, unless (as mentioned below) we add revocation code after the related node has successfully joined.
This makes a lot of sense given that we already refresh the token periodically before the infrastructure is provisioned. Since there are already other controllers where we are adding watches to the workload cluster for Node resources, I do not see any reason why we wouldn't also do the same here, which would more easily allow us to add code to revoke the tokens after node joins. If we do add code to revoke the tokens, we would have to be cautious that it wouldn't impact the use of a shared bootstrap token by MachinePools and would only affect tokens that are used by Nodes generated by individual Machines.
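For illustration only (this is not code that exists in cluster-api): a minimal sketch of what such revocation could look like, assuming a controller-runtime client for the workload cluster and that the controller knows the token ID it created for the Machine:

```go
package tokensketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// revokeBootstrapToken deletes the kubeadm bootstrap token Secret
// ("bootstrap-token-<id>" in kube-system of the workload cluster), which
// invalidates the token. A controller could call this once the Machine's
// NodeRef is set, while leaving shared MachinePool tokens untouched.
func revokeBootstrapToken(ctx context.Context, workloadClient client.Client, tokenID string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("bootstrap-token-%s", tokenID),
			Namespace: metav1.NamespaceSystem,
		},
	}
	// Treat "already gone" as success so the call is idempotent.
	if err := workloadClient.Delete(ctx, secret); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```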
/remove-kind bug
/kind feature
/assign
@detiber Would you suggest the latter one? And if we plan on the former, what would be the correct way to make the TTL configurable?
Let's work on making it configurable; we can add an optional flag.
@vincepri If we want to go down the configurable route, I'm wondering if a field might be better, since one might want a different TTL depending on which infrastructure provider is being used for a given workload cluster.
flag/field, yes I mistyped
This feels like a bug, not a feature, to me. What's the remediation if the token expires before the node joins the cluster?
@zjs There is no automatic remediation today; users can set up a MachineHealthCheck to remediate Machines that weren't able to join the cluster within a specific timeout.
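For reference, a hedged example of such a MachineHealthCheck (all names, labels, and durations below are placeholders); the nodeStartupTimeout field is what covers Machines whose Node never appears, e.g. because the bootstrap token expired before kubeadm join ran:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: md-0-node-startup          # placeholder name
  namespace: default
spec:
  clusterName: my-cluster          # placeholder cluster name
  maxUnhealthy: 40%
  # Remediate Machines whose Node does not appear within this window.
  nodeStartupTimeout: 30m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: md-0   # placeholder selector
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```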
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/retitle make the kubeadm bootstrap token TTL configurable
Refreshing the token does not change the token in this case; it only extends the expiration time for the token, which is actually the current behavior: cluster-api/bootstrap/kubeadm/controllers/token.go, lines 74 to 94 at 9e4e82a.
We would just need to extend that behavior beyond what we currently do (refreshing until Machine.Status.InfrastructureReady) to something more indicative of bootstrapping being complete, such as the presence of a NodeRef.
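This is not the actual token.go code, just a minimal sketch (against the v1alpha3 API) of what keying the refresh off NodeRef could look like; the function names, the requeue interval, and the secret handling below are assumptions for illustration:

```go
package tokensketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// defaultTokenTTL stands in for the flag-configurable TTL discussed below.
const defaultTokenTTL = 15 * time.Minute

// reconcileToken keeps pushing out the bootstrap token's expiry until the
// Machine has an associated Node (Status.NodeRef), instead of stopping as
// soon as the infrastructure is marked ready.
func reconcileToken(ctx context.Context, workloadClient client.Client, machine *clusterv1.Machine, tokenID string) (ctrl.Result, error) {
	if machine.Status.NodeRef != nil {
		// The node has joined; stop refreshing (and optionally revoke the token).
		return ctrl.Result{}, nil
	}
	if err := extendTokenExpiry(ctx, workloadClient, tokenID, defaultTokenTTL); err != nil {
		return ctrl.Result{}, err
	}
	// Re-check well before the new expiry so the token never lapses mid-join.
	return ctrl.Result{RequeueAfter: defaultTokenTTL / 2}, nil
}

// extendTokenExpiry bumps the "expiration" field of the kubeadm bootstrap
// token Secret ("bootstrap-token-<id>" in kube-system) to now+ttl.
func extendTokenExpiry(ctx context.Context, workloadClient client.Client, tokenID string, ttl time.Duration) error {
	secret := &corev1.Secret{}
	key := client.ObjectKey{Namespace: metav1.NamespaceSystem, Name: "bootstrap-token-" + tokenID}
	if err := workloadClient.Get(ctx, key, secret); err != nil {
		return err
	}
	if secret.Data == nil {
		secret.Data = map[string][]byte{}
	}
	secret.Data["expiration"] = []byte(time.Now().UTC().Add(ttl).Format(time.RFC3339))
	return workloadClient.Update(ctx, secret)
}
```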
Could we also detect if a token is expired (for some reason) and generate a new one? |
Potentially, but my concern there would be around what happens for anything that may be attempting to use that bootstrap token; we don't really have a way to update it for anything that may be in the process of bootstrapping. Probably not much of a concern if we are talking about MachineDeployments/MachineSets/Machines, since an appropriately deployed MHC should "fix" things up eventually. It might be more complicated when dealing with MachinePools, especially if we consider a potential future implementation where scaling is deferred to an external autoscaling integration.
I was thinking about MachinePool, yeah; that might be a separate issue though.
This can possibly be superseded by #3762.
Our bare-metal machine takes a long time to set up, so the kubeadm bootstrap token expires. We will work to reduce the time it takes to set up our machines, but we want the TTL to be configurable.
TIL: the kubeadm bootstrap token TTL is already configurable:

```go
fs.DurationVar(&kubeadmbootstrapcontrollers.DefaultTokenTTL, "bootstrap-token-ttl", 15*time.Minute,
	"The amount of time the bootstrap token will be valid")
```

Therefore, the value of `DefaultTokenTTL` can be overridden via the `--bootstrap-token-ttl` flag.
The token should also be renewed if the Machine's Infrastructure doesn't transition to
/close
@vincepri: Closing this issue.
What steps did you take and what happened:
I set up an e2e test for CAPO and in some cases nodes were not able to join the cluster.
What did you expect to happen:
Nodes are able to join the cluster
Anything else you would like to add:
(See also: https://kubernetes.slack.com/archives/C8TSNPY4T/p1585035222084200)
To be clear, that's not a problem for me anymore, but it might be for somebody else. I'm opening this issue only to discuss whether we should change anything; my use case is already solved.
I'm currently setting up e2e tests for CAPO in OpenLab. These tests are not using the built-in images/binaries from the OS. Before kubeadm runs, CI artifacts are downloaded from the `kubernetes-release-dev` GCS bucket. Because of oversubscription of the underlying OpenStack and some not-ideal bootstrap code (installing and using gsutil instead of just curl), this sometimes took over 20 minutes. The `kubeadm join` call then failed because the bootstrap token was no longer valid.

The kubeadm bootstrapper generates bootstrap tokens with a (hard-coded) default TTL of 15 minutes. These tokens are renewed/refreshed until the infrastructure (e.g. OpenStackMachine) is marked as ready; in the case of CAPA/CAPO this means the VM is in state running/active. The problem is that if it takes longer than 15 minutes after the VM reaches the running/active state to execute `kubeadm join`, the bootstrap token is no longer valid. As I said, I solved my problem already (by speeding up the CI artifacts download).

Some ideas for how this could be solved, if we want to change anything:
Environment:
- Kubernetes version (use `kubectl version`): 1.17.3

/kind bug