make the kubeadm bootstrap token TTL configurable #2770
I don't think this is necessarily something that we would want to do, unless (as mentioned below) we add revocation code after the related node has successfully joined.
This makes a lot of sense given that we already refresh the token periodically before the infrastructure is provisioned. Since there are already other controllers where we are adding watches to the workload cluster for Node resources, I do not see any reason why we wouldn't also do the same here, which would more easily allow us to add code to revoke the tokens after node joins. If we do add code to revoke the tokens, we would have to be cautious that it wouldn't impact the use of a shared bootstrap token by MachinePools and would only affect tokens that are used by Nodes generated by individual Machines.
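For illustration only (this is not code that exists in cluster-api): a minimal sketch of what such revocation could look like, assuming a controller-runtime client for the workload cluster and that the controller knows the token ID it created for the Machine:

```go
package tokensketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// revokeBootstrapToken deletes the kubeadm bootstrap token Secret
// ("bootstrap-token-<id>" in kube-system of the workload cluster), which
// invalidates the token. A controller could call this once the Machine's
// NodeRef is set, while leaving shared MachinePool tokens untouched.
func revokeBootstrapToken(ctx context.Context, workloadClient client.Client, tokenID string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("bootstrap-token-%s", tokenID),
			Namespace: metav1.NamespaceSystem,
		},
	}
	// Treat "already gone" as success so the call is idempotent.
	if err := workloadClient.Delete(ctx, secret); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```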
/remove-kind bug
/kind feature
/assign
@detiber Would you suggest the latter one? And if we plan on the former, what would be the correct way to make the TTL configurable?
Let's work on making it configurable; we can add an optional flag.
@vincepri If we want to go down the configurable route, I'm wondering if a field might be better, since one might want a different TTL depending on which infrastructure provider is being used for a given workload cluster.
flag/field, yes I mistyped
This feels like a bug, not a feature, to me. What's the remediation if the token expires before the node joins the cluster?
@zjs There is no automatic remediation today; users can set up a MachineHealthCheck to remediate Machines that weren't able to join the cluster within a specific timeout.
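For reference, a hedged example of such a MachineHealthCheck (all names, labels, and durations below are placeholders); the nodeStartupTimeout field is what covers Machines whose Node never appears, e.g. because the bootstrap token expired before kubeadm join ran:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: md-0-node-startup          # placeholder name
  namespace: default
spec:
  clusterName: my-cluster          # placeholder cluster name
  maxUnhealthy: 40%
  # Remediate Machines whose Node does not appear within this window.
  nodeStartupTimeout: 30m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: md-0   # placeholder selector
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```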
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/retitle make the kubeadm bootstrap token TTL configurable
Refreshing the token does not change the token in this case; it only extends the expiration time for the token, which is actually the current behavior: cluster-api/bootstrap/kubeadm/controllers/token.go, lines 74 to 94 at 9e4e82a.
We would just need to extend that behavior beyond what we currently do (refreshing until Machine.Status.InfrastructureReady) to something more indicative of bootstrapping being complete, such as the presence of a NodeRef.
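This is not the actual token.go code, just a minimal sketch (against the v1alpha3 API) of what keying the refresh off NodeRef could look like; the function names, the requeue interval, and the secret handling below are assumptions for illustration:

```go
package tokensketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// defaultTokenTTL stands in for the flag-configurable TTL discussed below.
const defaultTokenTTL = 15 * time.Minute

// reconcileToken keeps pushing out the bootstrap token's expiry until the
// Machine has an associated Node (Status.NodeRef), instead of stopping as
// soon as the infrastructure is marked ready.
func reconcileToken(ctx context.Context, workloadClient client.Client, machine *clusterv1.Machine, tokenID string) (ctrl.Result, error) {
	if machine.Status.NodeRef != nil {
		// The node has joined; stop refreshing (and optionally revoke the token).
		return ctrl.Result{}, nil
	}
	if err := extendTokenExpiry(ctx, workloadClient, tokenID, defaultTokenTTL); err != nil {
		return ctrl.Result{}, err
	}
	// Re-check well before the new expiry so the token never lapses mid-join.
	return ctrl.Result{RequeueAfter: defaultTokenTTL / 2}, nil
}

// extendTokenExpiry bumps the "expiration" field of the kubeadm bootstrap
// token Secret ("bootstrap-token-<id>" in kube-system) to now+ttl.
func extendTokenExpiry(ctx context.Context, workloadClient client.Client, tokenID string, ttl time.Duration) error {
	secret := &corev1.Secret{}
	key := client.ObjectKey{Namespace: metav1.NamespaceSystem, Name: "bootstrap-token-" + tokenID}
	if err := workloadClient.Get(ctx, key, secret); err != nil {
		return err
	}
	if secret.Data == nil {
		secret.Data = map[string][]byte{}
	}
	secret.Data["expiration"] = []byte(time.Now().UTC().Add(ttl).Format(time.RFC3339))
	return workloadClient.Update(ctx, secret)
}
```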
Could we also detect if a token is expired (for some reason) and generate a new one? |
Potentially, but my concern there would be around what happens for anything that may be attempting to use that bootstrap token; we don't really have a way to update it for anything that may be in the process of bootstrapping. Probably not much of a concern if we are talking about MachineDeployments/MachineSets/Machines, since an appropriately deployed MHC should "fix" things up eventually. It might be more complicated when dealing with MachinePools, especially if we consider a potential future implementation where scaling is deferred to an external autoscaling integration.
I was thinking about MachinePool, yeah; that might be a separate issue though.
This can possibly be superseded by #3762.
Our bare-metal machine takes a long time to set up, so the kubeadm bootstrap token expires. We will work to reduce the time it takes to set up our machines, but we want the TTL to be configurable.
TIL: the kubeadm bootstrap token TTL is already configurable:

```go
fs.DurationVar(&kubeadmbootstrapcontrollers.DefaultTokenTTL, "bootstrap-token-ttl", 15*time.Minute,
	"The amount of time the bootstrap token will be valid")
```

Therefore, the value of `DefaultTokenTTL` can be overridden via the `--bootstrap-token-ttl` flag.
The token should also be renewed if the Machine's Infrastructure doesn't transition to
/close
@vincepri: Closing this issue.
What steps did you take and what happened:
I set up an e2e test for CAPO and in some cases nodes were not able to join the cluster.
What did you expect to happen:
Nodes are able to join the cluster
Anything else you would like to add:
(See also: https://kubernetes.slack.com/archives/C8TSNPY4T/p1585035222084200)
To be clear, that's not a problem for me anymore, but it might be for somebody else. I'm opening this issue only to discuss whether we should change anything; my use case is already solved.
I'm currently setting up e2e tests for CAPO in OpenLab. These tests are not using the built-in images/binaries from the OS. Before kubeadm runs, CI artifacts are downloaded from the `kubernetes-release-dev` GCS bucket. Because of oversubscription of the underlying OpenStack and some not-ideal bootstrap code (installing and using gsutil instead of just curl), this sometimes took over 20 minutes. The `kubeadm join` call then failed because the bootstrap token was no longer valid.

The kubeadm bootstrapper generates bootstrap tokens with a (hard-coded) default TTL of 15 minutes. These tokens are renewed/refreshed until the infrastructure (e.g. OpenStackMachine) is marked as ready; in the case of CAPA/CAPO this means the VM is in state running/active. The problem is that if it takes longer than 15 minutes after the VM reaches the running/active state to execute `kubeadm join`, the bootstrap token is no longer valid. As I said, I solved my problem already (by speeding up the CI artifacts download).

Some ideas for how this could be solved, if we want to change anything:
Environment:
- Kubernetes version (use `kubectl version`): 1.17.3

/kind bug