
✨ Add nodeDeletionTimeout property to Machine #5608

Merged

Conversation

schrej
Member

@schrej schrej commented Nov 8, 2021

What this PR does / why we need it:
Adds a nodeDeletionTimeout property to Machines to specify the timeout when deleting the Node belonging to a Machine.
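
For illustration, the new field could look roughly like the sketch below on the Machine spec (a minimal sketch assuming a *metav1.Duration type; it shows the proposed shape, not necessarily the exact merged API):

// Sketch only: shape of the proposed field, not necessarily the exact merged API.
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type MachineSpec struct {
	// ... existing fields such as NodeDrainTimeout ...

	// NodeDeletionTimeout bounds how long the controller keeps retrying deletion of
	// the Node belonging to this Machine once the Machine is being deleted. In this
	// sketch, a nil or zero value means deletion is retried indefinitely.
	NodeDeletionTimeout *metav1.Duration `json:"nodeDeletionTimeout,omitempty"`
}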

This contains the following change to Machine behavior:
Currently, errors occurring during Node deletion are ignored, which was introduced due to #1446.
With this PR, Node deletion errors are returned by the reconcile function, which causes the Machine to get re-queued for reconciliation. This leads to an infinite retry of Node deletion.

I've also added a test for machine deletion.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
closes #5516

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 8, 2021
@k8s-ci-robot
Contributor

Hi @schrej. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 8, 2021
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 8, 2021
@schrej schrej force-pushed the feature/node-drain-timeout-flag branch 2 times, most recently from bdfbba3 to a827644 on November 8, 2021 15:53
@MaxRink
Contributor

MaxRink commented Nov 8, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 8, 2021
@enxebre
Member

enxebre commented Nov 10, 2021

The reason behind that issue is a problem we've faced with metal3, and the only proper solution for that is waiting indefinitely.

If this needs to be granular per pool of nodes (i.e. per MachineDeployment) and per provider, it should be a field, but only if we think the use case is justified. Long term we might want to consider grouping node-lifecycle-related properties, e.g. the drain timeout and the node deletion timeout, under the same struct field. Also, depending on the problem and scenarios, we might want to use a different angle, e.g. lifecycle hooks (#5556).
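
As a purely hypothetical illustration of that grouping idea (not an existing or proposed API), such a struct could look like:

// Hypothetical sketch: grouping node lifecycle settings under a single struct field.
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type NodeLifecycle struct {
	// DrainTimeout is how long to wait for the node to drain before giving up.
	DrainTimeout *metav1.Duration `json:"drainTimeout,omitempty"`
	// DeletionTimeout is how long to keep retrying deletion of the Node object.
	DeletionTimeout *metav1.Duration `json:"deletionTimeout,omitempty"`
}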

@schrej could you please elaborate on the particular problem and scenarios being faced in metal3 so it's easier to understand and agree on the rationale for the best solution?

@schrej
Member Author

schrej commented Nov 10, 2021

When, for whatever reason, node deletion fails with metal3, the node will never get deleted and will remain in the cluster indefinitely. There is also no other component that takes care of deletion, unlike in other providers where a CPI would do that.
If a new machine gets created with the same IP address as the one whose node wasn't deleted, you end up with two nodes in the cluster that have the same IP address.
In our case this prevented Calico from deploying properly to that new node, as it doesn't like two nodes having the same IP.

While retrying indefinitely doesn't guarantee that the node gets deleted, it's at least continuously attempted, and it is also visible in the control plane cluster that something is wrong.

@enxebre
Member

enxebre commented Nov 10, 2021

for whatever reason

Is there any particular, predictable reason you've identified for this to happen? Would retrying indefinitely solve any of them? Have we considered, e.g., maybe having a condition instead to address the visibility concern?

@MaxRink
Contributor

MaxRink commented Nov 10, 2021

for whatever reason

Is there any particular, predictable reason you've identified for this to happen? Would retrying indefinitely solve any of them? Have we considered, e.g., maybe having a condition instead to address the visibility concern?

There are a number of race conditions that lead to this, like MHC kicking in on a slow provisioning and cleaning up a node that is just joining, deletion of nodes on the backend taking a bit long so the kubelet is still running when CAPI tries to delete the node, and others.
All of them can be resolved by retrying the deletion later on though, once the kubelet is not running anymore.

@enxebre
Member

enxebre commented Nov 10, 2021

Thanks @MaxRink

There are a number of race conditions that lead to this, like MHC kicking in on a slow provisioning and cleaning up a node that is just joining, deletion of nodes on the backend taking a bit long so the kubelet is still running when CAPI tries to delete the node, and others.

How exactly does this make client.delete(node) fail?

How can the kubelet be running when CAPI tries to delete the node? We intentionally only try to delete the Node after we ensure the underlying infra is gone [1], otherwise stateful assumptions might be broken [2].
Is this any different in the metal3 context?

[1]

if ok, err := r.reconcileDeleteInfrastructure(ctx, m); !ok || err != nil {
	return ctrl.Result{}, err
}
if ok, err := r.reconcileDeleteBootstrap(ctx, m); !ok || err != nil {
	return ctrl.Result{}, err
}
// We only delete the node after the underlying infrastructure is gone.
// https://github.com/kubernetes-sigs/cluster-api/issues/2565

[2] #2565

@MaxRink
Contributor

MaxRink commented Nov 10, 2021

Like I said, it's not only an issue with Metal3, and it's not limited to those exact conditions above. That race condition can also arise due to the API Server being unreachable for a moment as the stacked load balancer lease changes after a control plane upgrade, general network woes, implementation bugs in providers, ...

@enxebre
Member

enxebre commented Nov 12, 2021

That race condition can also arise due to the API Server being unreachable for a moment as the stacked load balancer lease changes after a control plane upgrade, general network woes, implementation bugs in providers, ...

All of those seem to be temporary, exceptional cases, and I question whether they should be addressed by a runtime flag or a spec field that is meant to express user intent.

We already have NodeDrainTimeout so it would be nicely consistent to have another field like NodeDeletionTimeout.

I think I see NodeDrainTimeout from a different angle, as it intersects with particular user workloads and user-defined node pools; therefore it provides value for users to customise it, and it makes sense for it to be a spec field. IMO NodeDeletionTimeout is something intrinsic to the cluster lifecycle which has no intersection with user intent or workloads, so I'd expect CAPI to handle it without putting any burden on the user.

Based on the above, I wonder if we should consider approaching the problem transparently for users and retrying indefinitely by returning the error, so controller-runtime would handle the rate limiting and backoff. Then #1446 would require either fixing the issue or deleting the cluster as suggested above, which seems reasonable, and isDeleteNodeAllowed already accounts for it. E.g.:

if isDeleteNodeAllowed {
	log.Info("Deleting node", "node", m.Status.NodeRef.Name)

	var deleteNodeErr error
	waitErr := wait.PollImmediate(2*time.Second, 10*time.Second, func() (bool, error) {
		if deleteNodeErr = r.deleteNode(ctx, cluster, m.Status.NodeRef.Name); deleteNodeErr != nil && !apierrors.IsNotFound(errors.Cause(deleteNodeErr)) {
			return false, nil
		}
		return true, nil
	})
	if waitErr != nil {
		conditions.MarkFalse(m, clusterv1.MachineNodeHealthyCondition, clusterv1.DeletionFailedReason, clusterv1.ConditionSeverityWarning, "")
		r.recorder.Eventf(m, corev1.EventTypeWarning, "FailedDeleteNode", "error deleting Machine's node: %v", deleteNodeErr)
		return ctrl.Result{}, deleteNodeErr
	}
}

Thoughts?

@schrej
Member Author

schrej commented Nov 15, 2021

Based on the above I wonder if we should consider to approach the problem transparently for users and retry indefinitely by erroring so controller runtime would handle the rate limiting and backoff. Then #1446 would require to either fix the issue or delete the cluster as suggested above which seems reasonable and isDeleteNodeAllowed already accounts for it.

I initially thought the current behaviour was a bug, so I fully agree 😄. I never really understood the use case in that issue either. Maybe the behaviour of the machine controller was different at that time, and it didn't consider cluster deletion yet?

A configurable timeout might make sense in addition though. Not to solve this problem, but as a general feature. IMO that could also be a configurable flag, and it doesn't need to be configurable on a per-cluster level, which just makes it more complex.

@vincepri
Member

vincepri commented Dec 1, 2021

@schrej Are you ok proceeding with a field here instead of a flag?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2021
@schrej
Member Author

schrej commented Dec 1, 2021

@schrej Are you ok proceeding with a field here instead of a flag?

I am. But what will it be? WaitForNodeDeletion or NodeDeletionTimeout?
With the latter we would always wait for node deletion again (see previous comment), which would be a behaviour change.

@vincepri
Member

vincepri commented Dec 1, 2021

NodeDeletionTimeout sgtm

@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from a827644 to 5d8e3b4 on December 6, 2021 15:57
@linux-foundation-easycla

linux-foundation-easycla bot commented Dec 6, 2021

CLA Signed

The committers are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 6, 2021
@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from 5d8e3b4 to 1ea72e5 on December 7, 2021 17:12
@schrej schrej changed the title ✨ Add --wait-for-node-deletion flag ✨ Add nodeDeletionTimeout property to Machine Dec 7, 2021
@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from 424ef66 to f10f591 on January 25, 2022 10:01
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2022
@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from f10f591 to 38fc2c3 on January 25, 2022 17:50
@schrej schrej requested a review from sbueringer January 25, 2022 18:16
@sbueringer
Member

Sorry for the delay. I'll take a look next week. Currently a lot to do for the v1.1 release.

Member

@vincepri vincepri left a comment

LGTM
/assign @sbueringer

Member

@sbueringer sbueringer left a comment

Last nit, otherwise lgtm from my side.

@@ -152,6 +152,7 @@ func (in *KubeadmControlPlane) ValidateUpdate(old runtime.Object) error {
 		{spec, "machineTemplate", "infrastructureRef", "apiVersion"},
 		{spec, "machineTemplate", "infrastructureRef", "name"},
 		{spec, "machineTemplate", "nodeDrainTimeout"},
+		{spec, "machineTemplate", "nodeDeletionTimeout"},
Member

Can you please also extend the unit test?

You only have to set a value in ~l.242 and then change it in ~367.

We just had a few cases in the past where we thought something was mutable, but it wasn't.

@sbueringer
Member

@schrej EasyCLA seems to have an issue, but I think it should still be in non-blocking mode until 5th February.

@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from e504706 to f3b520e on February 1, 2022 09:33
@sbueringer
Member

Thx! lgtm from my side. I think we should probably squash the commits into one.

@fabriziopandini
Member

lgtm pending squash
great job @schrej!

@schrej
Member Author

schrej commented Feb 10, 2022

/easycla

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 10, 2022
The machine controller now uses the configured timeout when deleting
the Node belonging to a Machine. It will also requeue the Machine for
reconciliation if deletion fails, which will lead to infinite retries.
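
To make the described behaviour concrete, the timeout check it implies might look roughly like this sketch (illustrative only, not the merged implementation; the helper name and signature are made up):

// Hypothetical sketch, not the actual controller code.
package machine

import "time"

// nodeDeletionTimeoutExceeded reports whether the configured nodeDeletionTimeout has
// elapsed since the Machine's deletionTimestamp, i.e. whether the controller should
// stop retrying Node deletion and finish removing the Machine.
func nodeDeletionTimeoutExceeded(deletionTimestamp time.Time, timeout time.Duration) bool {
	if timeout == 0 {
		// No timeout configured: keep returning the deletion error so that
		// controller-runtime requeues the Machine with backoff (infinite retries).
		return false
	}
	return time.Now().After(deletionTimestamp.Add(timeout))
}
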
@schrej schrej force-pushed the feature/node-drain-timeout-flag branch from f3b520e to 0d8d072 on February 14, 2022 09:58
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 14, 2022
@schrej
Member Author

schrej commented Feb 14, 2022

Squashed.

Sorry for the delay, but it's skiing season 😬

@sbueringer
Member

No worries. Thx!

/lgtm
/cc @fabriziopandini for approval

@k8s-ci-robot
Contributor

@sbueringer: GitHub didn't allow me to request PR reviews from the following users: for, approval.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

No worries. Thx!

/lgtm
/cc @fabriziopandini for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 14, 2022
@sbueringer
Member

Given that we were only waiting for the squash and already had lgtm's before
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 22, 2022
@k8s-ci-robot k8s-ci-robot merged commit 4c44d05 into kubernetes-sigs:main Feb 22, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Feb 22, 2022
@schrej schrej deleted the feature/node-drain-timeout-flag branch March 9, 2022 18:08