Add condition that indicates when a node has been drained #3365

ncdc · 2020-07-17T13:47:46Z

User Story

As a user/operator I would like to know when the Node for a Machine has finished draining, so I have more insight into the Machine's lifecycle.

Detailed Description

#3132 proposes the addition of pre-drain and pre-terminate Machine lifecycle hooks. If you're writing a hook controller, and you want to run your pre-terminate hook after the node has been drained, there is no easy way to know that the draining has completed. You could query the node, but it would be easier if the Machine had a condition such as NodeDrained that the controller could evaluate.

Anything else you would like to add:

Nope 😄

/kind feature

The text was updated successfully, but these errors were encountered:

vincepri · 2020-07-20T14:04:35Z

/milestone v0.3.x

vincepri · 2020-07-20T14:04:45Z

/help
/priority important-longterm

k8s-ci-robot · 2020-07-20T14:04:46Z

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/priority important-longterm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

michaelgugino · 2020-07-20T20:40:06Z

My thinking on this was to have this be apart of the kubectl library, add some annotation to the node after draining is complete. Annotation should be opt-out (so always applied), and NLM can remove the annotation whenever the node transitions to uncordoned. I haven't bothered to write this up as a KEP yet, but this seems like another place where we can standardize with the rest of the community.

detiber · 2020-07-20T20:43:11Z

@michaelgugino I believe the thought behind this issue was to have a Condition on the Machine that indicates that the node drain has been completed, which could then be used by external lifecycle hooks to trigger action after the node is drained, but before the infrastructure is deleted.

That said, if there is a common annotation that will exist on the Node, then the condition could be populated by the presence/absence of the annotation on the related Node.

michaelgugino · 2020-07-20T21:01:11Z

@detiber yeah, I was just thinking about the context of what the machine-controller is doing. The problem with drain is it's somewhat transient as an admin could uncordon the node and cause funky things to happen. Not saying that's a blocker here, just something to think about (this kinda begets another question: will we attempt to drain after successful drain?)

I guess we can just say we guarantee that we'll drain it completely at least once and that's probably fine.

Another concern is, if we want to add power management to the machines (I think we probably should), we need to be careful about what 'drain' is taking place when we indicate drain has succeeded.

One positive to the machine keeping state is that we're signaling we've run our drain, and not someone else's interpretation of drain. The ExcludeNodeDraining annotation kinda muddies things up. Are we instantly declaring the node to be drained?

Consider:

I use ExcludeNodeDraining to prevent machine-controller from draining. I have another controller that will drain the node after machine is marked deleted. This controller should indicate (on the node? The machine?) that it has finished its custom drain process. Another controller with pre-termination hook wants to begin some action after drain is complete and not before.

I think in the above case, indicating on the machine is easiest. If using ExcludeNodeDraining annotation, machine-controller should not set the status/indication of draining complete, but some other actor may set that annotation if they choose. This is material only if another actor is using the pre-termination hook as otherwise the machine-controller will proceed to immediately terminate the instance.

vincepri · 2020-08-03T18:48:40Z

/milestone v0.4.0

fabriziopandini · 2020-09-30T14:36:58Z

@vincepri @ncdc
Should

cluster-api/api/v1alpha3/condition_consts.go

Lines 84 to 101 in 7523f71

    
           // DrainingSucceededCondition provide evidence of the status of the node drain operation which happens during the machine 
        
           // deletion process. 
        
           DrainingSucceededCondition ConditionType = "DrainingSucceeded" 
        
           // DrainingReason (Severity=Info) documents a machine node being drained. 
        
           DrainingReason = "Draining" 
        
           // DrainingFailedReason (Severity=Warning) documents a machine node drain operation failed. 
        
           DrainingFailedReason = "DrainingFailed" 
        
           // PreDrainDeleteHookSucceededCondition reports a machine waiting for a PreDrainDeleteHook before being delete. 
        
           PreDrainDeleteHookSucceededCondition ConditionType = "PreDrainDeleteHookSucceeded" 
        
           // PreTerminateDeleteHookSucceededCondition reports a machine waiting for a PreDrainDeleteHook before being delete. 
        
           PreTerminateDeleteHookSucceededCondition ConditionType = "PreTerminateDeleteHookSucceeded" 
        
           // WaitingExternalHookReason (Severity=Info) provide evidence that we are waiting for an external hook to complete. 
        
           WaitingExternalHookReason = "WaitingExternalHook"

Be enough to close this issue?

ncdc · 2020-09-30T14:46:17Z

I think so. If we need to make further adjustments re Michael's comments, we can do a new issue for that.

fabriziopandini · 2020-10-01T10:35:26Z

/close
as per comment above (those conditions were implemented by #3527)

k8s-ci-robot · 2020-10-01T10:35:34Z

@fabriziopandini: Closing this issue.

In response to this:

/close
as per comment above

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 17, 2020

ncdc mentioned this issue Jul 17, 2020

📖 CAEP: machine deletion phase hooks #3132

Merged

k8s-ci-robot added this to the v0.3.x milestone Jul 20, 2020

k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jul 20, 2020

k8s-ci-robot modified the milestones: v0.3.x, v0.4.0 Aug 3, 2020

k8s-ci-robot closed this as completed Oct 1, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add condition that indicates when a node has been drained #3365

Add condition that indicates when a node has been drained #3365

ncdc commented Jul 17, 2020

vincepri commented Jul 20, 2020

vincepri commented Jul 20, 2020

k8s-ci-robot commented Jul 20, 2020

michaelgugino commented Jul 20, 2020

detiber commented Jul 20, 2020

michaelgugino commented Jul 20, 2020

vincepri commented Aug 3, 2020

fabriziopandini commented Sep 30, 2020

ncdc commented Sep 30, 2020

fabriziopandini commented Oct 1, 2020 •

edited

Loading

k8s-ci-robot commented Oct 1, 2020

Add condition that indicates when a node has been drained #3365

Add condition that indicates when a node has been drained #3365

Comments

ncdc commented Jul 17, 2020

vincepri commented Jul 20, 2020

vincepri commented Jul 20, 2020

k8s-ci-robot commented Jul 20, 2020

michaelgugino commented Jul 20, 2020

detiber commented Jul 20, 2020

michaelgugino commented Jul 20, 2020

vincepri commented Aug 3, 2020

fabriziopandini commented Sep 30, 2020

ncdc commented Sep 30, 2020

fabriziopandini commented Oct 1, 2020 • edited Loading

k8s-ci-robot commented Oct 1, 2020

fabriziopandini commented Oct 1, 2020 •

edited

Loading