Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add condition that indicates when a node has been drained #3365

Closed
ncdc opened this issue Jul 17, 2020 · 11 comments
Closed

Add condition that indicates when a node has been drained #3365

ncdc opened this issue Jul 17, 2020 · 11 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@ncdc
Copy link
Contributor

ncdc commented Jul 17, 2020

User Story

As a user/operator I would like to know when the Node for a Machine has finished draining, so I have more insight into the Machine's lifecycle.

Detailed Description

#3132 proposes the addition of pre-drain and pre-terminate Machine lifecycle hooks. If you're writing a hook controller, and you want to run your pre-terminate hook after the node has been drained, there is no easy way to know that the draining has completed. You could query the node, but it would be easier if the Machine had a condition such as NodeDrained that the controller could evaluate.

Anything else you would like to add:

Nope 😄

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 17, 2020
@vincepri
Copy link
Member

/milestone v0.3.x

@k8s-ci-robot k8s-ci-robot added this to the v0.3.x milestone Jul 20, 2020
@vincepri
Copy link
Member

/help
/priority important-longterm

@k8s-ci-robot
Copy link
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/priority important-longterm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jul 20, 2020
@michaelgugino
Copy link
Contributor

My thinking on this was to have this be apart of the kubectl library, add some annotation to the node after draining is complete. Annotation should be opt-out (so always applied), and NLM can remove the annotation whenever the node transitions to uncordoned. I haven't bothered to write this up as a KEP yet, but this seems like another place where we can standardize with the rest of the community.

@detiber
Copy link
Member

detiber commented Jul 20, 2020

@michaelgugino I believe the thought behind this issue was to have a Condition on the Machine that indicates that the node drain has been completed, which could then be used by external lifecycle hooks to trigger action after the node is drained, but before the infrastructure is deleted.

That said, if there is a common annotation that will exist on the Node, then the condition could be populated by the presence/absence of the annotation on the related Node.

@michaelgugino
Copy link
Contributor

@detiber yeah, I was just thinking about the context of what the machine-controller is doing. The problem with drain is it's somewhat transient as an admin could uncordon the node and cause funky things to happen. Not saying that's a blocker here, just something to think about (this kinda begets another question: will we attempt to drain after successful drain?)

I guess we can just say we guarantee that we'll drain it completely at least once and that's probably fine.

Another concern is, if we want to add power management to the machines (I think we probably should), we need to be careful about what 'drain' is taking place when we indicate drain has succeeded.

One positive to the machine keeping state is that we're signaling we've run our drain, and not someone else's interpretation of drain. The ExcludeNodeDraining annotation kinda muddies things up. Are we instantly declaring the node to be drained?

Consider:

I use ExcludeNodeDraining to prevent machine-controller from draining. I have another controller that will drain the node after machine is marked deleted. This controller should indicate (on the node? The machine?) that it has finished its custom drain process. Another controller with pre-termination hook wants to begin some action after drain is complete and not before.

I think in the above case, indicating on the machine is easiest. If using ExcludeNodeDraining annotation, machine-controller should not set the status/indication of draining complete, but some other actor may set that annotation if they choose. This is material only if another actor is using the pre-termination hook as otherwise the machine-controller will proceed to immediately terminate the instance.

@vincepri
Copy link
Member

vincepri commented Aug 3, 2020

/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.x, v0.4.0 Aug 3, 2020
@fabriziopandini
Copy link
Member

@vincepri @ncdc
Should

// DrainingSucceededCondition provide evidence of the status of the node drain operation which happens during the machine
// deletion process.
DrainingSucceededCondition ConditionType = "DrainingSucceeded"
// DrainingReason (Severity=Info) documents a machine node being drained.
DrainingReason = "Draining"
// DrainingFailedReason (Severity=Warning) documents a machine node drain operation failed.
DrainingFailedReason = "DrainingFailed"
// PreDrainDeleteHookSucceededCondition reports a machine waiting for a PreDrainDeleteHook before being delete.
PreDrainDeleteHookSucceededCondition ConditionType = "PreDrainDeleteHookSucceeded"
// PreTerminateDeleteHookSucceededCondition reports a machine waiting for a PreDrainDeleteHook before being delete.
PreTerminateDeleteHookSucceededCondition ConditionType = "PreTerminateDeleteHookSucceeded"
// WaitingExternalHookReason (Severity=Info) provide evidence that we are waiting for an external hook to complete.
WaitingExternalHookReason = "WaitingExternalHook"

Be enough to close this issue?

@ncdc
Copy link
Contributor Author

ncdc commented Sep 30, 2020

I think so. If we need to make further adjustments re Michael's comments, we can do a new issue for that.

@fabriziopandini
Copy link
Member

fabriziopandini commented Oct 1, 2020

/close
as per comment above (those conditions were implemented by #3527)

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
as per comment above

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

6 participants