Node Maintenance Controller KEP #1080
Conversation
Welcome @yanirq!
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Hi @yanirq. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
## Motivation
Move kubectl drain into the server to enable execution of node drain through custom resource calls to the API.
This would also reduce the proliferation of drain behaviours in other projects that re-implement the drain logic to suit their needs.
It would be better to highlight the gap left by the existing taint/unschedulable mechanisms.
### Goals
- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain). |
Why not handle this in the node lifecycle controller?
If it could be incorporated as part of the node lifecycle controller, even better.
This is the initial description, following a project that has already started (https://github.com/kubevirt/node-maintenance-operator), but it makes sense for it to be part of an existing controller.
The doc could be amended to: either a new controller or the node lifecycle controller.
### Goals
- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain).
- The controller watches and acts upon maintenance CRs to place a node into, or remove it from, maintenance.
The CR seems like a command to the k8s cluster; is that only for maintenance?
The CR can be interpreted as a different way to trigger drain using the k8s API instead of the CLI, opening new options, including adding additional info on top of the CR such as status, reason, custom taints/labels, etc.
Oh, I mean introducing a common CR, named `Command`, for this kind of feature :)
Ah, yes. Here is an example of such a common CR:
https://github.com/kubevirt/node-maintenance-operator/blob/master/deploy/crds/nodemaintenance_crd.yaml
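For illustration, a rough sketch of the Go API types behind such a `NodeMaintenance` CR; the field names below (`NodeName`, `Reason`, `Phase`, `PendingPods`) are assumptions modelled on the linked CRD, not an API defined by this KEP:

```go
// Illustrative only: field names are assumptions, not a final API.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeMaintenanceSpec names the node to drain and records why.
type NodeMaintenanceSpec struct {
	NodeName string `json:"nodeName"`
	Reason   string `json:"reason,omitempty"`
}

// NodeMaintenanceStatus reports progress of the maintenance operation.
type NodeMaintenanceStatus struct {
	Phase       string   `json:"phase,omitempty"`       // e.g. Running, Succeeded
	PendingPods []string `json:"pendingPods,omitempty"` // pods still awaiting eviction
}

// NodeMaintenance is the CR the controller would watch.
type NodeMaintenance struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   NodeMaintenanceSpec   `json:"spec,omitempty"`
	Status NodeMaintenanceStatus `json:"status,omitempty"`
}
```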
I find the idea of Node Maintenance useful, and I'd like to suggest an optional field for the spec: timeClass/timeEstimate (name TBD). I think that, when taking a machine into maintenance, it's possible to know approximately how long it's going to be in maintenance, and it would be useful if that info could be passed to the API so other entities that care about node state (like those that replicate on local storage) can decide whether to perform an expensive data resizing operation (recreating replicas on another node) or simply wait for the node to come back up.
An initial goal similar to your request was mentioned here. This can certainly be added to the design part if needed.
I signed it
/ok-to-test
## Summary
A node maintenance controller that performs node drain server side, using custom resource calls to the API to perform the drain (cordon + evictions).
Any reason this proposes a CRD and not a core API?
## Motivation
Move kubectl drain into the server to enable execution of node drain through custom resource calls to the API.
This line is a description of the solution, not the motivation. Maybe it can be incorporated into summary or goals?
(https://github.com/kubernetes/kubernetes/issues/25625)
### Goals |
These goals mostly describe the proposed solution. I would prefer to see the goals section describe the new capabilities or problems being solved.
#### Story 1
As a third-party developer/operator, I want to evacuate my workloads appropriately when a node goes into maintenance mode, but I do not want to evacuate them when the node is cordoned.
- Today we cannot differentiate between cordon and drain using cluster information.
I myself don't understand the difference between cordon and drain. Why can we not tell the difference today? Please consider adding a description of the terms somewhere in this proposal.
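For reference: cordon only marks a node unschedulable, while drain additionally evicts the pods already running on it. A minimal client-go sketch of the cordon half, assuming an already-configured `kubernetes.Interface` client (names here are illustrative, not from the KEP):

```go
package drainutil

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cordon marks the node unschedulable. New pods stop landing on it, but pods
// already running there are left alone; drain is cordon plus eviction of those pods.
func cordon(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```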
#### Story 4
As an operator, I need to put a number of nodes, i.e. a rack of nodes, into node maintenance.
- As a cluster admin, I would like to shut down a rack (due to an overheating problem); to do that I would need to move its hosts into maintenance.
I think this should explain how the operator's experience changes with this proposal. We already have `kubectl drain` and things like `kubectl get nodes --selector=rack=abc`. How will this proposal improve upon that?
- As a cluster admin, I would like to shut down a rack (due to an overheating problem); to do that I would need to move its hosts into maintenance.
#### Story 5
- As a cluster admin, I would like to perform a firmware update on the nodes.
Is this different from Story 4? Are you proposing any new feature/automation around firmware/bios updates?
### Implementation Details/Notes/Constraints
- Custom taints added to the node by the controller might be overrun by other controllers, such as the node lifecycle controller for the `TaintNodesByCondition` feature or the node controller for the `TaintBasedEvictions` feature.
By "overrun" do you mean "overwritten"? If you are using a custom taint key, I would not expect other controllers to modify your taint.
## Proposal
The node maintenance controller watches for new or deleted custom resources, e.g. `NodeMaintenance`, which indicate that a node in the cluster should either:
I think one key question is whether this should be implemented as a CRD or as part of the `NodeSpec`.
Would this controller be considered a "core" addon? Or an optional addon? Are there any plans to work with sig-cluster-lifecycle to have this controller deployed by default with `kubeadm`?
This is something that we would like to consume within the Cluster API project, and the more integrated it is with the default deployment scenarios the easier it will be for us to adopt.
The proposal here aims for core integration as much as possible, but I guess this could be discussed in later phases.
I would, however, add as many use cases and needs as possible to this KEP to strengthen the case, either in this initial PR or in a follow-up if it gets merged.
### Goals
- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain).
Do you mind providing a high-level description of the drain logic you reference?
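At a high level, the referenced kubectl drain logic cordons the node and then evicts the pods running on it (skipping mirror pods and DaemonSet-managed pods, and honouring PodDisruptionBudgets). A simplified client-go sketch, reusing the `cordon` helper from the earlier sketch and omitting kubectl's filtering and retry handling:

```go
package drainutil

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// drain is a simplified version of what kubectl drain does: cordon the node,
// list its pods, and request an eviction for each one. Real drain also skips
// mirror and DaemonSet pods, honours grace periods, and retries evictions
// rejected by PodDisruptionBudgets.
func drain(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	if err := cordon(ctx, client, nodeName); err != nil {
		return err
	}
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		eviction := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
		// The Eviction subresource respects PodDisruptionBudgets, unlike a plain delete.
		if err := client.CoreV1().Pods(pod.Namespace).EvictV1(ctx, eviction); err != nil {
			return err
		}
	}
	return nil
}
```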
## Proposal
The node maintenance controller watches for new or deleted custom resources, e.g. `NodeMaintenance`, which indicate that a node in the cluster should either:
- `NodeMaintenance` CR created: move the node into maintenance, cordon the node (set it as unschedulable), and evict the pods (which can be evicted) from that node.
It seems a bit odd to use create/delete as the "spec" for the CRD. Are there any additional options in the CRD spec? What info gets written to the CRD status?
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yanirq. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@misterikkit thank you for the review.
There are probably several good reasons for node maintenance intent to be a separate object from the node spec. I am not by any means an API expert. As I understand it, modifying node labels is meant to be pretty restricted for security reasons. Having a separate object allows simpler RBAC to control who is allowed to drain nodes, without granting extra privileges. Also, we can find precedent for adding new core types as
/assign @derekwaynecarr
I think we've identified a need here, but I'm not set on the implementation. What we need is a mechanism to say "I'm working on this node" and to inform other well-behaved parties "Don't mess with this node, it's not your turn." I would call this idea the 'maintenance lock'. This could be an annotation or part of the spec.
This allows a distributed model. Some controller might want to drain (Node Maintenance Operator/Controller), some controller might want to reboot the node (cluster-api), some controller might want to upgrade the node (OpenShift's machine-config-operator), some component might want to schedule a debug pod and then cordon but not drain; the possibilities are many. This being an informative piece, it allows building integrations with monitoring and remediation systems. As far as RBAC, if you have permission to request some "Node Maintenance Object" that directly impacts the node, you probably have admin-level permissions anyway. In any case, there's nothing preventing an operator from deploying something like a "Node Maintenance Controller" that interacts with the node directly, thus achieving the same separation (e.g., if the controller has permission to drain the node, then you have permission to update the drain object).
@michaelgugino your comment about the "maintenance lock" got me thinking about how locks in distributed systems are just leases without liveness, and we already have a NodeLease object. I might suggest we leverage the existing NodeLease API instead of adding a lock - you "take ownership of the node maintenance lease" in order to do maintenance, and we could define various fields for "why does this lease exist". That's slightly orthogonal to this KEP, but if you could link any discussion you have about the maintenance lock here (or we can move that discussion to the other spot) that would help.
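To make the lease idea concrete, a sketch of a per-node `coordination.k8s.io/v1` Lease used as a maintenance lock; the names, namespace, and "reason" annotation below are purely illustrative and not something agreed in the thread:

```go
package drainutil

import (
	"context"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// acquireMaintenanceLease creates a per-node Lease acting as a "maintenance lock":
// the holder identity records who is working on the node, and the lease duration
// bounds how long other actors should wait before treating the work as abandoned.
// Namespace, naming scheme, and the reason annotation are illustrative only.
func acquireMaintenanceLease(ctx context.Context, client kubernetes.Interface, nodeName, holder, reason string) (*coordinationv1.Lease, error) {
	durationSeconds := int32(3600) // rough estimate of the maintenance window
	now := metav1.NowMicro()
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "maintenance-" + nodeName,
			Namespace:   "kube-node-lease",
			Annotations: map[string]string{"maintenance.example.io/reason": reason},
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       &holder,
			LeaseDurationSeconds: &durationSeconds,
			RenewTime:            &now,
		},
	}
	return client.CoordinationV1().Leases(lease.Namespace).Create(ctx, lease, metav1.CreateOptions{})
}
```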
@smarterclayton that sounds reasonable. There's not really any other fleshed-out discussion, just mentions in a couple of places. I can put together an enhancement issue to track.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Hi @yanirq, This PR has not been updated or discussed in over a year. As such, it is not mergeable in its current state, as it does not follow the standard KEP template, which has been changed significantly since this was opened. I am going to close this PR, but when you have a chance to work on this again, please open an issue, update this to follow the KEP process, and please bring this up again in a SIG Node meeting. Thanks!
Proposal for Node Maintenance Controller (server-side drain). As discussed in the sig-node meeting (2019/04/30)
and on kubernetes/kubernetes#25625 (see comment).