
Node Maintenance Controller KEP #1080

Closed · wants to merge 2 commits

Conversation


@yanirq yanirq commented Jun 2, 2019

Proposal for a Node Maintenance Controller (server-side drain), as discussed in the sig-node meeting (2019/04/30)
and on kubernetes/kubernetes#25625 (see comment)

@k8s-ci-robot
Contributor

Welcome @yanirq!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 2, 2019
@k8s-ci-robot
Contributor

Hi @yanirq. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 2, 2019
## Motivation

Move kubectl drain into the server so that a node drain can be performed through custom resource calls to the API.
This would also reduce the proliferation of drain behaviours in other projects that re-implement the drain logic to suit their needs.
Member

It would be better to highlight the gap between taints and the unschedulable flag.


### Goals

- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain).
Member

Why not handle this in the node-lifecycle controller?

Author

If it could be incorporated into the node-lifecycle controller, even better.
This is the initial description, following a project that has already started (https://github.com/kubevirt/node-maintenance-operator), but it makes sense for this to be part of an existing controller.
The doc could be amended to: either a new controller, or the node-lifecycle controller.

### Goals

- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain).
- The controller watches and acts upon maintenance CRs to put a node into, or remove it from, maintenance.
Member

The CR seems like a command to the k8s cluster; is it only for maintenance?

Author

The CR can be interpreted as a different way to trigger a drain, using the k8s API instead of the CLI. It also opens new options, such as carrying additional information on the CR: status, reason, custom taints/labels, etc.
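To make that concrete, here is a purely illustrative sketch of what the Go types for such a CR could look like. The group, field names, and status shape are hypothetical (loosely inspired by this thread, not defined by the KEP), and generated deepcopy/scheme registration code is omitted.

```go
// Hypothetical API types for a NodeMaintenance custom resource; nothing here
// is specified by the KEP. Deepcopy functions and scheme registration, which
// would normally be generated, are omitted for brevity.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// NodeMaintenanceSpec declares the intent to put one node into maintenance.
type NodeMaintenanceSpec struct {
	// NodeName is the node to cordon and drain.
	NodeName string `json:"nodeName"`
	// Reason is a free-form, human-readable explanation for the maintenance.
	// Additional taints/labels to apply could be carried here as well.
	// +optional
	Reason string `json:"reason,omitempty"`
}

// NodeMaintenanceStatus reports the progress of the drain.
type NodeMaintenanceStatus struct {
	// Phase could be something like Running or Succeeded.
	// +optional
	Phase string `json:"phase,omitempty"`
	// PendingPods lists pods that still have to be evicted.
	// +optional
	PendingPods []string `json:"pendingPods,omitempty"`
	// LastError records the last eviction failure, if any.
	// +optional
	LastError string `json:"lastError,omitempty"`
}

// NodeMaintenance is the object a user creates to trigger the drain and
// deletes to end the maintenance window.
type NodeMaintenance struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   NodeMaintenanceSpec   `json:"spec,omitempty"`
	Status NodeMaintenanceStatus `json:"status,omitempty"`
}
```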

Member

Oh, I mean introducing a common CR, named something like Command, for this kind of feature :)

@rohantmp

rohantmp commented Jun 4, 2019

@yanirq

I find the idea of Node Maintenance useful, and I'd like to suggest an optional field for the spec: timeClass/timeEstimate (name TBD).

I think that, when taking a machine into maintenance, it is usually possible to know approximately how long it will be in maintenance. It would be useful if that information could be passed to the API, so that other entities that care about node state (like those that replicate on local storage) can decide whether to perform an expensive data resizing operation (recreating replicas on another node) or simply wait for the node to come back up.
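For illustration only, the suggested field could be an optional duration on the hypothetical NodeMaintenanceSpec sketched earlier in this thread; the name and type below are placeholders, not part of the KEP.

```go
// Variant of the hypothetical NodeMaintenanceSpec from the earlier sketch
// (same package and metav1 import assumed), with the suggested estimate added.
type NodeMaintenanceSpec struct {
	NodeName string `json:"nodeName"`
	// EstimatedDuration hints how long the node is expected to stay in
	// maintenance, so that components replicating on local storage can decide
	// whether to rebuild replicas elsewhere or simply wait for the node.
	// +optional
	EstimatedDuration *metav1.Duration `json:"estimatedDuration,omitempty"`
}
```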

@yanirq
Author

yanirq commented Jun 4, 2019

@rohantmp

I find the idea of Node Maintenance useful, and I'd like to suggest an optional field for the spec: timeClass/timeEstimate (name TBD).

An initial goal similar to your request was mentioned here.

This can certainly be added to the design part if needed.
I would start by adding another user story to the existing list before moving on to the design section.
I can either add it in this PR (given a brief user story that covers your needs/suggestion),
or it can be added later in a separate PR.

@yanirq
Author

yanirq commented Jun 5, 2019

I signed it

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 5, 2019
@eparis
Contributor

eparis commented Jul 23, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 23, 2019

## Summary

A node maintenance controller that performs a server-side node drain: custom resource calls to the API trigger the drain (cordon + evictions).
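For readers unfamiliar with what the drain involves, the following is a minimal sketch (assuming client-go, and not the KEP's actual implementation) of a server-side cordon + eviction pass. The DaemonSet/mirror-pod filtering and retry handling that kubectl drain performs are omitted.

```go
// A minimal sketch of what a server-side drain boils down to with client-go:
// mark the node unschedulable (cordon), then request evictions through the
// Eviction subresource so PodDisruptionBudgets are honoured.
package drainsketch

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// Drain cordons nodeName and requests eviction of every pod bound to it.
func Drain(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	// Cordon: set spec.unschedulable through a strategic merge patch.
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	if _, err := cs.CoreV1().Nodes().Patch(ctx, nodeName,
		types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("cordon %s: %w", nodeName, err)
	}

	// Evict: list the pods on the node and create an Eviction for each one.
	pods, err := cs.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for _, p := range pods.Items {
		ev := &policyv1.Eviction{
			ObjectMeta: metav1.ObjectMeta{Name: p.Name, Namespace: p.Namespace},
		}
		if err := cs.PolicyV1().Evictions(p.Namespace).Evict(ctx, ev); err != nil {
			return fmt.Errorf("evict %s/%s: %w", p.Namespace, p.Name, err)
		}
	}
	return nil
}
```

Using the Eviction subresource rather than deleting pods directly is what keeps PodDisruptionBudgets in play.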

Any reason this proposes a CRD and not a core API?


## Motivation

Move kubectl drain into the server so that a node drain can be performed through custom resource calls to the API.

This line is a description of the solution, not the motivation. Maybe it can be incorporated into the summary or goals?


(https://github.com/kubernetes/kubernetes/issues/25625)

### Goals

These goals mostly describe the proposed solution. I would prefer to see the goals section describe the new capabilities or problems being solved.

#### Story 1
As a third-party developer/operator, I want to evacuate my workloads appropriately when a node goes into maintenance mode, but I do not want to evacuate them when the node is cordoned.

- Today we cannot differentiate between a cordon and a drain using cluster information.

I myself don't understand the difference between cordon and drain. Why can we not tell the difference today? Please consider adding a description of the terms somewhere in this proposal.

#### Story 4
As an operator, I need to put a number of nodes (e.g. a rack of nodes) into node maintenance.

- As a cluster admin, I would like to shut down a rack (due to an overheating problem); to do that, I would need to move its hosts into maintenance.

I think this should explain how the operator's experience changes with this proposal. We already have kubectl drain and things like kubectl get nodes --selector=rack=abc. How will this proposal improve upon that?

- As a cluster admin, I would like to shut down a rack (due to an overheating problem); to do that, I would need to move its hosts into maintenance.

#### Story 5
- As a cluster admin, I would like to perform a firmware update on the nodes.

Is this different from Story 4? Are you proposing any new feature/automation around firmware/bios updates?


### Implementation Details/Notes/Constraints

- Custom taints added to the node by the controller might be overrun by other controllers, such as the node-lifecycle controller for the `TaintNodesByCondition` feature or the node controller for the `TaintBasedEvictions` feature.
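For clarity, here is a small sketch of the kind of custom taint meant here, using a hypothetical, controller-owned taint key; the node-lifecycle controller's condition and eviction taints use their own keys and would normally not touch it.

```go
// Sketch only: a maintenance taint under a hypothetical controller-owned key.
package taintsketch

import corev1 "k8s.io/api/core/v1"

// addMaintenanceTaint adds the maintenance taint to a Node object if missing.
func addMaintenanceTaint(node *corev1.Node) {
	t := corev1.Taint{
		Key:    "node-maintenance.example.io/maintenance", // hypothetical key
		Effect: corev1.TaintEffectNoSchedule,
	}
	for _, existing := range node.Spec.Taints {
		if existing.Key == t.Key {
			return
		}
	}
	node.Spec.Taints = append(node.Spec.Taints, t)
}
```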

By "overrun" do you mean "overwritten"? If you are using a custom taint key, I would not expect other controllers to modify your taint.


## Proposal

The node maintenance controller watches for new or deleted custom resources, e.g. `NodeMaintenance`, which indicate that a node in the cluster should either:

I think one key question is whether this should be implemented as a CRD or as part of the NodeSpec.

Member

Would this controller be considered a "core" addon? Or an optional addon? Are there any plans to work with sig-cluster-lifecycle to have this controller deployed by default with kubeadm?

This is something that we would like to consume within the Cluster API project, and the more integrated it is with the default deployment scenarios, the easier it will be for us to adopt.

Author

@yanirq yanirq Sep 26, 2019

The proposal here aims for as much core integration as possible, but I guess this could be discussed in later phases.

I would, however, add as many use cases and needs as possible to strengthen the case for this KEP, either in this initial PR or in a follow-up if it gets merged.


### Goals

- A new controller that implements the drain logic that is currently [implemented](https://github.com/kubernetes/kubernetes/blob/v1.14.0/pkg/kubectl/cmd/drain/drain.go#L307) in [kubectl](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#drain).

Do you mind providing a high-level description of the drain logic you reference?

## Proposal

The node maintenance controller watches for new or deleted custom resources, e.g. `NodeMaintenance`, which indicate that a node in the cluster should either:
- `NodeMaintenance` CR created: move the node into maintenance; cordon the node (set it as unschedulable) and evict the pods (those that can be evicted) from that node.
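As a sketch of how such a controller could react to CR creation and deletion, here is a controller-runtime style reconcile loop; the import path for the CR types, the finalizer name, and the helper methods are all hypothetical, and this is not code from the KEP (it also assumes the usual generated deepcopy code for the API types).

```go
// Sketch of create/delete handling for a NodeMaintenance CR: drain on
// creation, uncordon via a finalizer on deletion.
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	v1alpha1 "example.com/node-maintenance/api/v1alpha1" // hypothetical module for the types sketched earlier
)

const maintenanceFinalizer = "node-maintenance.example.io/finalizer" // hypothetical

// NodeMaintenanceReconciler drains a node when a NodeMaintenance CR appears
// and uncordons it when the CR is deleted.
type NodeMaintenanceReconciler struct {
	client.Client
}

func (r *NodeMaintenanceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var nm v1alpha1.NodeMaintenance
	if err := r.Get(ctx, req.NamespacedName, &nm); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !nm.DeletionTimestamp.IsZero() {
		// CR deleted: take the node out of maintenance, then drop the finalizer.
		if err := r.uncordon(ctx, nm.Spec.NodeName); err != nil {
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(&nm, maintenanceFinalizer)
		return ctrl.Result{}, r.Update(ctx, &nm)
	}

	// Block deletion until the node has been uncordoned.
	if controllerutil.AddFinalizer(&nm, maintenanceFinalizer) {
		if err := r.Update(ctx, &nm); err != nil {
			return ctrl.Result{}, err
		}
	}

	// CR present: cordon and evict, requeueing until the drain is complete.
	done, err := r.cordonAndEvict(ctx, nm.Spec.NodeName)
	if err != nil || !done {
		return ctrl.Result{RequeueAfter: 30 * time.Second}, err
	}
	return ctrl.Result{}, nil
}

// Hypothetical helpers: these would wrap the cordon/eviction calls shown in
// the earlier drain sketch; stubbed here to keep the example short.
func (r *NodeMaintenanceReconciler) cordonAndEvict(ctx context.Context, node string) (bool, error) {
	return true, nil
}

func (r *NodeMaintenanceReconciler) uncordon(ctx context.Context, node string) error {
	return nil
}
```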

It seems a bit odd to use create/delete as the "spec" for the CRD. Are there any additional options in the CRD spec? What info gets written to the CRD status?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yanirq
To complete the pull request process, please assign derekwaynecarr
You can assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yanirq
Author

yanirq commented Sep 23, 2019

@misterikkit thank you for the review.
I reorganized and added some more information in the required sections.
Let me know if you find other points for improvement.

@misterikkit

There are probably several good reasons for node maintenance intent to be a separate object from the node spec. I am not by any means an API expert, but as I understand it, modifying node labels is meant to be pretty restricted for security reasons. Having a separate object allows simpler RBAC to control who is allowed to drain nodes, without granting extra privileges.

Also, we can find precedent for adding new core types as v1alpha1 or developing them as CRDs before making them core types. See also, https://docs.google.com/document/d/1mxMzdbf8vnCnU7Yt0z6QNP7EubSyrontBCgvmi4Ri4Q/edit#

@yanirq
Author

yanirq commented Nov 13, 2019

/assign @derekwaynecarr

@michaelgugino

I think we've identified a need here, but I'm not set on the implementation.

What we need is a mechanism to say "I'm working on this node" and to inform other well-behaved parties "Don't mess with this node, it's not your turn"

I would call this idea the 'maintenance lock'. This could be an annotation or part of the spec.

```yaml
maintenanceLock:
  requestedBy: <component's unique name>
  acquiredAt: <timestamp the component acquired the lock>
```

This allows a distributed model. Some controller might want to drain (Node Maintenance Operator/Controller), some controller might want to reboot the node (cluster-api), some controller might want to upgrade the node (OpenShift's machine-config-operator), and some component might want to schedule a debug pod and then cordon but not drain; the possibilities are many.

Since this is an informative piece of state, it allows building integrations with monitoring and remediation systems.

As far as RBAC goes, if you have permission to request some "Node Maintenance Object" that directly impacts the node, you probably have admin-level permissions anyway. In any case, there's nothing preventing an operator from deploying something like a "Node Maintenance Controller" that interacts with the node directly, thus achieving the same separation (e.g. if the controller has permission to drain the node, then permission to update the drain object effectively grants permission to drain).

@smarterclayton
Contributor

@michaelgugino your comment about the "maintenance lock" got me thinking about how locks in distributed systems are just leases without liveness, and we already have a NodeLease object. I might suggest we leverage the existing NodeLease api instead of adding a lock - you "take ownership of the node maintenance lease" in order to do maintenance, and we could define various fields for "why does this lease exist".

That's slightly orthogonal to this KEP, but if you could link any discussion you have about the maintenance lock here (or we can move that discussion to the other spot) that would help.
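To make the lease idea concrete, here is a rough sketch using the existing coordination.k8s.io Lease type. Whether to reuse the kubelet's node lease or a dedicated one (as below), and the namespace/field conventions, are exactly the open questions, so everything here is hypothetical.

```go
// Sketch: record "holder is working on nodeName" as a Lease so that other
// well-behaved controllers can back off while the lease exists.
package leasesketch

import (
	"context"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/pointer"
)

func acquireMaintenanceLease(ctx context.Context, cs kubernetes.Interface, nodeName, holder string) error {
	now := metav1.NowMicro()
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      nodeName,
			Namespace: "node-maintenance", // hypothetical namespace
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       pointer.String(holder),
			AcquireTime:          &now,
			LeaseDurationSeconds: pointer.Int32(3600), // hint at the maintenance window
		},
	}
	_, err := cs.CoordinationV1().Leases(lease.Namespace).Create(ctx, lease, metav1.CreateOptions{})
	return err
}
```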

@michaelgugino

@michaelgugino your comment about the "maintenance lock" got me thinking about how locks in distributed systems are just leases without liveness, and we already have a NodeLease object. I might suggest we leverage the existing NodeLease api instead of adding a lock - you "take ownership of the node maintenance lease" in order to do maintenance, and we could define various fields for "why does this lease exist".

That's slightly orthogonal to this KEP, but if you could link any discussion you have about the maintenance lock here (or we can move that discussion to the other spot) that would help.

@smarterclayton that sounds reasonable. There isn't really any other fleshed-out discussion; it's just mentioned in a couple of places. I can put together an enhancement issue to track it.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2020
@yanirq
Author

yanirq commented Mar 4, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 2, 2020
@michaelgugino

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 3, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2020
@yanirq
Author

yanirq commented Sep 2, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2020
@yanirq
Author

yanirq commented Dec 1, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2020
@ehashman
Member

ehashman commented Jan 9, 2021

Hi @yanirq,

This PR has not been updated or discussed in over a year. As such, it is not mergeable in its current state, as it does not follow the standard KEP template, which has been changed significantly since this was opened. I am going to close this PR, but when you have a chance to work on this again, please open an issue, update this to follow the KEP process, and please bring this up again in a SIG Node meeting. Thanks!

@ehashman ehashman closed this Jan 9, 2021