Integration with (future) NodeMaintenance API #9757

Open
sftim opened this issue Nov 22, 2023 · 3 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sftim
Contributor

sftim commented Nov 22, 2023

What would you like to be added (User Story)?

Given that the NodeMaintenance API becomes adopted as a proposed or alpha API:

As a cluster operator, I would like to use it to specify node-level maintenance operations.
And I would like Cluster API's controllers to take appropriate actions when observing relevant node maintenance objects,
so that the workloads I run on my cluster remain available.

Detailed Description

Cluster API develops the ability (e.g. through Discovery) to detect the existence of a NodeMaintenance API. Maintenance, such as declarative drain events, is accounted for by cluster autoscaling and other controllers.
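
To make the Discovery idea concrete, here is a minimal sketch of the check a controller could run against a discovery response. It assumes the response has the `APIResourceList` shape that the API server returns for `GET /apis/<group>/<version>`. The group/version string and the plural resource name are placeholders, since the KEP is not merged and the real names are not known; `serves_kind` is a hypothetical helper, not part of Cluster API.

```python
from typing import Any, Mapping

# Hypothetical group/version: the real one is not defined yet.
ASSUMED_GROUP_VERSION = "maintenance.example.io/v1alpha1"


def serves_kind(resource_list: Mapping[str, Any], kind: str) -> bool:
    """Return True if an APIResourceList-shaped document lists `kind`."""
    for resource in resource_list.get("resources", []):
        # Skip subresources such as "nodemaintenances/status".
        if "/" in resource.get("name", ""):
            continue
        if resource.get("kind") == kind:
            return True
    return False


# Trimmed example of a discovery response; resource names are assumed.
sample = {
    "kind": "APIResourceList",
    "groupVersion": ASSUMED_GROUP_VERSION,
    "resources": [
        {"name": "nodemaintenances", "kind": "NodeMaintenance"},
        {"name": "nodemaintenances/status", "kind": "NodeMaintenance"},
    ],
}

maintenance_api_available = serves_kind(sample, "NodeMaintenance")
```

A controller could run a check like this at startup (and periodically afterwards) and only start watching maintenance objects once the kind is actually served.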

Anything else you would like to add?

Example: I have a MachineDeployment for each of three physical zones where I run my cluster. A controller I have made myself manages the scale subresource for each MachineDeployment.

Separately, I make a NodeMaintenance that matches topology.kubernetes.io/zone: "antartica-1c"; I'm updating the network configuration in just that one zone. Let's assume that antartica-1a and antartica-1b are the other two zones.

I'd like to be able to observe a condition on the antartica-1c MachineSet, because when all the machines are under maintenance I think a condition is appropriate. I'd also expect to see something in .status showing the maintenance. That might be status.readyReplicas as the actual maintenance happens, or another new field.

The outcome should be that my custom controller has enough information to scale out the MachineDeployments in antartica-1a and antartica-1b to cover the expected shortfall, so that there is a home for the replacement Pods as soon as the app-level self healing mechanism kicks in.
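
As a rough illustration of what such a custom controller might compute once it can see that a zone is under maintenance, the sketch below spreads the replicas of the affected zone evenly across the remaining zones. The replica counts are invented for the example, and `plan_scale_out` is a hypothetical helper; a real controller would read the counts from each MachineDeployment and write the result through the scale subresource.

```python
from typing import Dict


def plan_scale_out(replicas: Dict[str, int], zone_in_maintenance: str) -> Dict[str, int]:
    """Return new replica counts for the zones that stay in service.

    The replicas of the zone under maintenance are split as evenly as
    possible across the other zones; any remainder goes to the zones
    that sort first by name.
    """
    shortfall = replicas[zone_in_maintenance]
    others = sorted(zone for zone in replicas if zone != zone_in_maintenance)
    if not others:
        raise ValueError("no other zone available to absorb the shortfall")
    base, extra = divmod(shortfall, len(others))
    return {
        zone: replicas[zone] + base + (1 if index < extra else 0)
        for index, zone in enumerate(others)
    }


# Invented replica counts, one MachineDeployment per zone.
current = {"antartica-1a": 4, "antartica-1b": 4, "antartica-1c": 3}
plan = plan_scale_out(current, "antartica-1c")
```

An even split is only one possible policy; weighting by spare capacity per zone would be an equally reasonable choice.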


This issue is also a call to action; I'd love to see better integration between CAPI and kubectl drain, etc. NodeMaintenance could be part of that story.

Label(s) to be applied

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2023
@sbueringer
Member

Very interesting. Thx for surfacing this here!

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 12, 2023
@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 11, 2024
@fabriziopandini
Member

/remove-triage accepted
The KEP is not yet merged, so we should wait.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 6, 2024
@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 17, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 17, 2024
@fabriziopandini fabriziopandini added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 17, 2024