Add infrastructureCapabilities to machines #2927
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: michaelgugino. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold Holding until the linked proposal is accepted / discussed.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@michaelgugino: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Circling back to this proposal, I'm not 100% sure this should live here. It seems the Machine type is crossing a boundary with the infrastructure type component by trying to expose infrastructure capabilities for information purposes.

I'm personally -1 to this change, for a few reasons. The contract for this information isn't well specified, and the guarantees aren't clear. What are users expected to do with this information? Can capabilities change? How should infrastructure providers support these?

While we're alpha and getting to v1beta1, our project stance on immutable infrastructure still stands. From the Cluster API perspective, the only operations we support are create/delete/rolling upgrade; some controllers support in-place updates where it can be done, but those are pretty limited in scope.

Infrastructure providers can choose to inform users of these capabilities in some custom status field, and users or controllers that need this information can inspect the reference to retrieve it. What we could do here is document a contract on how infrastructure providers may expose this information, some sort of convention that specifies the data model.
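To make the "document a convention" idea above concrete, here is a minimal, hedged sketch of what a provider-side data model could look like. The `FooMachine` names, fields, and semantics are invented for illustration only; no such contract exists in Cluster API today.

```go
package v1alpha3

// FooMachineCapabilities lists operations a hypothetical "Foo" infrastructure
// provider claims to support for a given instance. All names here are
// illustrative, not part of any published Cluster API contract.
type FooMachineCapabilities struct {
	// Reboot indicates the instance can be rebooted in place.
	// +optional
	Reboot bool `json:"reboot,omitempty"`

	// PowerOff indicates the instance can be stopped without being deleted.
	// +optional
	PowerOff bool `json:"powerOff,omitempty"`
}

// FooMachineStatus shows where such a provider could surface the data, so
// that users or controllers following the Machine's infrastructureRef can
// read it from the provider object's status.
type FooMachineStatus struct {
	// Capabilities reports what the underlying infrastructure supports.
	// +optional
	Capabilities *FooMachineCapabilities `json:"capabilities,omitempty"`
}
```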
TL;DR: If we want to manage power states via machines, we need this feature. These questions are touched on in the linked issue.

If we want to build power management into the Machine API to support reboot or start/stop type remediation for the MHC or some other component, then we need this feature. For instance, it's not a valid operation to stop a spot instance on AWS, but it can be rebooted.

This PR follows the same format we're using to scrape other data from the infrastructure types and put it on the Machine. We want other components to integrate with the Machine so they don't have to worry about the idiosyncrasies of each underlying provider. These fields are optional for infrastructure providers, but would inform other components such as the MHC whether or not they can reboot a particular instance. A user could also observe these fields and set some power management field in the spec in the future. Other components (like the MHC) could refer to this info when determining whether the machine needs to be remediated (e.g., if the power state was set to off by an admin, we probably shouldn't delete this machine).

Ultimately, the question is: do we want to support power states?
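As a rough illustration of the consumer side described above, the sketch below assumes the capabilities copied onto the Machine are exposed as a simple map and shows how a component like the MachineHealthCheck could gate reboot-style remediation on it. The `InfrastructureCapabilities` field name loosely follows the PR title; the map shape, the "reboot" key, and the helper are assumptions, not the PR's actual API.

```go
package remediation

// machineStatus stands in for the slice of Machine status this sketch needs;
// the real PR would define the field on the Cluster API Machine type.
type machineStatus struct {
	// InfrastructureCapabilities mirrors what the infrastructure provider
	// reported; for example, an AWS spot instance might report
	// {"reboot": true, "powerOff": false}.
	InfrastructureCapabilities map[string]bool
}

// canReboot reports whether reboot-style remediation is a valid operation for
// the machine; a health checker could fall back to delete/recreate
// remediation when this returns false.
func canReboot(s machineStatus) bool {
	return s.InfrastructureCapabilities["reboot"]
}
```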
This makes sense, although now that we have external remediation, I'd expect an infrastructure provider looking to reboot a Machine to handle that separately.
We could support it in the future, although for now I'd say it's a bit out of scope to support reboot and other power states within Cluster API. This is mostly to keep the scope down while we get to beta; it can definitely be revisited later on. As mentioned above, the work that went into external remediation allows these use cases to be fulfilled with external code.
Yes, but part of those conversations, IIRC, was that external remediation was needed because we lacked power management, and I think the mid-term goal was to add it. We should work to prioritize features with our users. Power states are optional behavior. Maybe it will be buggy and suck at first, but that's not a reason to not do it at all. Anyway, I wrote this PR to demonstrate the direction I think we need to go to support power states. I tried to make it somewhat generic in case other capabilities are thought of in the future (maybe something like re-provision in place for bare metal hosts, though that's probably a terrible idea in itself, but people want to do it).

I'm not saying this has to merge now, but I don't see any reason not to work towards it if people are interested.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@michaelgugino: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR.
What this PR does / why we need it:
Well-behaved machine clients need information about which (future) features are available on a particular machine. Not all machines support the same operations, and exposing the capabilities of an un
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1811