Add infrastructureCapabilities to machines #2927
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: michaelgugino. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold Holding until the linked proposal is accepted / discussed.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@michaelgugino: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Circling back to this proposal, I'm not 100% sure this should live here. It seems the Machine type is crossing a boundary with the infrastructure type component by trying to expose infrastructure capabilities for information purposes.

I'm personally -1 to this change, for a few reasons. The contract for this information isn't well specified, and the guarantees aren't clear. What are users expected to do with this information? Can capabilities change? How should infrastructure providers support these?

While we're alpha and getting to v1beta1, our project stance on immutable infrastructure still stands. From the Cluster API perspective, the only operations we support are create/delete/rolling upgrade; some controllers support in-place updates where it can be done, but those are pretty limited in scope.

Infrastructure providers can choose to inform users of these capabilities in some custom status field, and users or controllers that need this information can inspect the reference to retrieve it. What we could do here is document a contract on how infrastructure providers may expose this information, some sort of convention that specifies the data model.
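To make the "document a convention" idea above concrete, here is a minimal, hedged sketch of what a provider-side data model could look like. The `FooMachine` names, fields, and semantics are invented for illustration only; no such contract exists in Cluster API today.

```go
package v1alpha3

// FooMachineCapabilities lists operations a hypothetical "Foo" infrastructure
// provider claims to support for a given instance. All names here are
// illustrative, not part of any published Cluster API contract.
type FooMachineCapabilities struct {
	// Reboot indicates the instance can be rebooted in place.
	// +optional
	Reboot bool `json:"reboot,omitempty"`

	// PowerOff indicates the instance can be stopped without being deleted.
	// +optional
	PowerOff bool `json:"powerOff,omitempty"`
}

// FooMachineStatus shows where such a provider could surface the data, so
// that users or controllers following the Machine's infrastructureRef can
// read it from the provider object's status.
type FooMachineStatus struct {
	// Capabilities reports what the underlying infrastructure supports.
	// +optional
	Capabilities *FooMachineCapabilities `json:"capabilities,omitempty"`
}
```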
TL;DR: If we want to manage power states via machines, we need this feature. These questions are touched on in the linked issue.

If we want to build power management into the Machine API to support reboot or start/stop type remediation for the MHC or some other component, then we need this feature. For instance, it's not a valid operation to stop a spot instance on AWS, but it can be rebooted.

This PR follows the same format we're using to scrape other data from the infrastructure types and put it on the Machine. We want other components to integrate with the Machine so they don't have to worry about the idiosyncrasies of each underlying provider. These fields are optional for infrastructure providers, but would inform other components such as the MHC whether or not they can reboot a particular instance. A user could also observe these fields and set some power management field in the spec in the future. Other components (like the MHC) could refer to this info when determining whether the machine needs to be remediated (e.g., if the power state was set to off by an admin, we probably shouldn't delete this machine).

Ultimately, the question is: do we want to support power states?
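As a rough illustration of the consumer side described above, the sketch below assumes the capabilities copied onto the Machine are exposed as a simple map and shows how a component like the MachineHealthCheck could gate reboot-style remediation on it. The `InfrastructureCapabilities` field name loosely follows the PR title; the map shape, the "reboot" key, and the helper are assumptions, not the PR's actual API.

```go
package remediation

// machineStatus stands in for the slice of Machine status this sketch needs;
// the real PR would define the field on the Cluster API Machine type.
type machineStatus struct {
	// InfrastructureCapabilities mirrors what the infrastructure provider
	// reported; for example, an AWS spot instance might report
	// {"reboot": true, "powerOff": false}.
	InfrastructureCapabilities map[string]bool
}

// canReboot reports whether reboot-style remediation is a valid operation for
// the machine; a health checker could fall back to delete/recreate
// remediation when this returns false.
func canReboot(s machineStatus) bool {
	return s.InfrastructureCapabilities["reboot"]
}
```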
This makes sense, although now that we have external remediation, I'd expect an infrastructure provider looking to reboot a Machine to handle that separately.
We could support it in the future, although for now I'd say it's a bit out of scope to support reboot and other power states within Cluster API. This is mostly to keep the scope down while we get to beta; it can definitely be revisited later on. As mentioned above, the work that went into external remediation allows these use cases to be fulfilled with external code.
Yes, but part of those conversations, IIRC, was that external remediation was needed because we lacked power management, and I think the mid-term goal was to add it. We should work to prioritize features with our users. Power states are optional behavior. Maybe it will be buggy and suck at first, but that's not a reason to not do it at all. Anyway, I wrote this PR to demonstrate the direction I think we need to go to support power states. I tried to make it somewhat generic in case other capabilities are thought of in the future (maybe something like re-provision in place for bare metal hosts, though that's probably a terrible idea in itself, but people want to do it).

I'm not saying this has to merge now, but I don't see any reason not to work towards it if people are interested.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@michaelgugino: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR.
What this PR does / why we need it:
Well-behaved machine clients need information about which (future) features are available on a particular machine. Not all machines support the same operations, and exposing the capabilities of an un
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1811