📖 CAEP: machine deletion phase hooks #3132
Conversation
This seems like a reasonable solution to me. I like the opt-in nature and the flexibility it gives to users.
I do wonder about the extended interactions between a possibly large number of owning controllers and the ability of a user to reasonably debug issues. I realize some of this will be highly dependent on the complexity that people create, but it concerns me with respect to users inspecting CAPI artifacts for answers.
I left a couple of comments and nits inline.
/milestone v0.3.x
I think this is looking good, thanks for the updates.
> (pre-drain) As an operator, when I am deleting a control plane machine for
> maintenance reasons, I want to ensure my existing control plane machine is not
> drained until after my replacement comes online. This will prevent protracted
> periods of a missing control plane member if the replacement machine cannot be
The references here to a control plane member are confusing, especially since it is at odds with the existing work on integrating MachineHealthCheck remediation with KubeadmControlPlane. I think it would be better to genericize this user story to avoid specific mention of the control plane.
I'm fine with changing the wording, but I don't see how this is at odds with MHC. MHC can delete a machine, as normal. We just prevent the instance from being drained/terminated until the replacement comes online. To the MHC, this just looks like it's taking a long time to drain the machine; it doesn't need to do anything different.
I think when we were discussing the modifications to MHC for the KCP previously, those bits might be a temporary fix until something like this merges.
My concern is that the work that is in-flight (#3185) will block creation of the new Machine until the old Machine has been fully deleted. This is actually needed to avoid cases where scaling up the etcd cluster prior to fully removing the machine being remediated could lead to a loss of quorum.
For example, given a 3-replica Control Plane:
- 1 machine fails MHC in a way that also impacts the etcd member; we are now at 2/3 healthy etcd members and still have quorum
- If we attempt to scale up the Control Plane, then partway through the bootstrapping of the 4th replica the new member is added to etcd; at this point quorum is lost (2/4 members healthy)
- At this point one of two things can happen:
  - The new etcd member finishes syncing, quorum is restored, and the 4th replica finishes bootstrapping. Cluster-wide impact is relatively minimal.
  - The etcd member fails to sync and quorum loss persists. Cluster-wide services are impacted due to the read-only datastore.
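For reference, the quorum arithmetic behind this scenario is just etcd's standard majority rule; the formula below is background, not something defined in this proposal:

```latex
\operatorname{quorum}(n) = \left\lfloor \tfrac{n}{2} \right\rfloor + 1,
\qquad \operatorname{quorum}(3) = 2, \qquad \operatorname{quorum}(4) = 3
```

So with only 2 healthy members, a 3-member cluster still has quorum, but a 4-member cluster does not, which is why adding the 4th member before remediation completes is risky.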
> If we attempt to scale up the Control Plane, then partway through the bootstrapping of the 4th replica the new member is added to etcd; at this point quorum is lost (2/4 members healthy)
I have thought about this at great length in the context of what we're doing in OpenShift.
The answer I came up with was divorcing etcd from automatically being enrolled as a member of the cluster upon control plane scale up. We can install the binaries, we can configure it, but just don't add the member to the cluster until some smarter thing elects to do that. So, you could have 6 control plane hosts, and only 3 active etcd members. The others could be 'warm' so you can quickly remove/add the hosts you want. The etcd controller would need to have some idea about fault domains and machine/node statuses, but that is the best way IMO.
This also lets us scale etcd independently of scaling control plane hosts. Want to have a single control plane host with a single etcd member? Easy, scale etcd down to one, then scale down your control plane hosts. etcd controller will ensure that etcd lands in the right spot. Want to scale up to 5 etcd hosts? Fine, scale up the control plane, scale up etcd.
It solves so many problems. I'm not sure exactly how it fits into the existing architecture, but this is the only reasonable setup I can see.
In any case, what we're defining in this CAEP is optional behavior of 3rd party components. Since cluster-api bits are supposed to be reusable (that is still the case, right?), this use case makes sense IMO. If it's creating confusion as you say, I can remove it.
There are probably more nuanced stories we can tease out specifically around control plane / etcd membership and deletion hooks. To move forward with this proposal, though, let's remove this story as you suggested.
> Hooks defined at this point will prevent the machine-controller from draining a node after the machine-object has been marked for deletion until the hooks are removed.
>
> ##### pre-term
>
> `pre-term.delete.hook.machine.cluster.x-k8s.io`
Would it make sense to be explicit here and call this `pre-termination.delete.hook.machine.cluster.x-k8s.io` instead, to avoid any confusion?
Yeah, I think that should be acceptable. The first portion of the key, before the '/' character, has a 253-character limit, so the longer name should be okay.
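For illustration only, a Machine carrying such a hook might look like the sketch below; it assumes the v1alpha3 API group, uses the longer `pre-termination` spelling suggested above, and the hook name, owner value, and elided spec fields are hypothetical:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Machine
metadata:
  name: example-machine
  annotations:
    # The prefix before the '/' is well under the 253-character limit.
    # "wait-for-replacement" and its owner value are illustrative only.
    pre-termination.delete.hook.machine.cluster.x-k8s.io/wait-for-replacement: replacement-controller
spec:
  # ... the usual Machine spec fields (clusterName, bootstrap, infrastructureRef, ...) omitted ...
```

The annotation value names the hook's owning controller, matching the examples quoted below.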
> `pre-term.hook.machine.cluster-api.x-k8s.io/preserve-instance: my-instance-replacement-controller`
>
> `pre-term.hook.machine.cluster-api.x-k8s.io/never-delete: cluster-admin`
Might be better to rename this example a bit. I think I get the intention (that Cluster API should not attempt to delete the underlying infrastructure), but this example could make some believe it gives more assurances than it actually does, since there is nothing preventing an external actor from deleting the InfraMachine resource out of band, and there is no termination protection on the infrastructure side preventing removal of the underlying infra directly.
> using CRDs external to the machine-api, gRPC communications, or
> additional annotations on the machine or other objects.
>
> #### Owning Controller Design
I think it might be good to try to find a more descriptive name than Owning Controller here; I think there is a high likelihood of confusion for a user between the owning controller of the resource (the Machine controller in this case) and the owning controller of an annotation on the resource.
"Hook Owning Controller" ? Then we can just use HOC all over the doc?
> * Watch machine objects and determine when an appropriate action must be taken.
> * After completing the desired hook action, remove the hook annotation.
>
> ##### Owning Controllers may
>
> * Watch machine objects and add a hook annotation as desired by the operator
What do you mean by operator here? It's not clear if it's referring to a controller or an end-user. It's also not clear when automatically adding an annotation would be considered appropriate.
I meant 'cluster administrator', so I updated the notes. I'll add a new section about when adding a hook is appropriate.
> * After completing the desired hook action, remove the hook annotation.
>
> ##### Owning Controllers may
>
> * Watch machine objects and add a hook annotation as desired by the operator
> * After completing a hook, set an additional annotation that indicates this
This seems a bit out of scope for this proposal and a bit confusing, since the required deletion of the annotation should also be a sufficient signal that this controller has finished its work.
Yeah, I see what you mean, but it's intentional. This describes an abstraction so that one HOC doesn't necessarily have to know about another.
Say HOC "A" is a pre-drain hook that takes some action, like stopping a specific workload safely; when done, it sets the "critical-pods-stopped-safely" annotation and removes its hook. HOC "B" is also a pre-drain hook. It doesn't know anything about HOC "A", but it does know not to start its work, say detaching some special storage, until it sees the "critical-pods-stopped-safely" annotation. This could also be used to signal some other controller that doesn't know or care about lifecycle points. The possibilities are numerous. Ideally, each HOC is configurable as to waiting for a certain annotation to be present or absent, which makes this abstraction redundant, but there's nothing stopping someone from doing this, so I figured it's worth documenting the approach.
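A rough sketch of that sequencing is shown below; it assumes a `pre-drain.delete.hook.machine.cluster.x-k8s.io` key format parallel to the `pre-term` key quoted earlier, and the hook names, owner values, and the `"true"` signal value are illustrative only:

```yaml
# While HOC "A" is still working: both pre-drain hooks are present, so drain is blocked.
metadata:
  annotations:
    pre-drain.delete.hook.machine.cluster.x-k8s.io/stop-critical-pods: hoc-a
    pre-drain.delete.hook.machine.cluster.x-k8s.io/detach-storage: hoc-b
---
# After HOC "A" finishes: it removes its own hook and leaves a signal
# annotation that HOC "B" waits for before starting its own work.
metadata:
  annotations:
    critical-pods-stopped-safely: "true"
    pre-drain.delete.hook.machine.cluster.x-k8s.io/detach-storage: hoc-b
```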
I would probably remove this entire "may" section, but I'm also ok leaving it in. It does not affect what we're going to code; it just describes some possibilities created by adding hooks.
I added some of the details based on questions people previously had in the gdoc. I could remove it, but then those questions are just unanswered. People want to know how to order these things.
Ok, thanks for the clarification. Does something like the following wording help at all?
* Coordinate with other hook controllers through any means possible, such as using common annotations, CRDs, etc. For example, one hook controller could set an annotation indicating it has finished its work, and another hook controller could wait for the presence of the annotation before proceeding.
I also wonder if the "Hook Ordering" section above is sufficient and if this item is repetitive?
I am generally 👍 to proceed. My only real question remaining is about the value of each annotation (i.e. the "owner").
Do we envision that down the line we'd keep annotations as the main approach to adding hooks to the deletion workflow, or move to a CRD-based or webhook-based one? I'm asking this mostly because once this proposal and the related implementation merge, it becomes an API we'll have to maintain and respect in future versions. As we think about v1alpha4 and getting the core types to v1beta1, I'd like to make sure this group has thought about the possible implications and future support for these annotations and hooks. Is this the best place for the hooks to live, short and long term? Or would it be better to have an experimental registration CRD? That said, +1 on moving the proposal along given that it's pretty simple and straightforward.
I guess you're thinking about what it would take to convert an annotation to a CR if we decide to use CRDs for hooks in v1alpha4?
Personally, I think annotation-based approaches are fine for proof of concept and experimental support for features, but not much beyond that, since an annotation is basically an un-versioned addition to the API that creates many challenges around backward and forward compatibility. It then becomes difficult or impossible to change, deprecate, and/or remove support for once it is consumed by loosely coupled external integrations.

It might be good to define a policy of when/where annotations should be used and possibly provide suggested alternatives where they shouldn't be. For example, annotations do not present much of an issue when used for coordinating actions between core in-tree components that are updated and released on the same release cadence. However, using annotations between in-tree and out-of-tree providers presents a bit more of a challenge. Sure, we could encode it as part of the contracts we are publishing, but that complicates the transition to a new version of the contract and makes the transition more error prone. Whereas if the behavior is built into the versioned part of the API, then backward/forward compatibility can easily be built into the conversions.

For any other integrations, I believe we really should avoid annotations altogether for a few different reasons:
+1
@michaelgugino I think we're mostly good to go here. Could you please review the open comments and check if everything has been addressed in the proposal? Thanks!
Yes, that's correct, and more about what happens beyond v1alpha4 as we get to beta at some point.
I guess, given that our types are still alpha, it seems we're ok proceeding with annotations for now and revisiting later. The only caveat I see is around migration from annotations to custom CRDs: it won't be a conversion webhook; it could be an external utility, or we could expect users to make the necessary changes when ready. Does this change our approach within Cluster API to adopting these hooks? For example, we talked in the past about getting KCP to register hooks; should we avoid doing so, or are we ok proceeding with that plan if/when it comes down to it?
💯
If the "external" integration support we are defining here is considered "alpha" or "experimental", then I don't think we need to offer any type of conversion between implementations as the feature matures and could externalize that (either to the external implementation(s) or users).
I don't think the current implementation details vs future implementation details matter too much for in-tree use, since we know exactly how those components are interacting and can provide the conversion steps needed automatically for those specific use cases.
I'd like to proceed with this proposal so people can start trying out the hooks. We can file issues for the remaining questions. I don't think we need to hold off from using them in CAPI controllers.
Sounds good to me 👍
I know there are some perceived shortcomings with annotations compared to CRDs. Sometimes I think there is too much focus on API 'correctness' instead of UX. With annotations, everything is simple. If I want to add this annotation to every machine in my cluster at creation time, I can add it to my machine-deployments / machine-sets. They're all in one place; I don't have to hunt for them across multiple objects.

There is one CRD implementation I could live with, I think. A 1-to-1 CR to Machine mapping like this:

```yaml
kind: MachineLifecycleHook
metadata:
  name: my-machine
spec:
  pre-drain-delete:
    - my-hook: some-owner
    - other-hook: owner2
  pre-terminate-delete:
    - wait-for-replacement: custom-controller
  ...
```

That still lacks a way to create a particular hook for every machine at creation time. Then you have to decide who's creating and deleting the object (I suppose it could be a child object of the machine; we'd need to set a finalizer on it and remove the finalizer later). We'd need to do something hacky like 'initialHooks' as part of the machineSpec. So now we need lots more controller code and multiple webhooks (we need to validate that the initial set is valid on the machine, and also validate each create/update on the CR). Now everyone's controller needs to set up watches on two resources. If you give someone a list, they're going to want to order it, and that's not how this was designed, so that's confusing. Anyway, there are so many other fish to fry, so I hope we never move to CRDs or some other worse abstraction. Just embrace the annotations :)
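To illustrate the "add it to my machine-deployments" point above, here is an abbreviated sketch of a v1alpha3 MachineDeployment whose template carries a hook annotation; the hook name, owner value, and elided fields are placeholders, and it assumes template annotations are propagated to the Machines it creates:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: example-md
spec:
  clusterName: example
  template:
    metadata:
      annotations:
        # Every Machine created from this template carries the hook from creation time.
        pre-drain.delete.hook.machine.cluster.x-k8s.io/migrate-workload: my-controller
    spec:
      clusterName: example
      # ... bootstrap, infrastructureRef, version, etc. omitted ...
```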
> node.
>
> #### Story 3
>
> (pre-drain) As an operator, when I am deleting a control plane machine for
How is this different from story 1?
What will be the immediate consumers of this proposal? Is there any particular story the core CAPI components would implement right away out of this proposal that could be included here, or is this just to enable others?
For me, UX is a big part of why I'm not a fan of annotations. They are an unversioned part of the API and would require coordination across all components that use them in order to make any changes (including name changes, additions, deprecation, and removal). As such, anywhere compatibility is broken in a non-obvious way, the complications related to upgrade are deferred to the user to fix.
I feel like this is a best-case scenario, though. In the event that something goes wrong, annotations feel much more opaque. It is on the user to figure out what annotations are applied, then figure out what behaviors are modified by those annotations, and finally determine what external controller is acting on those annotations, before they can figure out what the problem is.
/lgtm
> ## Motivation
>
> Allow custom and 3rd party components to easily interact with a machine or
Is this mechanism meant to be an alternative to infrastructure provider mechanisms for this where they don't exist? E.g., Azure has scheduled events: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events
No. This is meant to prevent the machine-controller from performing certain actions at certain times, such as draining the node or instructing the infrastructure provider to remove the VM.
/milestone v0.3.9
/retest
/approve
/hold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: vincepri. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/hold cancel
What this PR does / why we need it:
Defines a set of annotations that can be applied to a machine and that affect the linear progress of the machine's lifecycle after it has been marked for deletion. These annotations are optional and may be applied during machine creation, some time after machine creation by a user, or some time after machine creation by another controller or application.