
📖 CAEP: machine deletion phase hooks #3132

Conversation

@michaelgugino (Contributor)

What this PR does / why we need it:

Defines a set of annotations that can be applied to a machine which affect the
linear progress of a machine’s lifecycle after a machine has been marked for
deletion. These annotations are optional and may be applied during machine
creation, sometime after machine creation by a user, or sometime after machine
creation by another controller or application.
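For illustration, a minimal sketch of what such an annotation looks like on a Machine (the key follows the pre-drain/pre-term naming pattern discussed later in this review; the hook name and owner value are hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Machine
metadata:
  name: worker-0
  annotations:
    # hypothetical pre-drain hook: the machine-controller will not drain the
    # node after deletion is requested until this annotation is removed
    pre-drain.delete.hook.machine.cluster.x-k8s.io/migrate-workload: my-workload-controller
spec: {}  # other Machine fields omitted
```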

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 2, 2020
@k8s-ci-robot k8s-ci-robot requested review from benmoss and vincepri June 2, 2020 20:48
@elmiko (Contributor) left a comment

this seems like a reasonable solution to me. i like the opt-in nature and the flexibility it gives to users.

i do wonder about the extended interactions between a possibly large number of owning controllers, and the ability of a user to reasonably debug issues. i realize some of this will be highly dependent on the complexity that people create, but it concerns me with respect to users inspecting capi artifacts for answers.

i left a couple comments and nits inline

@vincepri (Member)

/milestone v0.3.x

@k8s-ci-robot k8s-ci-robot added this to the v0.3.x milestone Jun 23, 2020
@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from 309a8e0 to 189a30c on June 25, 2020 21:13
@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from 189a30c to dffafdc on July 1, 2020 03:24
@elmiko (Contributor) commented Jul 1, 2020

i think this is looking good, thanks for the updates.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 1, 2020
(pre-drain) As an operator, when I am deleting a control plane machine for
maintenance reasons, I want to ensure my existing control plane machine is not
drained until after my replacement comes online. This will prevent protracted
periods of a missing control plane member if the replacement machine cannot be
Member:

The references here to a control plane member are confusing, especially since it is at odds with the existing work on integrating MachineHealthCheck remediation with KubeadmControlPlane. I think it would be better to genericize this user story to avoid specific mention of the control plane.

Contributor Author:

I'm fine with changing the wording, but I don't see how this is at odds with MHC. MHC can delete a machine, as normal. We just prevent the instance from being drained/terminated until the replacement comes online. To the MHC, this just looks like it's taking a long time to drain the machine; it doesn't need to do anything different.

I think when we were discussing the modifications to MHC for the KCP previously, those bits might be a temporary fix until something like this merges.

Member:

My concern is that the work that is in-flight (#3185) will block creation of the new Machine until the old Machine has been fully deleted. This is actually needed to avoid cases where scaling up the etcd cluster prior to fully removing the machine being remediated could lead to a loss of quorum.

For example, given a 3 replica Control Plane:

  • 1 machine fails MHC in a way that also impacts the etcd member; we are now at 2/3 healthy etcd members and still have quorum
  • If we attempt to scale up the Control Plane, then part way through the bootstrapping of the 4th replica the new member will be added to etcd; at this point quorum is lost (2/4 members healthy)
  • At this point one of two things can happen:
    • The new etcd member finishes syncing, quorum is restored, and the 4th replica finishes bootstrapping. Cluster wide impact is relatively minimal
    • The etcd member fails to sync, quorum loss remains. Cluster wide services impacted due to read-only datastore.

Contributor Author:

If we attempt to scale up the Control Plane, part way through the bootstrapping of the 4th replica, the new member will be added to etcd, at this point quorum is lost (2/4 members healthy)

I have thought about this at great length in the context of what we're doing in OpenShift.

The answer I came up with was divorcing etcd from automatically being enrolled as a member of the cluster upon control plane scale up. We can install the binaries, we can configure it, but just don't add the member to the cluster until some smarter thing elects to do that. So, you could have 6 control plane hosts, and only 3 active etcd members. The others could be 'warm' so you can quickly remove/add the hosts you want. The etcd controller would need to have some idea about fault domains and machine/node statuses, but that is the best way IMO.

This also lets us scale etcd independently of scaling control plane hosts. Want to have a single control plane host with a single etcd member? Easy, scale etcd down to one, then scale down your control plane hosts. etcd controller will ensure that etcd lands in the right spot. Want to scale up to 5 etcd hosts? Fine, scale up the control plane, scale up etcd.

It solves so many problems. I'm not sure exactly how it fits into the existing architecture, but this is the only reasonable setup I can see.

In any case, what we're defining in this CAEP is optional behavior of 3rd party components. Since cluster-api bits are supposed to be reusable (that is still the case, right?), this use case makes sense IMO. If it's creating confusion as you say, I can remove it.

Contributor:

There are probably more nuanced stories we can tease out specifically around control plane / etcd membership and deletion hooks. To move forward with this proposal, though, let's remove this story as you suggested.


Hooks defined at this point will prevent the machine-controller from draining a node after the machine-object has been marked for deletion until the hooks are removed.
##### pre-term
`pre-term.delete.hook.machine.cluster.x-k8s.io`
Member:

would it make sense to be explicit here and call this pre-termination.delete.hook.machine.cluster.x-k8s.io instead, to avoid any confusion?

Contributor Author:

Yeah, I think that should be acceptable. The portion of the key before the '/' character has a 253-character limit, so the longer name should be okay.
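For context, Kubernetes limits an annotation key to a 253-character prefix (before the '/') and a 63-character name (after the '/'), so the more explicit prefix fits comfortably. A sketch using the suggested naming (the hook name and owner value are invented for illustration):

```yaml
metadata:
  annotations:
    # the prefix "pre-termination.delete.hook.machine.cluster.x-k8s.io" is
    # well under the 253-character prefix limit; the hook name after '/'
    # must stay within the 63-character name limit
    pre-termination.delete.hook.machine.cluster.x-k8s.io/wait-for-replacement: my-controller
```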


pre-term.hook.machine.cluster-api.x-k8s.io/preserve-instance: my-instance-replacement-controller

pre-term.hook.machine.cluster-api.x-k8s.io/never-delete: cluster-admin
Member:

Might be better to rename this example a bit. I think I get the intention (that Cluster API should not attempt to delete the underlying infrastructure), but this example could make some believe it gives more assurances than it actually does: there is nothing preventing an external actor from deleting the InfraMachine resource out of band, nor is there any termination prevention on the infrastructure side stopping removal of the underlying infra directly.

using CRDs external to the machine-api, gRPC communications, or
additional annotations on the machine or other objects.

#### Owning Controller Design
Member:

I think it might be good to try to find a more descriptive name than Owning Controller here; there is a high likelihood of confusion for a user between the owning controller of the resource (the Machine controller in this case) and the owning controller of an annotation on the resource.

Contributor Author:

"Hook Owning Controller" ? Then we can just use HOC all over the doc?

* Watch machine objects and determine when an appropriate action must be taken.
* After completing the desired hook action, remove the hook annotation.
##### Owning Controllers may
* Watch machine objects and add a hook annotation as desired by the operator
Member:

What do you mean by operator here? It's not clear if it's referring to a controller or an end-user. It's also not clear when automatically adding an annotation would be considered appropriate.

Contributor Author:

I meant 'cluster administrator' so I updated the notes. I'll add a new section about when adding is appropriate.

* After completing the desired hook action, remove the hook annotation.
##### Owning Controllers may
* Watch machine objects and add a hook annotation as desired by the operator
* After completing a hook, set an additional annotation that indicates this
Member:

This seems a bit out of scope for this proposal and a bit confusing, since the required deletion of the annotation should also be a sufficient signal that this controller has finished its work.

Contributor Author:

Yeah, I see what you mean, but it's intentional. This kind of describes an abstraction so one HOC doesn't necessarily have to know about another.

On HOC "A" let's say I am a pre-drain hook. Let's say I take some action like stopping some specific workload safely. I set the "critical-pods-stopped-safely" annotation and remove my hook. On HOC "B", I'm also a pre-drain hook. I don't know anything about HOC "A", but I do know I'm not going to start my work until I see the annotation "critical-pods-stopped-safely" so I can detach some special storage or something. This also might be used to signal some other controller that doesn't know or care about lifecycle points. The possibilities are numerous. Ideally, each HOC is configurable as to waiting for a certain annotation to be present or absent, so this abstraction is redundant, but there's nothing stopping someone from doing this, so I figured it's worth documenting the approach.

Contributor:

I would probably remove this entire "may" section, but I'm also ok leaving it in. It does not affect what we're going to code; it just describes some possibilities created by adding hooks.

Contributor Author:

I added some of the details based on questions people previously had in the gdoc. I could remove it, but then those questions are just unanswered. People want to know how to order these things.

Contributor:

Ok, thanks for the clarification. Does something like the following wording help at all?

* Coordinate with other hook controllers through any means possible, such as using common annotations, CRDs, etc. For example, one hook controller could set an annotation indicating it has finished its work, and another hook controller could wait for the presence of the annotation before proceeding.

Contributor:

I also wonder if the "Hook Ordering" section above is sufficient and if this item is repetitive?

@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from dffafdc to 2171346 on July 6, 2020 13:51
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 6, 2020
@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from 2171346 to a00748f on July 6, 2020 13:55
@ncdc (Contributor) commented Jul 17, 2020

I am generally 👍 to proceed. My only real question remaining is about the value of each annotation (i.e. the "owner").

@vincepri (Member)

Do we envision that down the line we'd keep the annotation as the main approach to add hooks to deletion workflow, or move to a CRD-based, or ~webhook one?

I'm asking this mostly because once this proposal and the related implementation merges, it becomes an API we'll have to maintain and respect for future versions. As we think of v1alpha4 and getting the core types to v1beta1, I'd like to make sure this group has thought about possible implications and future support for these annotations and hooks. Is this the best place for the hooks to live short and long term? Or would it be better to have an experiment registration CRD?

That said, +1 on moving the proposal along given that it's pretty simple and straightforward.

@ncdc (Contributor) commented Jul 20, 2020

I guess you're thinking about what it would take to convert an annotation to a CR if we decide to use CRDs for hooks in v1alpha4?

@detiber (Member) commented Jul 20, 2020

Do we envision that down the line we'd keep the annotation as the main approach to add hooks to deletion workflow, or move to a CRD-based, or ~webhook one?

Personally, I think annotation-based approaches are fine for proof of concept and experimental support for features, but not much beyond that, since it is basically an un-versioned addition to the API that creates many challenges around backward and forward compatibility support. It then becomes difficult or impossible to change, deprecate, and/or remove support for when consumed by loosely-coupled external integrations.

It might be good to define a policy for when/where annotations should be used and possibly provide suggested alternatives where they shouldn't be.

For example, annotations do not present much of an issue when used for coordinating actions between core in-tree components that are updated and released on the same release cadence.

However using annotations between in-tree and out-of-tree providers presents a bit more of a challenge. Sure we could encode it as part of the contracts we are publishing, but it complicates transition to a new version of the contract and makes the transition more error prone. Whereas if the behavior is built into the versioned part of the API, then the backward/forward compatibility can easily be built into the conversions.

For any other integrations, I believe we really should avoid annotations altogether for a few different reasons:

  • Integration may not be tied to a single cluster-api version (ala cluster-autoscaler)
  • Integration may not be built using Go, let alone vendoring helper libraries that we publish
  • Less visibility into implementations than we currently have with providers, so harder to know when we introduce breaking changes in new versions

I'm asking this mostly because once this proposal and the related implementation merges, it becomes an API we'll have to maintain and respect for future versions. As we think of v1alpha4 and getting the core types to v1beta1, I'd like to make sure this group has thought about possible implications and future support for these annotations and hooks. Is this the best place for the hooks to live short and long term? Or would it be better to have an experiment registration CRD?

That said, +1 on moving the proposal along given that it's pretty simple and straightforward.

+1

@ncdc (Contributor) commented Jul 20, 2020

@michaelgugino I think we're mostly good to go here. Could you please review the open comments and check if everything has been addressed in the proposal? Thanks!

@vincepri (Member)

I guess you're thinking about what it would take to convert an annotation to a CR if we decide to use CRDs for hooks in v1alpha4?

Yes that's correct, more beyond v1alpha4 and getting to beta at some point.

Personally, I think annotation-based approaches are fine for proof of concept and experimental support for features, but not much beyond that, since it is basically an un-versioned addition to the API that creates many challenges around backward and forward compatibility support. It then becomes difficult to impossible to change, deprecate, and/or remove support for when consumed by loosely-coupled external integrations.

I guess given that our types are still alpha, it seems we're ok proceeding with annotations for now and revisiting later. The only warning I see here is in terms of migration from annotations to custom CRDs: it's not going to be a conversion webhook; it could be an external utility, or we could expect users to make the necessary changes when ready.

Does this change our approach within Cluster API to adopt these hooks? For example, we talked in the past about getting KCP to register hooks, should we avoid doing so, or are we ok proceeding with that plan if/when it comes down to it?

It might be good to define out a policy of when/where annotations should be used and possibly provide suggested alternatives where they shouldn't be.

💯

@detiber (Member) commented Jul 20, 2020

I guess given that our types are still alpha, it seems we're ok proceeding with annotations for now and revisiting later. The only warning I see here is in terms of migration from annotations to custom CRDs: it's not going to be a conversion webhook; it could be an external utility, or we could expect users to make the necessary changes when ready.

If the "external" integration support we are defining here is considered "alpha" or "experimental", then I don't think we need to offer any type of conversion between implementations as the feature matures and could externalize that (either to the external implementation(s) or users).

Does this change our approach within Cluster API to adopt these hooks? For example, we talked in the past about getting KCP to register hooks, should we avoid doing so, or are we ok proceeding with that plan if/when it comes down to it?

I don't think the current implementation details vs future implementation details matter too much for in-tree use, since we know exactly how those components are interacting and can provide the conversion steps needed automatically for those specific use cases.

@ncdc (Contributor) commented Jul 20, 2020

I'd like to proceed with this proposal so people can start trying out the hooks. We can file issues for the remaining questions. I don't think we need to hold off from using them in CAPI controllers.

@vincepri (Member)

Sounds good to me 👍

@michaelgugino (Contributor Author)

I know there are some perceived shortcomings with annotations compared to CRDs. Sometimes I think there is too much focus on API 'correctness' instead of UX.

With annotations, everything is simple. If I want to add this annotation to every machine in my cluster at creation time, I can add it to my machine-deployments / machine-sets. They're all in one place; I don't have to hunt for them across multiple objects.
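For example, a sketch of that pattern (the hook name and owning-controller value are hypothetical): set the annotation on the MachineDeployment's machine template so every Machine it creates carries the hook from the start.

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: my-md
spec:
  # other required fields (clusterName, selector, ...) omitted
  template:
    metadata:
      annotations:
        # every Machine created from this template gets the hook at creation time
        pre-drain.delete.hook.machine.cluster.x-k8s.io/stop-critical-pods: my-hook-controller
```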

There is one CRD implementation I could live with, I think. A 1-to-1 CR to Machine mapping like this:

kind: MachineLifecycleHook
metadata:
  name: my-machine
spec:
  pre-drain-delete:
  - my-hook: some-owner
  - other-hook: owner2
  pre-terminate-delete:
  - wait-for-replacement: custom-controller
...

That still lacks a way to create a particular hook for every machine at creation time. Then you have to decide who's creating and deleting the object (I suppose it could be a child object of the machine; we'd need to set a finalizer on it and remove the finalizer later). We'd need to do something hacky like 'initialHooks' as part of the machineSpec.

So now we need lots more controller code and multiple webhooks (we need to validate that the initial set on the machine is valid, and also validate each create/update on the CR). Now everyone's controller needs to set up watches on two resources. If you give someone a list, they're going to want to order it, and that's not how this was designed, so that's confusing.

Anyway, there's so many other fish to fry, so I hope we never move to CRDs or some other worse abstraction. Just embrace the annotations :)

node.

#### Story 3
(pre-drain) As an operator, when I am deleting a control plane machine for
Member:

How is this different from story 1?

@enxebre (Member), Jul 21, 2020:

who will be the immediate consumers of this proposal? Is there any particular story the core capi components would implement right away out of this proposal that could be included here, or is this just to enable others?

@detiber (Member) commented Jul 21, 2020

I know there are some perceived shortcomings with annotations compared to CRDs. Sometimes I think there is too much focus on API 'correctness' instead of UX.

For me, UX is a big part of why I'm not a fan of annotations. They are an unversioned part of the API and would require coordination across all components that are using them in order to make any changes (including name changes, additions, deprecation, and removal). As such, anywhere compatibility is broken in a non-obvious way, the complications related to upgrading are deferred to the user to fix.

With annotations, everything is simple. If I want to add this annotation to every machine in my cluster at creation time, I can add it to my machine-deployments / machine-sets. They're all in one place, I don't have to hunt for them across multiple objects.

I feel like this is a best-case scenario, though. In the event that something goes wrong, annotations feel much more opaque. It is on the user to figure out what annotations are applied, then they need to figure out what behaviors are modified by those annotations, and then finally try to determine what external controller is acting on those annotations before they can figure out what the problem is.

@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from a00748f to 6aba084 on July 29, 2020 13:37
Defines a set of annotations that can be applied to a machine which affect the
linear progress of a machine’s lifecycle after a machine has been marked for
deletion.  These annotations are optional and may be applied during machine
creation, sometime after machine creation by a user, or sometime after machine
creation by another controller or application.
@michaelgugino michaelgugino force-pushed the machine-deletion-hooks branch from 6aba084 to 5cacc9f on July 29, 2020 13:39
@ncdc (Contributor) commented Jul 29, 2020

/lgtm
/assign @CecileRobertMichon @vincepri @detiber

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2020

## Motivation

Allow custom and 3rd party components to easily interact with a machine or
Contributor:

Is this mechanism meant to be an alternative to infrastructure provider mechanisms that do this, where they don't exist? e.g. Azure has scheduled events https://docs.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events

Contributor Author:

No. This is meant to prevent the machine-controller from performing certain actions at certain times, such as draining the node or instructing the infrastructure provider to remove the VM.
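A short sketch of the two interception points as described (the pre-drain key is assumed to follow the same pattern as the pre-term key quoted earlier in the review; hook names and owner values are hypothetical):

```yaml
metadata:
  annotations:
    # while present, the machine-controller will not drain the node
    pre-drain.delete.hook.machine.cluster.x-k8s.io/example: some-controller
    # while present, the machine-controller will not instruct the
    # infrastructure provider to delete the underlying VM
    pre-term.delete.hook.machine.cluster.x-k8s.io/example: some-controller
```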

@vincepri (Member) commented Aug 3, 2020

/milestone v0.3.9

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.x, v0.3.9 Aug 3, 2020
@ncdc (Contributor) commented Aug 5, 2020

/retest

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2020
@vincepri (Member) left a comment

/approve
/hold

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 5, 2020
@vincepri (Member) commented Aug 5, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 5, 2020
@k8s-ci-robot k8s-ci-robot merged commit 60ce90a into kubernetes-sigs:master Aug 5, 2020