-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Machine Deletion Hooks Proposal #862
Add Machine Deletion Hooks Proposal #862
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for putting this PR together. I added a few notes/questions on accountability and the durability of using an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, i think this reads well.
my only suggestion might be a mention of the existing alert we have for MachineNotYetDeleted, and how this will continue to operate even if a user has opted-in to the hooks. i think it's good to be explicit that using these hooks will not sidestep the alerting we have in place currently.
Proposal lgtm and it's also already in capi. Though I'm only aware of a use case where we'll want to leverage this i.e control plane automated management and I'm not sure why we could use simply a finaliser? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of aligning with the upstream API on this.
to just using annotations in the first place. | ||
|
||
### Spec Field | ||
We probably don’t want other controllers dynamically adding and removing spec fields on an object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell if this is a concern about granting the permission to another controller or a claim that server-side-apply doesn't allow for "I want this field to exist in spec" (I think it does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting annotations requires spec update powers, so I don't think you're gaining anything rbac-wise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll expand this once I've double checked the upstream conversation that happened around this, but as I mentioned in another comment, I think this was more about trying to avoid mixing controller owned fields into the spec which is normally owned by a user. Will get back to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll expand this once I've double checked the upstream conversation that happened around this, but as I mentioned in another comment, I think this was more about trying to avoid mixing controller owned fields into the spec which is normally owned by a user. Will get back to this
I'm interested in seeing a link from here. Given that server-side-apply is based on per-field ownership, a subdivided spec is tractable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the upstream conversation on the enhancement PR (I haven't found the original google doc yet):
Part of the reason to use annotations is that the keys have to be unique, so it's less likely that there would be conflicts between different controllers who are dealing with the hooks. (kubernetes-sigs/cluster-api#3132 (comment))
Another comment specifically on this title is leading to a conversation about possibly making this a spec field in a future iteration, though I don't think anything has been actioned from that (kubernetes-sigs/cluster-api#3132 (comment))
Some more concern about the fact that using annotations is unversioned later in the thread, but agreement that this is ok for an alpha API. Interestingly, the CAPI project is looking to move to beta in the next few weeks, there has been no further conversation around this annotation API, I'll poke them to remind them. (kubernetes-sigs/cluster-api#3132 (comment))
If we can convince the upstream community to revisit how this mechanism works and then it moves to a spec field, then that's great, I think that would work well for us, I'm just a little concerned about the timing here. I'd like to provide something for the control plane scaling project to use, preferably with the same API as CAPI so that there's an easy conversion in the future. Perhaps we can implement the annotations temporarily and then move to a spec before we cut a release, assuming we can upstream consensus 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still isn't clear to me. Ownership in the spec can be separated, it's what server-side-aply is for. In addition, the permissions required to modify annotations are identical to permissions required to modify spec, so I don't see a benefit permissions-wise.
Duplicates are prevented by setting the appropriate list merge key and enforcing as we do for other fields.
Conflicts can be managed via server-side apply, patches, or (default) conditional updates. Using a spec field increases expressivity.
Using an annotation for evaluation guarantees that there will be a migration later when a real field is developed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is reading really well to me, seems like there are just a couple open questions that need to be resolved:
I've pushed in my last commit the updates to the details in the summary, finalizers and the status reporting stuff. I've also take a look at the upstream conversation and left some feedback here. Upstream are still using annotations for the moment, and it looks like they had already considered changing how these would work in a later API, but they haven't got to that point for now. So it makes it difficult for us, due to wanting to try to keep parity, should we keep the parity now and deal with the API change later, or do we try and guess what they new API might look like and deal with a potential delta later? |
One benefit of keeping parity is that whatever upgrade mechanism they develop upstream could be used downstream, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this version of the proposal.
The activity around this feature has been very scarce to my awareness. It was only mentioned it here kubernetes-sigs/cluster-api#4707 and we decided to go with core support. I'm personally not aware of any use cases where this is being leveraged (though that doesn't mean there are any). I think the problem that is motivating this proposal could be scoped and solved with finalisers. But if we are going with a more generic solution with wider scope we should share our vision for the long term and (re-)drive the implementation effort upstream. To this end using the current existing annotations and consolidate back seems reasonable to me. This might be a good starting place to bring this up kubernetes-sigs/cluster-api#5175 |
The only place I'm aware of it being used today upstream is within their KubeADM control plane implementation which uses the mechanism to protect etcd, much as intended within OpenShift.
I think it may also be worth starting an email thread to gather feedback and see if there are other people using it. With that feedback we can start to work out what we want the API to look like in the future. I can get a thread going to gather this feedback |
Have posted a thread in the cluster lifecycle mailing list requesting feedback about how the upstream feature has been used, let's see if we get any interesting notes added in here https://groups.google.com/g/kubernetes-sig-cluster-lifecycle/c/UkautWVDVKM |
Some feedback from the community about how they are using this feature upstream:
And
So it seems that some people are using the feature in the upstream at least to be more careful about how their machines are removed, and in particular, the use case about checking for other machines going away to minimise disruptions seems quite interesting. I guess when the drain starts there is nothing to guarantee that other deleted machines have already been cordoned. That aside, I'm not sure if there's anything else left for me to address here, is there anything blocking from anyone for this or are we happy to go ahead with the annotation approach until we work out a more formal API with the upstream? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reads really well to me, no criticisms or suggestions
/lgtm
Keen to move this enhancement forward as there hasn't been any further comment recently. Anyone on the owners list happy to add their approval? |
This is to implement openshift/enhancements#862 as it currently stands. These consts will be used as annotation key prefixes on machines as an OpenShift internal API for the time being. Once the upstream creates their new API for this feature, we will copy their changes and work on a migration path between the internal annotation API and the first class API
I have now started the implementation work for this in case anyone would like to review, it comes as three PRs for the moment:
|
thanks again Joel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using annotation instead of structured fields is not recommended because of
- lack of expressivity
- lack of versioning for future changes
- guaranteed migration pain during API evolution
You can see the history of pain we had in kube proper here https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#new-field-in-existing-api-version
The Machine status will contain conditions identifying why the Machine controller | ||
has not progressed in removing the Machine. It is expected that hook implementing | ||
controllers should signal to users when they are having issues via some mechanism | ||
(eg. their `ClusterOperator` status) and that the conditions on the Machine should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clusteroperator status isn't likely to be an entry point given the cardinality differences between the hook provider and the number of machines.
Is there any mechanism to be included in the machine status. These hooks are pretty deep and condition names can be scope limited. What about encouraging these hooks to place a condition aligned to their name when they fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A well behaved operator could update the Machine status if they so desired, there's nothing to prevent them from doing so. We have a Conditions
list which fits the normal pattern, so they could introduce their own condition. Our controllers will leave unrecognised conditions alone so this should be safe.
Encouraging a condition specific to the operator seems reasonable. I guess we were thinking of the etcd operator use case as an example, the issue would be that the etcd operator can't remove the hook because the etcd operator is waiting for the etcd member to be safe to be removed from the cluster, would it not also make sense to report issues with this process at the etcd operator level as well?
DeletionTimestamp or enters the "Deleting" phase. | ||
|
||
Fine-tuned coordination is not possible at this time; eg, it's not | ||
possible to execute a pre-terminate hook only after a node has been drained. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't obvious to me. Why can a hook implementing controller not detect that the node has been drained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not impossible, but I guess it's more difficult. Today we don't annotate or signify on a machine anywhere that we have completed a drain operation. We add a taint when we start, but the drain could fail for a period due to PDBs. To be sure the drain has been completed, the HIC would have to, today at least, identify the same set of pods that the machine controller is removing, and work out if that list is non-zero.
This could of course be better expressed by a Machine condition where we mark the machine drained once we think it's drained.
951c539
to
cab6bd7
Compare
Owner string `json:"owner"` | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we please flesh out the new conditions types and reasons here?
When errors occur, hook implementing controllers are encouraged to add a condition | ||
to Machines with details of the error. The condition `Type` should be formad as the | ||
hook name, prefixed by the lifecycle point. For example, for a `preDrain` hook with name | ||
`MigrateImportantApp`, the condition `Type` would be `PreDrainMigrateImportantApp`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreDrainMigrateImportantAppSucceeded
? So there's a clear meaning to true an false.
// Owner defines the owner of the lifecycle hook. | ||
// This should be descriptive enough so that users can identify | ||
// who/what is responsible for blocking the lifecycle. | ||
// This could be the name of a controller or an administrator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: example with clusteroperator/etcd
perhaps?
api, failure handling, and debugging lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
content looks good to me, i made some suggestions to help with consistency and also calling out a few mentions of annotations. thanks Joel!
### Deletion Phase | ||
Describes when a machine has been marked for deletion but is still present | ||
in the API. Various actions happen during this phase, such as draining a node, | ||
deleting an instance from a cloud provider, and deleting the node object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit for consistency (we seem to use "cloud provider" or "infrastructure provider" interchangeably)
deleting an instance from a cloud provider, and deleting the node object. | |
deleting an instance from an infrastructure provider, and deleting the node object. |
|
||
## Summary | ||
|
||
Defines a set of lifecycle hookos that can be applied to a Machine which can be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit
Defines a set of lifecycle hookos that can be applied to a Machine which can be used | |
Defines a set of lifecycle hooks that can be applied to a Machine which can be used |
### Implementation Details/Notes/Constraints | ||
|
||
For each defined lifecycle point, one or more hooks may be applied within the spec | ||
of the machine object. These annotations will pause reconciliation of a Machine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still referencing annotations here
``` | ||
|
||
Hooks defined at this point will prevent the machine-controller from | ||
removing/terminating the instance in the cloud provider until the hooks are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit for consistency
removing/terminating the instance in the cloud provider until the hooks are | |
removing/terminating the instance in the infrastructure provider until the hooks are |
removed. | ||
|
||
"pre-terminate" has been chosen over "pre-delete" because "terminate" is more | ||
easily associated with an instance being removed from the cloud or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we could just drop "cloud" here
easily associated with an instance being removed from the cloud or | |
easily associated with an instance being removed from the |
* Coordinate with other Hook Implementing Controllers through any means | ||
possible, such as using common annotations, CRDs, etc. For example, one hook | ||
controller could set an annotation indicating it has finished its work, and | ||
another hook controller could wait for the presence of the annotation before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still referencing annotations here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is a legitimate reference to an annotation, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sorry, my confusion
cab6bd7
to
c7c769a
Compare
/override ci/prow/markdownlint The linter is complaining of missing fields that were added to the template after this PR was majorly reviewed. The new sections won't add much for this proposal so I'm not going to add them at this late stage. I think this is ready to merge now as all of the feedback has been addressed and the API is updated and has been approved by David (not sure how to dismiss the change requested block) The PRs linked in #862 (comment) are now also up to date for this version of the proposal. @elmiko @enxebre Could you add the LGTM and Approve labels respectively if you are happy with the final changes |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reads well to me, thanks for all the updates @JoelSpeed
/lgtm
This creates a tight coupling between the Machine controller and other components within OpenShift. | ||
The upstream solution today, matches the proposal set out in this document. As we are experimenting | ||
with Cluster API in HyperShift, Central Machine Management and for future IPI clusters, it is | ||
preferred to not create a tight coupling with these components and to copy the upstream solution to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, none of these fields exists upstream today, though nothing preclude us from proposing this approach. A hook mechanism is to be discussed as part of kubernetes-sigs/cluster-api#5556 (comment)
I still think this could have been hidden to users by using a particular finalizer to satisfy the main known and pursued use case which is the smooth etcd removal (which also it'd be nice to elaborate in this proposal) so we could keep API changes small without exposing more changes to users given the plans already to introduce CAPI. Nevertheless I think this field strategy is also solid and provides a flexibility that may enable other use cases so I'm not oppose to it. Let's implement it and revisit later if necessary. /lgtm |
Thanks. With the API changes and condition specification, I like this. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, hexfusion The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/override ci/prow/markdownlint Based on above comments, the new headings were added late in the review process for this enahncement |
@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/markdownlint In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This proposal was originally implemented upstream by @michaelgugino in the Cluster API project.
We would like to include this downstream to allow other controllers to hook into the Machine removal process.
In particular, this will be used by the etcd team to prevent the Machine controller deleting control plane machines when etcd is in a bad state.
CC @enxebre @elmiko @michaelgugino @alexander-demichev for review from the Machine API/Cluster API side
CC @hexfusion @jeremyeder as this relates to the managed control plane instances work