docs: RFC for Customizable Node Finalizer #843

Conversation

@garvinp-stripe (Contributor) commented Dec 6, 2023

Fixes #743, #622, #740, #690, #5232
Description
RFC for adding a customizable finalizer for nodes.

How was this change tested?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

linux-foundation-easycla bot commented Dec 6, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: garvinp-stripe
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 6, 2023
@k8s-ci-robot (Contributor)

Hi @garvinp-stripe. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 6, 2023
Contributor

If I understand right, this is trying to circumvent the entire termination process in favor of the user running an entirely custom controller here. While I think this is a valid use-case, and definitely an increase in extensibility, I'm worried that it may lead to a lot of problems and may not actually address many of the linked issues, since the solution requires users to vend a full-fledged controller of their own rather than solving things natively.

Out of the listed issues, I think they're all applicable to the larger idea of "let me customize termination" barring #622 and aws/karpenter-provider-aws#5232. #622 will actually be handled by the nodeclaim disruption controller, which is two steps ahead of the termination controller in the process (nodeclaim disruption controller -> disruption controller -> termination controller). aws/5232 will be within the disruption controller candidacy, where we opt out nodes that have blocking PDBs from consideration.

My last concern is that there's a lot of interplay between the controllers that Karpenter vends, namely the three disruption-related controllers and the provisioning controllers, which I think could fight with each other if a custom termination controller doesn't follow the same rules. For instance, you wouldn't be able to modify the existence of the NoSchedule termination taint, as the disruption controller will fight with you on it. We should think about how these controllers could conflict if we want to make this controller fully replaceable.

@garvinp-stripe (Contributor, Author) commented Dec 8, 2023

Agree that this could be problematic, so I am happy to be wrong. I can see that there are a lot of interactions that occur during termination, but I suspect there is also some room for safely allowing users to perform certain actions.

First, I think it makes sense to change this design to cover just the node finalizer rather than the node claim (correct me if I am wrong), simply because I don't think users should be changing nodeclaim deletion behavior.

Second, looking at the finalizer design from K8s, I wonder if breaking down the existing finalizer into multiple finalizer keys would make sense. For example,
https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/node/termination/controller.go#L76
it seems like we:

  1. Delete node claim
  2. Taint
  3. Drain
  4. Delete node from EC2
  5. Remove finalizer

If we break these up into different finalizers, we can choose which ones are critical to Karpenter's functionality (or too messy to rely on users reimplementing with a controller that follows the right rules) versus which ones can safely be altered by users.

It seems to me that we could wrap "delete node claim" + "taint" as a finalizer that must be applied to all nodes, while things like "drain" and "delete node from EC2" could be added depending on whether the user wants Karpenter to handle them or wants custom control over them.

So the node object would look something like:

finalizers:
- karpenter.sh/nodeclaimtaint
- karpenter.sh/drain
- karpenter.sh/deleteNodeFromEC2

Once each step is completed, the corresponding finalizer is removed from the node object. We can order the process by checking that the prior step's finalizer key is no longer on the node object before acting. For example, don't delete the node if you still see karpenter.sh/nodeclaimtaint or karpenter.sh/drain.
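
To make the ordering idea concrete, here is a minimal Go sketch using controller-runtime helpers. It assumes the hypothetical per-stage finalizer keys from this comment (karpenter.sh/nodeclaimtaint, karpenter.sh/drain, karpenter.sh/deleteNodeFromEC2), none of which exist in Karpenter today, and it is not how the current termination controller is implemented.

```go
// A minimal sketch, not Karpenter's actual implementation: the per-stage
// finalizer keys below are the hypothetical ones proposed in this comment.
package termination

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const (
	taintFinalizer  = "karpenter.sh/nodeclaimtaint"    // hypothetical
	drainFinalizer  = "karpenter.sh/drain"             // hypothetical
	deleteFinalizer = "karpenter.sh/deleteNodeFromEC2" // hypothetical
)

// reconcileInstanceDelete handles the "delete node from EC2" stage. It only
// acts once the earlier stages have removed their finalizers, which enforces
// the ordering described above.
func reconcileInstanceDelete(ctx context.Context, kubeClient client.Client, node *corev1.Node) error {
	// Only act on terminating nodes that still carry this stage's finalizer.
	if node.DeletionTimestamp.IsZero() || !controllerutil.ContainsFinalizer(node, deleteFinalizer) {
		return nil
	}
	// Prior stages not done yet: requeue rather than deleting the instance early.
	if controllerutil.ContainsFinalizer(node, taintFinalizer) || controllerutil.ContainsFinalizer(node, drainFinalizer) {
		return fmt.Errorf("waiting for taint/drain stages to complete")
	}

	// ... call the cloud provider to terminate the backing instance here ...

	// Release this stage's finalizer with its own patch.
	stored := node.DeepCopy()
	controllerutil.RemoveFinalizer(node, deleteFinalizer)
	return kubeClient.Patch(ctx, node, client.MergeFrom(stored))
}
```

Note that each stage releases its own finalizer, so a node would see one patch per stage rather than one patch total, which is the write-amplification trade-off raised in the reply below.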

Contributor

I'm not sure I see the value in splitting out one finalizer into many. If we need to add/remove/modify a step of the termination process, that means we need to change the list of well known finalizers, which would create some between-version incompatibilities (i.e. I add a termination feature in version v0.1 and remove it in v0.5, now all nodes created before v0.5 have an additional finalizer that Karpenter needs to be aware of).

Additionally, I think adding more finalizers will actually make this less extensible for the reason above. Adding more API surface onto the nodeclaim makes it tougher to validate/add newer features for the termination loop.

Lastly, I think multiple finalizers would add a lot to the number of writes we're making to the API server. Say we have 5 steps == 5 finalizers; each node would then have 5 patch events, which would be pretty expensive considering the current flow is 1 patch event.

Comment on lines +64 to +66
During termination, Karpenter will not perform the finalization for both node claims and nodes but allow the user's custom controller to perform
the finalization of node and nodeclaim. We will allow users to set a grace termination period from [#834](https://github.com/kubernetes-sigs/karpenter/pull/834) where
if set the and breached then Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
Contributor

Setting a grace termination period would be up to the user who configures it, right? I don't think this doc needs to cover this, since the feature itself won't be prescribing this or adding this as a feature.

Contributor Author

Makes sense, can clean up.

### Custom Finalizer behavior/ expectation
During termination, Karpenter will not perform the finalization for both node claims and nodes but allow the user's custom controller to perform
the finalization of node and nodeclaim. We will allow users to set a grace termination period from [#834](https://github.com/kubernetes-sigs/karpenter/pull/834) where
if set the and breached then Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
Contributor

Also, a nit on the language here: we've been using "forceful" or "involuntary".

Suggested change
if set the and breached then Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
if set the and breached then Karpenter's termination will occur forcefully and force terminate both the node and node claim.

##Concerns
- I am not completely sure if allowing users have control over the finalizer of nodeclaim fully make sense so I am open to changing this. It could be that user defines
a custom finalizer just at the node level and when Karpenter addFinalizer on nodes it uses the custom finalizer rather than the one provided by Karpenter
- Alternatively Karpenter could provide an option where no node finalizer is set by Karpenter and that way it allows the users to inject their own finalizer without
Contributor

I worry about this option, as a lot of our controllers rely on the existence of this finalizer to work properly. We'd have to add more checks throughout the project and more complexity if this were the way to go.

Contributor Author

When you say "this finalizer" is it the finalizer on nodeclaim or node?

Contributor

Currently finalizers are added to both nodes and node claims, and they each have their own termination finalization flow.

@@ -0,0 +1,72 @@
# Custom Node termination (custom finalizer)

## Motivation
Member

I'm curious why we think that this interface needs to be "pluggable" rather than supporting these features natively in Karpenter directly. I know we have linked out a few features here, but I'd be interested to hear the tangible ways that controllers are going to plug-in to this new system, so that we know that we are actually solving these use-cases if we were to implement this interface. IMO, a lot of these features make sense to be part of Karpenter natively, we just haven't had the time or resources to get some of them into the project. There's also some design thinking that needs to go into things like "cordon nodes after expiry."

I think there are two buckets of features being listed here: 1) features that we know we want to do but that need a little design and direction to determine the best way forward (#743, #622), and 2) features that we're less sure we want to pursue, with design that might touch both upstream and Karpenter (#740, #690, aws/karpenter-provider-aws#5232).

The other thing to mention is that any controller today can plug an additional finalizer into the Node or the NodeClaim to add its own finalization logic. Granted, this won't stop the Karpenter finalizer from taking action, but you can layer additional things on top of the typical finalization flow.
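
As a rough illustration of that point, here is a minimal controller-runtime sketch of a third-party controller layering its own finalizer onto Nodes; the finalizer key example.com/pre-termination-hook and the cleanup step are hypothetical placeholders, and Karpenter's own finalizer would still run its normal flow alongside this.

```go
// A minimal sketch of a third-party controller layering its own finalizer onto
// Nodes today, alongside Karpenter's. The finalizer key and cleanup logic are
// hypothetical placeholders.
package customfinalizer

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const customFinalizer = "example.com/pre-termination-hook" // hypothetical key

type NodeReconciler struct {
	client.Client
}

func (r *NodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	node := &corev1.Node{}
	if err := r.Get(ctx, req.NamespacedName, node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	stored := node.DeepCopy()
	if node.DeletionTimestamp.IsZero() {
		// Live node: ensure our finalizer is present so deletion waits for us.
		if !controllerutil.ContainsFinalizer(node, customFinalizer) {
			controllerutil.AddFinalizer(node, customFinalizer)
			return ctrl.Result{}, r.Patch(ctx, node, client.MergeFrom(stored))
		}
		return ctrl.Result{}, nil
	}
	// Terminating node: run custom finalization (e.g. notify operators, run
	// extra drain checks), then release the finalizer so deletion proceeds.
	if controllerutil.ContainsFinalizer(node, customFinalizer) {
		controllerutil.RemoveFinalizer(node, customFinalizer)
		return ctrl.Result{}, r.Patch(ctx, node, client.MergeFrom(stored))
	}
	return ctrl.Result{}, nil
}

func (r *NodeReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).For(&corev1.Node{}).Complete(r)
}
```

The same pattern would apply to NodeClaims by swapping the reconciled object type.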

@garvinp-stripe (Contributor, Author) commented Dec 11, 2023

TL;DR: Can we modularize the finalizer architecture to be more single-responsibility, so that contributing is less complex and less risky, and users can take over the parts they want to customize?

Good question; sorry I never answered the similar ask from @njtran.
Initially, we were thinking about a way to notify operators that a node is stuck in production, rather than relying on a timeout to "do something" with the stuck node. In theory we could update Karpenter's finalizer to send a clearer event that we capture to notify our operators. However, this raises the question of whether there are ways to get better control over drain and node termination.

Specifically, I understand the desire for Karpenter to natively do more advanced termination logic; however, that is difficult right now because the termination logic is tightly wrapped into a single finalizer. I wonder if an alternative approach could be breaking the finalizer out into "stages" and allowing contributors to contribute logic that performs those stages differently.

I think the end goal for us isn't to deviate from Karpenter, but rather that we want certain features sooner; at the moment we can't contribute directly to Karpenter because we are still on v1alpha and won't be on the latest until at least the new year.

Contributor Author

Thinking about this more, I suspect my thinking is biased towards how we did things pre-Karpenter. Since we use ASGs, we rely heavily on lifecycle hooks, which in a way I am trying to replace with the finalizer. I think the finalizer isn't a complete replacement, and maybe it shouldn't be.

I think we could move a majority of the logic onto a daemonset that interacts with host-level processes, since the finalizer does wait for daemonsets to terminate. But this raises a question around the graceful termination timeout. I see that the finalizer orders its termination sequence. If we want to move in that direction (termination lifecycle hook logic -> daemonset), we need our daemonset to terminate gracefully, which means the graceful termination period will likely cause issues if user pods drain for the full termination period. I think we (my team) may want to separate the finalizer's graceful termination periods for user applications versus cluster operator applications.

Does it make sense for the finalizer to capture all this complexity? I am just trying to wrap my head around how to keep Karpenter's API surface clean yet not overly complex in what it manages because of user needs.

Contributor

I think this RFC is prescribing a solution rather than tackling the problem as a whole, which I understand to be a custom drain flow. I think a custom finalizer could be one of the solutions here, but I think we're leaving out some considerations of other alternatives and an understanding of the use-cases here.

Here are some questions I find myself asking, which I'm happy to talk over slack if that makes the discussion easier:

  • What are the alternatives for API Spec here?
    • A setting for enabling/disabling the termination controller response to finalizers?
    • A new finalizer to indicate a custom drain flow?
    • A setting to say which part of each termination action needs to be different?
  • Do all users who want custom termination flow have the ability to run a custom controller that handles the termination flow themselves? Is full extensibility the right answer?
    • If users want all existing features of Karpenter, but a change in one part of the code, does a fully custom termination flow solve this issue for them, or would that be not worth the effort for them?
  • What termination controller features do you need implemented in Karpenter? Can we do any of those without adding an entirely new controller? Are any of those use-cases not covered in an issue?

@coveralls

Pull Request Test Coverage Report for Build 7215991368

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 80.088%

Totals Coverage Status
Change from base Build 7205850137: 0.02%
Covered Lines: 7678
Relevant Lines: 9587

💛 - Coveralls


This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2024
@github-actions github-actions bot closed this Jan 24, 2024