docs: RFC for Customizable Node Finalizer #843
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: garvinp-stripe. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @garvinp-stripe. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
If I understand right, this is trying to circumvent the entire termination process in favor of the user running an entirely custom controller. While I think this is a valid use-case, and definitely an increase in extensibility, I'm worried that this may lead to a lot of issues, and it may not actually solve the problem for many of the linked issues, since the solution requires vending a full-fledged controller of their own rather than solving it natively.
Out of the listed issues, I think they're all applicable to the larger idea of "let me customize termination," barring #622 and aws/karpenter-provider-aws#5232. #622 will actually be handled by the nodeclaim disruption controller, which is two steps ahead of the termination controller in the process (nodeclaim disruption controller -> disruption controller -> termination controller). aws/5232 will be handled within disruption controller candidacy, where we opt out nodes that have blocking PDBs from consideration.
My last concern is that there's a lot of interplay among the controllers that Karpenter vends, namely the three disruption-related controllers and the provisioning controllers, which I think could fight with each other if the custom termination controller doesn't follow the same rules. For instance, you wouldn't be able to modify the existence of the NoSchedule termination taint, as the disruption controller will fight with you on it. We should think about how these controllers could continue to fight if we want to make this controller fully replaceable.
Agree that this could be problematic, so I am happy to be wrong. I can see that there are a lot of interactions that occur during termination, but I suspect there is also some room for safely allowing users to perform certain actions.
First, I think it makes sense to change this design to cover just the node finalizer rather than the nodeclaim finalizer (correct me if I am wrong), simply because I don't think users should be changing nodeclaim deletion behavior.
Second, looking at the finalizer design from K8s, I wonder if breaking the existing finalizer down into multiple finalizer keys would make sense. For example,
https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/node/termination/controller.go#L76
it seems like we:
- Delete node claim
- Taint
- Drain
- Delete node from EC2
- Remove finalizer
If we break these up into different finalizers, we can choose which ones are critical to Karpenter's functionality (and less messy than relying on users to create a controller that follows the right rules) versus ones that can be altered by users.
It seems to me that we can wrap deleting the node claim plus tainting as a finalizer that must be applied to all nodes, while things like draining and deleting the node from EC2 can be added depending on whether the user wants custom control over them.
So the node object would look something like:

```yaml
finalizers:
  - karpenter.sh/nodeclaimtaint
  - karpenter.sh/drain
  - karpenter.sh/deleteNodeFromEC2
```
Once each step is completed, the corresponding finalizer is removed from the node object. We can order the process by ensuring the prior action has completed, checking whether its finalizer key is still on the node object: for example, don't delete the node if you still see karpenter.sh/nodeclaimtaint or karpenter.sh/drain.
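To make that ordering concrete, here is a minimal sketch of what the last stage's check could look like in a controller-runtime reconciler. The finalizer keys and the `deleteInstance` helper are hypothetical, taken from the list above rather than from any existing Karpenter API:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical per-stage finalizer keys from the list above; Karpenter does
// not define these today.
const (
	taintFinalizer  = "karpenter.sh/nodeclaimtaint"
	drainFinalizer  = "karpenter.sh/drain"
	deleteFinalizer = "karpenter.sh/deleteNodeFromEC2"
)

// deleteInstance stands in for the cloud-provider call that terminates the
// backing instance; its implementation is out of scope for this sketch.
func deleteInstance(ctx context.Context, node *corev1.Node) error { return nil }

// reconcileDelete runs the instance-deletion stage only after the earlier
// stages have removed their finalizers, enforcing the ordering described
// above (taint, then drain, then instance deletion).
func reconcileDelete(ctx context.Context, c client.Client, node *corev1.Node) error {
	if controllerutil.ContainsFinalizer(node, taintFinalizer) ||
		controllerutil.ContainsFinalizer(node, drainFinalizer) {
		return nil // an earlier stage is still pending; requeue and wait
	}
	if !controllerutil.ContainsFinalizer(node, deleteFinalizer) {
		return nil // this stage has already completed
	}
	if err := deleteInstance(ctx, node); err != nil {
		return err
	}
	stored := node.DeepCopy()
	controllerutil.RemoveFinalizer(node, deleteFinalizer)
	return c.Patch(ctx, node, client.MergeFrom(stored))
}
```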
I'm not sure I see the value in splitting one finalizer out into many. If we need to add/remove/modify a step of the termination process, that means we need to change the list of well-known finalizers, which would create some between-version incompatibilities (i.e. I add a termination feature in v0.1 and remove it in v0.5; now all nodes created before v0.5 have an additional finalizer that Karpenter needs to be aware of).
Additionally, I think adding more finalizers will actually make this less extensible for the reason above. Adding more API surface onto the nodeclaim makes it tougher to validate and to add newer features to the termination loop.
Lastly, I think multiple finalizers would add a lot to the number of writes we're making to the APIServer. Say we have 5 steps == 5 finalizers; each node would then have 5 patch events, which would be pretty expensive considering the current flow is 1 patch event.
> During termination, Karpenter will not perform the finalization for both node claims and nodes but allow the user's custom controller to perform the finalization of node and nodeclaim. We will allow users to set a grace termination period from [#834](https://github.com/kubernetes-sigs/karpenter/pull/834) where, if set and breached, Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
Setting a grace termination period would be up to the user who configures it, right? I don't think this doc needs to cover this, since the feature itself won't be prescribing this or adding this as a feature.
Makes sense, can clean up.
> ### Custom Finalizer behavior/expectation
> During termination, Karpenter will not perform the finalization for both node claims and nodes but allow the user's custom controller to perform the finalization of node and nodeclaim. We will allow users to set a grace termination period from [#834](https://github.com/kubernetes-sigs/karpenter/pull/834) where, if set and breached, Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
Also, a nit on the language here: we've been using "forceful" or "involuntary".

```diff
- if set and breached then Karpenter's termination will occur ungracefully and force terminate both the node and node claim.
+ if set and breached then Karpenter's termination will occur forcefully and force terminate both the node and node claim.
```
> ## Concerns
>
> - I am not completely sure if allowing users to have control over the finalizer of the nodeclaim fully makes sense, so I am open to changing this. It could be that the user defines a custom finalizer just at the node level, and when Karpenter runs addFinalizer on nodes it uses the custom finalizer rather than the one provided by Karpenter.
> - Alternatively, Karpenter could provide an option where no node finalizer is set by Karpenter, so that users can inject their own finalizer without
I worry about this option, as a lot of our controllers rely on the existence of this finalizer to work properly. We'd have to add more checks throughout the project and add more complexity if this were the way to go.
When you say "this finalizer," is it the finalizer on the nodeclaim or the node?
Currently finalizers are added to both nodes and node claims, and they each have their own termination finalization flow.
> # Custom Node termination (custom finalizer)
>
> ## Motivation
I'm curious why we think this interface needs to be "pluggable" rather than supporting these features natively in Karpenter directly. I know we have linked a few features here, but I'd be interested to hear the tangible ways that controllers are going to plug in to this new system, so that we know we are actually solving these use-cases if we were to implement this interface. IMO, a lot of these features make sense as part of Karpenter natively; we just haven't had the time or resources to get some of them into the project. There's also some design thinking that needs to go into things like "cordon nodes after expiry."
I think there are two buckets of features being listed here: 1) features that we know we want to do, but that need a little design and direction to determine the exact best way forward (#743, #622), and 2) features that we're less sure we want to pursue, with design that might touch both upstream and Karpenter (#740, #690, aws/karpenter-provider-aws#5232).
The other thing to mention is that any controller today has the ability to plug an additional finalizer into the Node or the NodeClaim to add additional finalization logic. Granted, this won't stop the Karpenter finalizer from taking action, but you can layer additional things on top of the typical finalization flow.
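(As a concrete illustration of that layering, a third-party controller could register its own finalizer with something like the sketch below. This assumes a controller-runtime setup, and the `example.com/my-drain-hook` key is made up for the example:)

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensureCustomFinalizer layers an extra, third-party finalizer onto a Node so
// this controller gets to run its own cleanup before the Node object goes
// away. Note this does not stop Karpenter's own finalizer from acting.
func ensureCustomFinalizer(ctx context.Context, c client.Client, node *corev1.Node) error {
	stored := node.DeepCopy()
	if !controllerutil.AddFinalizer(node, "example.com/my-drain-hook") {
		return nil // already present, nothing to write
	}
	return c.Patch(ctx, node, client.MergeFrom(stored))
}
```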
TL;DR: Can we modularize the finalizer architecture to be more single-responsibility, so that contributions are less complex and less risky, and users can take over the parts they want to customize?
Good question; sorry I never answered the similar ask from @njtran.
Initially, we were thinking about a way to notify operators that a node is stuck in production, rather than relying on a timeout to "do something" with the stuck node. In theory we could update Karpenter's finalizer to send a clearer event, and we could capture that to notify our operators. However, this raises the question of whether there are ways to have better control over drain and node termination.
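(For illustration, the notification piece could be as small as the finalizer emitting a Kubernetes Event that our alerting matches on; a rough sketch, where the `DrainStalled` reason is invented rather than an existing Karpenter event:)

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// notifyStuckDrain emits a Warning event against the Node when its drain has
// not completed in the expected window, so operators can alert on the reason
// instead of waiting for a timeout to force the issue.
func notifyStuckDrain(recorder record.EventRecorder, node *corev1.Node) {
	recorder.Eventf(node, corev1.EventTypeWarning, "DrainStalled",
		"node %s has been draining longer than expected", node.Name)
}
```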
Specifically, I understand the desire for Karpenter to natively do more advanced termination logic; however, that is difficult right now because the termination logic is tightly wrapped into a single finalizer. I wonder if an alternative approach could be breaking the finalizer out into "stages" and allowing contributors to contribute logic that performs these stages differently?
I think the end goal for us isn't to deviate from Karpenter; rather, we want certain features sooner, and at the moment we can't contribute directly to Karpenter because we are still on v1alpha and won't be on the latest until at least the new year.
Thinking about this more, I suspect my thinking is biased towards how we did things pre-Karpenter. Since we use ASGs, we rely heavily on lifecycle hooks, which in a way I am trying to replace with finalizers. I think finalizers aren't completely the replacement, and maybe they shouldn't be.
I think we could move the majority of the logic onto a daemonset that interacts with host-level processes, since the finalizer does wait for daemonsets to terminate. But this raises a question around graceful termination timeouts. I see that the finalizer orders its termination sequence. If we want to move in that direction (termination lifecycle hook logic -> daemonset), we need our daemonset to gracefully terminate, which means the graceful termination period will likely cause issues if user pods drain for the full termination period. I think we (my team) may want to separate out finalizers between user applications' and cluster operator applications' graceful termination periods.
Does it make sense for the finalizer to capture all this complexity? I am just trying to wrap my head around how to keep Karpenter's API surface clean yet not overly complex in what it manages because of user needs.
I think this RFC is prescribing a solution rather than tackling the problem as a whole, which I understand to be a custom drain flow. I think a custom finalizer could be one of the solutions here, but we're leaving out consideration of some alternatives and an understanding of the use-cases.
Here are some questions I find myself asking, which I'm happy to talk through over Slack if that makes the discussion easier:
- What are the alternatives for the API spec here?
  - A setting for enabling/disabling the termination controller's response to finalizers?
  - A new finalizer to indicate a custom drain flow?
  - A setting to say which part of each termination action needs to be different?
- Do all users who want a custom termination flow have the ability to run a custom controller that handles the termination flow themselves? Is full extensibility the right answer?
- If users want all the existing features of Karpenter but a change in one part of the code, does a fully custom termination flow solve the issue for them, or would it not be worth the effort?
- What termination controller features do you need implemented in Karpenter? Can we do any of those without adding an entirely new controller? Are any of those use-cases not covered in an issue?
Pull Request Test Coverage Report for Build 7215991368
💛 - Coveralls
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Fixes #743, #622, #740, #690, #5232
Description
RFC for adding a customizable finalizer for nodes
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.