Reserve capacity #987
Comments
I think you might be referring to Spot Ocean headroom? In general, I'm open-minded about this approach, but we need to be careful to implement it in a way that fits naturally with Karpenter's scale-up/scale-down mechanisms. The same overprovisioning mechanism that the cluster autoscaler uses also works with Karpenter, in case you need something immediately. Otherwise, I usually recommend:
Yeah, I was talking about Ocean. Option 3 actually sounds interesting; I'll check it out. For example, have an option like this (sketched below): if you don't have 2 empty nodes of these types, launch new ones, and when scaling down, make sure to keep 2 empty nodes of this type. This might be problematic without something like Descheduler, though, since not having 2 empty nodes doesn't mean you don't have enough capacity for extra pods. Anyway, just a thought.
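A sketch of what that option might look like; the `reservedCapacity` block is hypothetical and not part of Karpenter's API, shown only to illustrate the idea:

```yaml
# Hypothetical reserved-capacity knob on a Provisioner.
# No such field exists in Karpenter today; this only illustrates the idea above.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  reservedCapacity:            # hypothetical field
    nodes: 2                   # keep 2 empty nodes warm at all times
    instanceTypes:             # restrict the warm pool to these types
      - m5.large
      - m5.xlarge
```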
Which AZ? Spot or OD? There are many questions that make this challenging. To cover all use cases, you really need to know the eventual pod specs that you'll need and solve the constraints from there.
This feature would be great!
Mega issue: kubernetes-sigs/karpenter#749
I'm adding a comment to this issue rather than opening a new feature request, as it feels related. I would also like to be able to specify an additional capacity buffer for a Karpenter provisioner, as a percentage. Karpenter could then allocate extra headroom based on the CPU/memory currently scheduled. This would be instead of creating a dummy pod deployment (with a pause container) that has a lower priority.
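To make that concrete, a sketch of what a percentage-based buffer might look like; the `headroom` block is hypothetical, not part of Karpenter's API:

```yaml
# Hypothetical percentage-based headroom on a Provisioner.
# Not a real Karpenter field; shown only to make the request concrete.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  headroom:            # hypothetical field
    cpu: "20%"         # keep 20% of currently scheduled CPU as spare capacity
    memory: "20%"      # keep 20% of currently scheduled memory as spare capacity
```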
I also wish to add a comment to this issue instead of creating a new one. It would be great if Karpenter could support headroom to absorb sudden traffic spikes. Thank you!
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler documents solution 1, pod-level overprovisioning, and is pretty straightforward. I would recommend that Karpenter document a similar solution if the dev work is difficult.
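For reference, a minimal sketch of that pod-level overprovisioning pattern from the linked FAQ: a negative-priority PriorityClass plus a Deployment of pause containers sized to the headroom you want. The names, replica count, and resource requests here are illustrative.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                     # below the default (0), so real pods preempt these
globalDefault: false
description: "Placeholder pods that reserve headroom."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                  # illustrative: two placeholders' worth of headroom
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"         # size each placeholder to the capacity to reserve
              memory: 1Gi
```

When real pods arrive, the scheduler preempts the pause pods; they go Pending and trigger the autoscaler to provision replacement capacity.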
@afirth You're right that using pod-level negative priority for an overprovisioning buffer isn't specified anywhere in our docs. Would you be open to opening a PR that adds this information to our docs in the Advanced Scheduling Techniques section?
Just to add another use case for this feature: I'm using Karpenter for ephemeral workloads like workflow jobs, and having some pre-warmed nodes would make the user experience much better, since workflow tasks could start immediately.
Closing as a dup of kubernetes-sigs/karpenter#749. #3240, kubernetes-sigs/karpenter#749, and #987 all have a ton of upvotes, but kubernetes-sigs/karpenter#749 has the most.
@jonathan-innis Sorry no, I don't use Karpenter.
I want to be able to configure extra reserved capacity (nodes/CPU/memory) in the provisioner settings.
On one hand, I have the ttlSecondsAfterEmpty option that scales down empty nodes, which is great. On the other hand, scale-up takes some time. If I could keep a reserved instance or two (or any number I choose, for that matter) that never scales down even when the node is empty, this would speed up pod autoscaling (see the sketch after this description).
Our workload is unpredictable most of the time. At some random time of day, we might get massive dumps of data to process and need to scale up our pods as fast as possible (our service is provided at near real-time speeds).
New node scale-up takes a minute or two, which is not very awesome in terms of "real-time".
At the moment we use a managed service for scaling nodes; it has a similar feature that reserves extra capacity and releases it to us as soon as it's needed.
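For context, the scale-down half already exists in the v1alpha5 Provisioner; a minimal sketch, with the requested reserve floor marked as hypothetical:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 30     # existing: deprovision a node 30s after it empties
  # reservedNodes: 2           # hypothetical: the "never scale below" floor requested here
```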