Reserve capacity #987

Closed
ilyatovbin-pp opened this issue Dec 14, 2021 · 12 comments
Labels
api: Issues that require API changes
feature: New feature or request
needs-design: Design required
v1.x: Issues prioritized for post-1.0

Comments

@ilyatovbin-pp

I want to be able to configure extra reserved capacity (nodes/CPU/memory) in the provisioner settings.
On one hand, I have the ttlSecondsAfterEmpty option that scales down empty nodes, which is great. On the other hand, scale-up takes some time. If I could keep a reserved instance or two (or any number I choose, for that matter) that never scales down even when the node is empty, that would speed up pod autoscaling.

Our workload is unpredictable most of the time. At some random time of day, we might get massive dumps of data to process and need to scale up our pods as fast as possible (our service is provided at near real-time speeds).
New node scale-up takes a minute or two, which is not very awesome in terms of "real-time".

At the moment we use a managed service for scaling nodes, and it has a similar feature that reserves extra capacity; as soon as the capacity is needed, it's released to us.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@ilyatovbin-pp ilyatovbin-pp added the feature New feature or request label Dec 14, 2021
@ellistarn ellistarn added the api Issues that require API changes label Dec 14, 2021
@ellistarn
Contributor

I think you might be referring to Spot Ocean headroom? In general, I'm open-minded about this approach, but we need to be careful to implement it in a way that fits naturally with Karpenter's scale-up/scale-down mechanisms.

The same overprovisioning mechanism that the cluster autoscaler uses also works with Karpenter, in case you need something immediately.

Otherwise, I usually recommend:

  1. Overprovision at the pod level (see the sketch after this list)
  2. Optimize startup latency (we're doing a lot of work here)
  3. Worst case, configure pod level headroom (e.g. negative preemption)
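For readers who want the quick workaround, here is a minimal sketch of option 1 combined with the negative-priority idea from option 3: a PriorityClass with a negative value plus a pause-container Deployment whose resource requests hold the headroom. The names, replica count, image tag, and request sizes below are illustrative assumptions, not values from the Karpenter docs.

  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: overprovisioning            # hypothetical name
  value: -10                          # negative priority: these pods are preempted first
  globalDefault: false
  description: "Placeholder pods that hold spare capacity for real workloads"
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: overprovisioning            # hypothetical name
  spec:
    replicas: 2                       # how many placeholder pods' worth of headroom to keep
    selector:
      matchLabels:
        app: overprovisioning
    template:
      metadata:
        labels:
          app: overprovisioning
      spec:
        priorityClassName: overprovisioning
        containers:
          - name: pause
            image: registry.k8s.io/pause:3.9
            resources:
              requests:
                cpu: "1"              # size each placeholder to the burst you want to absorb
                memory: 2Gi

When a real pod with default (zero) priority becomes unschedulable, the scheduler preempts the pause pods and places it immediately; the evicted placeholders go pending again, and Karpenter launches replacement capacity for them in the background.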

@ellistarn ellistarn added the needs-design Design required label Dec 14, 2021
@ilyatovbin-pp
Author

Yeah, I was talking about Ocean. Option 3 actually sounds interesting; I'll check it out.
Still, something native might be better.

For example, have an option like this:
reservedInstances: 2
reservedInstanceTypes: ['m5.xlarge','m4.xlarge']

Then if you don't have 2 empty nodes of these types, launch new ones, and if you want to scale down, make sure to keep 2 empty nodes of this type. This might be problematic though without using something like Descheduler, since not having 2 empty nodes doesn't mean you don't have enough capacity for extra pods.
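To make the shape of the idea concrete, here is a purely hypothetical sketch of how such fields might sit in a Provisioner spec. ttlSecondsAfterEmpty is a real field; reservedInstances and reservedInstanceTypes are only the proposal above and do not exist in Karpenter's API.

  apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    name: default
  spec:
    ttlSecondsAfterEmpty: 30                          # existing field: remove empty nodes
    # Hypothetical fields from this proposal (not part of Karpenter):
    reservedInstances: 2                              # always keep 2 empty nodes warm
    reservedInstanceTypes: ['m5.xlarge', 'm4.xlarge'] # which instance types to keep warm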

Anyway, just a thought.

@ellistarn
Contributor

reservedInstances: 2
reservedInstanceTypes: ['m5.xlarge','m4.xlarge']

Which AZ? Spot or OD? There are many questions that make this challenging. To cover all use cases, you really need to know the eventual pod specs that you'll need and solve the constraints from there.

@typeBlkCofe

This feature would be great!
As for whether the setting would be Spot or OD, we would use Spot, the same as for regular provisioning.

@ellistarn
Contributor

Mega issue: kubernetes-sigs/karpenter#749

@RobertNorthard
Contributor

RobertNorthard commented Dec 8, 2022

I'm adding a comment to this issue, rather than opening a new feature request as it feels related.

I would also like to be able to specify an additional capacity buffer for a Karpenter provisioner, as a percentage. Karpenter could then allocate extra headroom based on the CPU/memory currently scheduled. This would be instead of creating a dummy pod deployment (with a pause container) that has a lower PriorityClass. If we have multiple Karpenter provisioners, we would need to create multiple dummy deployments to satisfy the constraints of the different provisioners. The buffer would probably need a constraint, e.g. expand equally over AZs, or inject dummy pods as part of scheduling to add some extra headroom.
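To illustrate why the dummy-deployment workaround multiplies with the number of provisioners: each placeholder Deployment has to be pinned to one provisioner, for example via a nodeSelector on the karpenter.sh/provisioner-name node label. This is an assumed sketch; the provisioner name, sizes, and the overprovisioning PriorityClass are the same illustrative values used earlier in the thread.

  # One placeholder Deployment per provisioner; this one targets a hypothetical "gpu" provisioner.
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: overprovisioning-gpu
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: overprovisioning-gpu
    template:
      metadata:
        labels:
          app: overprovisioning-gpu
      spec:
        priorityClassName: overprovisioning      # negative-priority class, evicted first
        nodeSelector:
          karpenter.sh/provisioner-name: gpu     # pin this headroom to one provisioner
        containers:
          - name: pause
            image: registry.k8s.io/pause:3.9
            resources:
              requests:
                cpu: "4"
                memory: 8Gi

A percentage-based buffer on the Provisioner itself, as suggested here, would collapse these per-provisioner deployments into a single setting.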

@gals-ma

gals-ma commented Dec 13, 2022

I also wish to add a comment to this issue instead of creating a new one.

It would be great if Karpenter could support headroom in order to handle sudden traffic spikes.
The idea is that Karpenter would always reserve 'available' CPU/memory units, so that when traffic spikes the headroom can serve more pods without delay.

Thank you!

@billrayburn billrayburn added the v1.x Issues prioritized for post-1.0 label Apr 26, 2023
@afirth

afirth commented May 5, 2023

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler documents solution 1 (pod-level overprovisioning) and is pretty straightforward. I would recommend Karpenter document a similar solution if the dev work is difficult.

@jonathan-innis
Contributor

@afirth You're right that doing pod-level negative priority for an overprovisioning buffer isn't specified anywhere in our docs. Would you be open to opening a PR that adds this information to the Advanced Scheduling Techniques section of our docs?

@seunggs

seunggs commented May 8, 2023

Just to add another use case for this feature: I'm using Karpenter for ephemeral workloads like workflow jobs, and having some pre-warmed nodes could make the user experience much better, since workflow tasks could start immediately.

@ellistarn
Contributor

Closing as a dup of kubernetes-sigs/karpenter#749.

#3240, kubernetes-sigs/karpenter#749, and #987 all have a ton of upvotes, but kubernetes-sigs/karpenter#749 has the most.

@afirth

afirth commented May 9, 2023

> @afirth You're right that doing pod-level negative priority for an overprovisioning buffer isn't specified anywhere in our docs. Would you be open to opening a PR that adds this information to the Advanced Scheduling Techniques section of our docs?

@jonathan-innis Sorry no, I don't use Karpenter.
