Option on Provisioner to set a minimum number of on-demand nodes #702
Comments
Have you considered maintaining multiple provisioners, one for spot and one for on-demand?
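For context, a minimal sketch of that two-Provisioner setup, assuming the v1alpha5 Provisioner API Karpenter used at the time; the provisioner names are illustrative:

```yaml
# Provisioner restricted to spot capacity (v1alpha5 API; names are assumptions)
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
---
# Provisioner restricted to on-demand capacity
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
```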
Do you have native interruption handling enabled? If you do, in most cases Karpenter should have enough time to spin up replacement nodes before the capacity is completely reclaimed by EC2.
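For reference, native interruption handling works by pointing Karpenter at an SQS queue that receives EC2 spot interruption warnings. A hedged sketch of the Helm values, assuming a Karpenter version (v0.19+) where the `settings.aws.interruptionQueueName` key exists; the queue name is a placeholder:

```yaml
# Helm values for the karpenter chart (v0.19+ era key layout; queue name is hypothetical)
settings:
  aws:
    interruptionQueueName: my-karpenter-interruption-queue
```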
There's an issue (#2050) covering manual node provisioning right now, so we're definitely hoping to have a way to deploy a static number of nodes in the future. Feel free to +1 that issue and add the context around your use case.
I briefly thought about it, but I'm not sure how it would solve this. From my understanding, at best it would randomly choose an on-demand node, but it wouldn't enforce it.
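One way scheduling onto on-demand capacity could be enforced rather than left to chance, combining the multi-provisioner idea with pod-level constraints (a sketch under assumed names, not something proposed in the thread): run a small companion Deployment pinned to on-demand nodes via the well-known `karpenter.sh/capacity-type` node label.

```yaml
# A small companion Deployment pinned to on-demand nodes (all names hypothetical)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference-on-demand
spec:
  replicas: 2  # the minimum always-on capacity you want to guarantee
  selector:
    matchLabels:
      app: ml-inference
      capacity: on-demand
  template:
    metadata:
      labels:
        app: ml-inference
        capacity: on-demand
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand  # well-known Karpenter node label
      containers:
        - name: inference
          image: example.com/ml-inference:latest  # placeholder image
```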
Closing as a dup of #749
Tell us about your request
I have a k8s deployment that currently benefits from spot instances, using a Karpenter Provisioner with both `spot` and `on-demand` capacity types. However, for reliability reasons, I would like to keep a minimum of X on-demand nodes, even when there is spot availability.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
My k8s deployment is composed of ML inference nodes that take 5 to 8 minutes to spin up.
Let's take a scenario where 100% of the deployment runs on spot instances. In the case of a massive spot interruption from AWS (it happens sometimes), all the nodes in the deployment might go down, and no pods will serve requests until the `on-demand` fallback nodes spin up (which takes up to 8 minutes).

Are you currently working around this issue?
We're currently crossing our fingers that at least one spot node will remain up in the event of a massive spot interruption.
Additional Context
No response
Attachments
No response