
EKS jobs not causing Karpenter to scale nodes #7355

Open
oweng opened this issue Nov 8, 2024 · 3 comments
Labels: documentation, triage/needs-information

Comments

@oweng

oweng commented Nov 8, 2024

Description

I've been looking through the docs, and maybe I am missing something, but we currently have all of our node pools scaled via Karpenter with no issues at all for Deployments.
Recently we started some Dagster deployments, and when the data runs kick off, they launch 25 batch Jobs. When this happens, the Jobs are all pinned to the single node in the node pool, and we don't see Karpenter scaling out. Pod-wise, they all start up and immediately enter a Running state, and the instance becomes more or less unresponsive until they eventually finish their work.
Am I missing something?

@oweng oweng added the documentation and needs-triage labels Nov 8, 2024
@YuriFrayman

Take a look at alternative solutions such as cast.ai, where you can gain significant stability coupled with significant savings.

@gladiatr72

gladiatr72 commented Nov 13, 2024

Sounds like you haven't defined pod.spec.containers.resources.requests.cpu, or, if you have, you've seriously low-balled it. Set ...requests.cpu to 1 and see if that doesn't sort it out. I'm not familiar with Dagster, but I'd also check its docs to determine how it configures concurrency when given no explicit instructions. If it has such a knob, set it to a single worker (or set it however you like, but use that same value for ...requests.cpu).
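
A minimal sketch of what an explicit CPU request on one of those batch Jobs might look like; the Job name and image are placeholders, and wiring the request through Dagster's own run configuration (rather than hand-editing the Job) is a separate step covered in Dagster's docs:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dagster-run-example            # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: run
          image: example.com/dagster-user-code:latest   # placeholder image
          resources:
            requests:
              cpu: "1"                 # gives kube-scheduler a real per-pod footprint to bin-pack against
              memory: 1Gi
            limits:
              memory: 1Gi
```

With a request like this, only as many of the 25 pods as the node's allocatable CPU allows can bind to the existing node; the rest go Pending, which is exactly the signal Karpenter provisions new nodes against.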

@jmdeal jmdeal added the triage/needs-information label and removed the needs-triage label Nov 14, 2024
@jmdeal
Contributor

jmdeal commented Nov 14, 2024

That definitely seems likely. Karpenter is not responsible for scheduling pods; kube-scheduler is. So if the pods scheduled successfully, that means Karpenter fulfilled its purpose of ensuring enough capacity was available on the cluster to satisfy the pods' requests.
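
To make that division of labor concrete, here is a sketch (assuming the Dagster-launched pods currently set no resource requests, which the thread suggests but does not confirm) of the spec kube-scheduler is evaluating; name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dagster-run-worker             # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: run
      image: example.com/dagster-user-code:latest   # hypothetical image
      # No resources.requests: kube-scheduler treats each pod as needing ~0 CPU,
      # so all 25 pods fit on the one existing node, nothing ever goes Pending,
      # and Karpenter has no unschedulable pods to provision capacity for.
```

Running something like `kubectl describe node` on the affected instance would show the allocated CPU requests staying near zero even while the node itself is saturated, which matches the "node becomes unresponsive" symptom.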
