Dynamic Job Parallelism and Resource Scaling Based on Backlog Metrics #2964
Reading the ask, I'm not entirely sure Kueue is the right place for this. It sounds like you want metrics to influence elastic job scaling. AFAIK Kueue would help admit jobs as they scale dynamically, but the controller that watches metrics and patches elastic jobs would probably belong to a separate CRD, outside of Kueue, since it seems you want HPA at the job level.
I'll leave the final decision on scope to @tenzen-y or @alculquicondor.
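For illustration, here is a rough sketch of the kind of external controller described above, deliberately separate from Kueue: it polls a backlog metric and patches the parallelism of an elastic batch/v1 Job. Everything specific here is a placeholder, not an existing API: the metric source (`fetchBacklog`), the namespace and Job name, and the scaling formula are all invented for the example.

```go
// Hypothetical sketch of a metrics-driven scaler that patches Job parallelism.
// It is NOT part of Kueue; Kueue would still decide whether the scaled-up pods
// are admitted (once dynamically sized Jobs are supported).
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// fetchBacklog is a stand-in for whatever queue or metrics system holds the backlog depth.
func fetchBacklog() (int32, error) { return 120, nil }

// targetParallelism maps backlog depth to a desired parallelism, clamped to [minP, maxP].
func targetParallelism(backlog, perWorker, minP, maxP int32) int32 {
	p := backlog / perWorker
	if p < minP {
		return minP
	}
	if p > maxP {
		return maxP
	}
	return p
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		backlog, err := fetchBacklog()
		if err == nil {
			p := targetParallelism(backlog, 10, 1, 50)
			patch := []byte(fmt.Sprintf(`{"spec":{"parallelism":%d}}`, p))
			// Patch the elastic Job's parallelism based on the current backlog.
			if _, err := client.BatchV1().Jobs("workers").Patch(
				context.TODO(), "backlog-consumer", types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
				fmt.Println("patch failed:", err)
			}
		}
		time.Sleep(30 * time.Second)
	}
}
```

A real implementation would more likely be a reconciling controller driven by a CRD that declares the metric query and the formula, rather than a fixed polling loop, but the division of labour would be the same: the scaler changes the desired size, and Kueue governs admission.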
Why not just use KEDA or HPA? I don't think Kueue is the right component to decide that. Still, there are two things that we need to do in Kueue to improve the experience:
And in Kubernetes:
FYI, the request to support dynamically scaled Jobs in Kueue is #77, and it already has a KEP: https://github.com/kubernetes-sigs/kueue/tree/main/keps/77-dynamically-sized-jobs
Yes, that's right. After we implement the feature, we may be able to use DynamicJob + KEDA.
What would you like to be added:
We would like to propose a new feature in Kueue that enables dynamic scaling of job parallelism and resource allocation (CPU, RAM, and pods) based on job backlog metrics and predefined formulas.
Idea: This feature would introduce a custom resource definition (CRD) that allows users to define scaling formulas and thresholds, which dynamically adjust the maximum parallelism and resource limits, similar to KEDA or HPA. A generic approach could be to expose the `/scale` subresource so that scaling works through a generic interface (see the sketch below).
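To make the generic-interface idea concrete, here is a minimal sketch (not an existing Kueue API) of how a scaler could drive any resource that serves the `/scale` subresource via the dynamic client. The `elasticjobs` GVR, namespace, and object name are invented for the example.

```go
// Illustrative only: adjusting replicas through the generic /scale subresource.
// Any resource that serves /scale (a Deployment, or a hypothetical elastic-job CRD)
// can be driven this way without type-specific code.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// scaleTo patches spec.replicas on the /scale subresource of the given object.
func scaleTo(ctx context.Context, dc dynamic.Interface, gvr schema.GroupVersionResource, ns, name string, replicas int32) error {
	patch := []byte(fmt.Sprintf(`{"spec":{"replicas":%d}}`, replicas))
	// The trailing "scale" argument targets the /scale subresource instead of the main resource.
	_, err := dc.Resource(gvr).Namespace(ns).Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{}, "scale")
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Placeholder GVR: imagine an elastic-job CRD that exposes /scale.
	gvr := schema.GroupVersionResource{Group: "example.io", Version: "v1alpha1", Resource: "elasticjobs"}
	if err := scaleTo(context.TODO(), dc, gvr, "workers", "backlog-consumer", 25); err != nil {
		fmt.Println("scale failed:", err)
	}
}
```

Because `/scale` has a uniform shape (`spec.replicas` / `status.replicas`), a formula-based scaler built this way would not need per-workload-kind code.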
Why is this needed:
Currently, we are processing around 4.5 million jobs per day, and managing resource usage and costs is critical. There is a need for a mechanism that can dynamically limit or expand the maximum parallelism of jobs based on real-time backlog conditions. This would help ensure that jobs are processed efficiently without overcommitting resources or incurring unnecessary costs.
By introducing a formula-based approach to flavor resources, we can achieve a more granular and responsive system. For example, the system could increase the max CPU or RAM allocation as the admission backlog grows, ensuring that delays are minimized during high-load periods while conserving resources during low-demand times. This functionality is crucial for maintaining both performance and cost-effectiveness in large-scale Kubernetes environments.
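As a purely illustrative example of the kind of formula meant here (the numbers, step size, and names are made up, not a proposed API), the snippet below grows a flavor's CPU and memory quota linearly with backlog depth, clamped between a baseline and a cost ceiling:

```go
// Illustrative formula only: scale resource quotas linearly with backlog depth,
// clamped between a floor (baseline throughput) and a ceiling (cost cap).
package main

import "fmt"

type quota struct {
	cpuMilli int64 // CPU in millicores
	memMiB   int64 // memory in MiB
}

// quotaFor grows the quota by a fixed increment for every perStep backlogged jobs.
func quotaFor(backlog int64, base, ceil quota, perStep int64) quota {
	steps := backlog / perStep
	q := quota{
		cpuMilli: base.cpuMilli + steps*500, // +0.5 CPU per step
		memMiB:   base.memMiB + steps*1024,  // +1 GiB per step
	}
	if q.cpuMilli > ceil.cpuMilli {
		q.cpuMilli = ceil.cpuMilli
	}
	if q.memMiB > ceil.memMiB {
		q.memMiB = ceil.memMiB
	}
	return q
}

func main() {
	base := quota{cpuMilli: 4000, memMiB: 8192}
	ceil := quota{cpuMilli: 64000, memMiB: 131072}
	for _, backlog := range []int64{0, 5000, 50000, 500000} {
		q := quotaFor(backlog, base, ceil, 1000)
		fmt.Printf("backlog=%d -> cpu=%dm mem=%dMi\n", backlog, q.cpuMilli, q.memMiB)
	}
}
```

The same shape of formula could drive maximum parallelism instead of, or in addition to, the flavor quota.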
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.