Support dynamically sized (elastic) jobs #77
Comments
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
This will be easier to support with https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/3521-pod-scheduling-readiness
I am interested in working on this. It probably needs some sort of design doc; I will work with @alculquicondor and see if I can put something together in the next few weeks.
/assign
Hi @andrewsykim! Is there any progress?
@tenzen-y I was planning to work on this in a couple of weeks during the holiday season, but feel free to start working on it if you're interested.
@andrewsykim Thanks. I don't have enough time right now either, so I will ask about progress again once I do.
FYI, @vicentefb and I are working on a proposal in a Google Doc; we will share it here once it's ready.
* added KEP
* KEP updated, applied TOC
* updated KEP
* TOC updated
* added info in unit tests and integration tests section
* added details about workload slices
* rephrase scale down section
* updated and added details on slices, generalized design details and typos
* update
* added details about MultiKueue and removed users from approvers
/reopen
@tenzen-y: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We should have a clear path towards supporting Spark and other dynamically sized jobs. Ray is another example.
One related aspect is supporting dynamic updates to a workload's resource requirements. We can probably limit that to changing the count of a PodSet in QueuedWorkload: in Spark, the number of workers can change while the job runs, but the resource requirements of each worker do not.
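A minimal Python sketch of the "only the count is mutable" restriction, assuming a simplified, hypothetical PodSet model (name, count, per-pod requests). The PodSet class and validate_update function are illustrative names, not the actual QueuedWorkload API: an update is accepted only if PodSet names, order and per-pod requests are unchanged, so count is the only field that can move.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PodSet:
    """Hypothetical, simplified stand-in for a PodSet in a QueuedWorkload."""
    name: str
    count: int
    # Per-pod resource requests, e.g. {"cpu": "2"}; assumed immutable after creation.
    requests: Dict[str, str] = field(default_factory=dict)


def validate_update(old: List[PodSet], new: List[PodSet]) -> List[str]:
    """Return validation errors for an update; an empty list means it is allowed.

    Only PodSet.count may change; names, order and per-pod requests may not.
    """
    if [p.name for p in old] != [p.name for p in new]:
        return ["podSets: names and order are immutable"]
    errors: List[str] = []
    for o, n in zip(old, new):
        if o.requests != n.requests:
            errors.append(f"podSets[{n.name}].requests: immutable")
        if n.count < 0:
            errors.append(f"podSets[{n.name}].count: must be non-negative")
    return errors


# Usage: scaling the workers PodSet is accepted, resizing each worker is rejected.
driver = PodSet("driver", 1, {"cpu": "1"})
workers = PodSet("workers", 4, {"cpu": "2"})
scaled = PodSet("workers", 8, {"cpu": "2"})
resized = PodSet("workers", 4, {"cpu": "4"})
print(validate_update([driver, workers], [driver, scaled]))   # []
print(validate_update([driver, workers], [driver, resized]))  # ['podSets[workers].requests: immutable']
```

Allowing a count of zero is an assumption here, chosen so that a job could release all of its workers while keeping its driver.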
One idea is to model the mutable count in a way similar to "in-place update to pod resources" [1], except that in our case the count, rather than the per-pod resources, would be mutable. The Spark driver pod would watch its corresponding QueuedWorkload instance and adjust the number of workers once the new count is admitted.
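A minimal Python sketch of the driver-side reaction to QueuedWorkload updates. WorkloadStatus, Driver, on_workload_event and the scale callback are hypothetical names, and the split into a requested count and an admitted count is an assumption, loosely modeled on the desired-versus-allocated split in in-place pod resize. The driver runs min(requested, admitted) workers, so it never runs more workers than have been admitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkloadStatus:
    """Hypothetical snapshot of a QueuedWorkload as seen by the driver's watch."""
    requested_workers: int  # count the driver asked for in the PodSet
    admitted_workers: int   # count the queue controller has admitted so far


class Driver:
    """Sketch of a Spark-like driver that only runs as many workers as are admitted."""

    def __init__(self, scale: Callable[[int], None]) -> None:
        self.running = 0
        self._scale = scale  # hypothetical hook that creates or deletes worker pods

    def on_workload_event(self, wl: WorkloadStatus) -> None:
        # Grow only up to the admitted count; shrink as soon as fewer are requested.
        target = min(wl.requested_workers, wl.admitted_workers)
        if target != self.running:
            self._scale(target)
            self.running = target


# Usage: request 8 workers while only 4 are admitted, then get admitted, then shrink.
calls: List[int] = []
d = Driver(calls.append)
for event in [WorkloadStatus(4, 4), WorkloadStatus(8, 4), WorkloadStatus(8, 8), WorkloadStatus(2, 8)]:
    d.on_workload_event(event)
print(calls)  # [4, 8, 2]
```

Taking the minimum lets the driver shrink immediately, which frees quota, while growth waits for admission; whether scale-downs should also go through admission is a design question this sketch does not settle.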
[1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources