KEP #77 Dynamically Sized Jobs #1851
Hi @vicentefb. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test.
/ok-to-test
## Phases for MVP (alpha)
### Phase 1 - Scale Down
WIP implementation for Phase 1: #1852
bc85dbb to e000b9c
/release-note-none
@astefanutti, is this something your team is still interested in?
## Design Details
### Workload Slices
To support horizontal scaling of jobs, we will introduce the concept of a "Workload Slice". A Workload Slice is a Workload object with an owner reference to the original Workload for a job. Workload Slices represent per-replica changes to a job that were not initially accounted for when the job was created.
The constraints can probably be violated if a user creates Workload Slices by hand. It might be worth adding a note on how we handle a mismatch.
Good point, we should document this in the KEP.
I'm especially interested in the behavior with the prebuilt workload feature.
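To make the Workload Slice definition concrete, here is a minimal Go sketch (Go being the language Kueue is written in). Every type and function in it (`Workload`, `OwnerReference`, `isSliceOf`, `totalPods`) is hypothetical and heavily simplified, not Kueue's API. It assumes each slice records only the extra pods of a scale-up and that a valid slice must carry an owner reference to its original Workload. It also shows one possible way to handle the mismatch raised in the review: reject a hand-made slice whose owner reference does not point at the original Workload instead of silently counting it.

```go
package main

import "fmt"

// OwnerReference is a hypothetical, trimmed-down stand-in for the
// Kubernetes owner reference type (only Kind and Name are kept).
type OwnerReference struct {
	Kind string
	Name string
}

// Workload is a hypothetical, simplified model of a Kueue Workload; the
// real API object carries PodSets, admission status, and much more.
type Workload struct {
	Name     string
	Owners   []OwnerReference
	PodCount int // pods requested by this Workload or slice (assumption)
}

// isSliceOf reports whether w carries an owner reference to the Workload
// with the given name.
func isSliceOf(w Workload, original string) bool {
	for _, o := range w.Owners {
		if o.Kind == "Workload" && o.Name == original {
			return true
		}
	}
	return false
}

// totalPods adds the pods of every slice to those of the original
// Workload. A slice without an owner reference to the original (for
// example, one created by hand with the wrong owner) yields an error.
func totalPods(original Workload, slices []Workload) (int, error) {
	total := original.PodCount
	for _, s := range slices {
		if !isSliceOf(s, original.Name) {
			return 0, fmt.Errorf("slice %q is not owned by Workload %q", s.Name, original.Name)
		}
		total += s.PodCount
	}
	return total, nil
}

func main() {
	orig := Workload{Name: "job-a", PodCount: 4}
	scaleUp := Workload{
		Name:     "job-a-slice-1",
		Owners:   []OwnerReference{{Kind: "Workload", Name: "job-a"}},
		PodCount: 2,
	}
	total, err := totalPods(orig, []Workload{scaleUp})
	fmt.Println(total, err) // 6 <nil>

	// A hand-made slice with no owner reference is rejected.
	stray := Workload{Name: "manual-slice", PodCount: 1}
	_, err = totalPods(orig, []Workload{stray})
	fmt.Println(err)
}
```

Rejecting the stray slice is only one design choice; the KEP could equally decide to ignore or garbage-collect such objects.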
applied toc
c4708da to 9af7511
Otherwise lgtm.
I left a comment to clarify the MultiKueue feature.
Thank you!
/lgtm
/approve
LGTM label has been added. Git tree hash: 0b6b913a553f2022fc45dc516e7044467714352c
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alculquicondor, tenzen-y, vicentefb.
* added kep
* kep updated applied toc
* updated kep
* toc updated
* added info in unit tests and integration tests section
* added details about workload slices
* rephrase scale down section
* updated and added details on slices, generalized design details and typos
* update
* added details about multikueue and removed users from approvers
…#1851
- scale down
- patch error field not declared in schema
- commented out podSet immutability from workload webhook to be able to update that field
- added more comments
- clean code
- debugging
- patch error field not declared in schema
- cluster queue reconciliation fixed, it had to do with the info totalrequests from admission
- inside the workload go file
- working with scheduler
- integration test, but it messes up with parallelism test which should be expected
- updated parallelism it test
- updated wrappers
- kep
- removed Kep
- removed log lines
- clean code
- added a better conditional for updating the resize if the job is a RayCluster
- added Kind condition
- updated test and equivalentToWorkload condition
- added podset assignments check
- updated feature gate
- updating equivalentWorkload
- fixed lint
- removed changes from scheduler and workload controller
- testing
- updated workload controller reconciler to update spec and status
- update feature gate
- update variables
- made code more generic
- updated workload controller helper method
- typo
- addressed comments
- updated workload controller to use unused quota
- updated integration test to work
- added unit test in workload controller
- changed naming to resizeable and fixed lint
- nit
- addressed comments
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
KEP for Dynamically Sized Jobs
Which issue(s) this PR fixes:
Fixes #77
Special notes for your reviewer:
A WIP for Phase 1 can be found here: #1852
Does this PR introduce a user-facing change?