KEP #77 Dynamically Sized Jobs #1851

Merged: 10 commits merged into kubernetes-sigs:main on Apr 3, 2024

vicentefb
Contributor

@vicentefb vicentefb commented Mar 15, 2024

What type of PR is this?

/kind documentation
/kind feature

What this PR does / why we need it:

KEP for Dynamically Sized Jobs

Which issue(s) this PR fixes:

Fixes #77

Special notes for your reviewer:

A WIP for Phase 1 can be found here: #1852

Does this PR introduce a user-facing change?


@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 15, 2024
@k8s-ci-robot
Contributor

Hi @vicentefb. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 15, 2024

netlify bot commented Mar 15, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit d8f5ef0
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/660d99aae146b5000837ecce
😎 Deploy Preview https://deploy-preview-1851--kubernetes-sigs-kueue.netlify.app

@tenzen-y
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 15, 2024
@vicentefb
Contributor Author


## Phases for MVP (alpha)

### Phase 1 - Scale Down

WIP implementation for Phase 1: #1852

@alculquicondor
Contributor

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 15, 2024
@alculquicondor
Contributor

@astefanutti, is this something your team is still interested in?

## Design Details

### Workload Slices
To support horizontal scaling of jobs, we will introduce the concept of a "Workload Slice". A Workload Slice is a Workload object with an owner reference to the original Workload for a job. Workload Slices represent per-replica changes to a job that were not initially accounted for when the job was created.
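
The relationship described in this excerpt can be sketched as follows. This is only an illustration of "a Workload with an owner reference to the original Workload", not Kueue's implementation: the `Workload` and `OwnerReference` classes, the helper names, the slice naming scheme, and the `kueue.x-k8s.io/v1beta1` API version string are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OwnerReference:
    """Subset of a Kubernetes ownerReference, reduced to what this sketch needs."""
    api_version: str
    kind: str
    name: str
    uid: str


@dataclass
class Workload:
    """Hypothetical stand-in for a Workload: a name, a UID, and a pod count."""
    name: str
    uid: str
    pod_count: int
    owner_references: List[OwnerReference] = field(default_factory=list)


def new_workload_slice(parent: Workload, index: int, extra_pods: int) -> Workload:
    """Build a slice covering replicas added after `parent` was created."""
    if extra_pods <= 0:
        raise ValueError("a slice must account for at least one additional pod")
    return Workload(
        name=f"{parent.name}-slice-{index}",  # naming scheme is an assumption
        uid=f"{parent.uid}-slice-{index}",    # placeholder; real UIDs come from the API server
        pod_count=extra_pods,
        owner_references=[
            OwnerReference(
                api_version="kueue.x-k8s.io/v1beta1",  # assumed API version
                kind="Workload",
                name=parent.name,
                uid=parent.uid,
            )
        ],
    )


def is_slice_of(candidate: Workload, parent: Workload) -> bool:
    """True when `candidate` carries an owner reference pointing at `parent`."""
    return any(
        ref.kind == "Workload" and ref.uid == parent.uid
        for ref in candidate.owner_references
    )


def total_pods(parent: Workload, slices: List[Workload]) -> int:
    """Pods in the parent plus pods in every slice that is actually owned by it."""
    return parent.pod_count + sum(
        s.pod_count for s in slices if is_slice_of(s, parent)
    )


parent = Workload(name="raycluster-demo", uid="uid-1", pod_count=3)
scale_up = new_workload_slice(parent, index=1, extra_pods=2)
print(total_pods(parent, [scale_up]))
```

Counting only slices whose owner reference matches the parent is one possible way to ignore objects that merely look like slices; how a real mismatch should be handled is a design question for the KEP itself.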
Member

The constraints probably can be violated if a user creates the Workload slices by hand. Might be worth a note how we handle a mismatch.

Good point, we should document this in the KEP

I'm especially interested in the behavior with the prebuild feature.

Contributor

@alculquicondor alculquicondor left a comment

I don't have any further questions.

I'll leave the lgtm to @tenzen-y

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2024
@vicentefb vicentefb requested a review from tenzen-y April 2, 2024 21:36
Member

@tenzen-y tenzen-y left a comment

Otherwise lgtm.

I left a comment to clarify MultiKueue feature.

Member

@tenzen-y tenzen-y left a comment

Thank you!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 3, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 0b6b913a553f2022fc45dc516e7044467714352c

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, tenzen-y, vicentefb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit e63709b into kubernetes-sigs:main Apr 3, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Apr 3, 2024
vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024
* added kep

* kep updated

applied toc

* updated kep

* toc updated

* added info in unit tests and integration tests section

* added details about workload slices

* rephrase scale down section

* updated and added details on slices, generalized design details and typos

* update

* added details about MultiKueue and removed users from approvers
akram pushed a commit to akram/kueue that referenced this pull request Oct 8, 2024
…#1851

- scale down
- patch error field not declared in schema
- commented out podSet immutability from workload webhook to be able to update that field
- added more comments
- clean code
- debugging
- patch error field not declared in schema
- cluster queue reconciliation fixed, it had to do with the info totalrequests from admission
- inside the workload go file
- working with scheduler
- integration test, but it messes up with parallelism test which should be expected
- updated parallelism it test
- updated wrappers
- kep
- removed Kep
- removed log lines
- clean code
- added a better conditional for updating the resize if the job is a RayCluster
- added Kind condition
- updated test and equivalentToWorkload condition
- added podset assignments check
- updated feature gate
- updating equivalentWorkload
- fixed lint
- removed changes from scheduler and workload controller
- testing
- updated workload controller reconciler to update spec and status
- update feature gate
- update variables
- made code more generic
- updated workload controller helper method
- typo
- addressed comments
- updated workload controller to use unused quota
- updated integration test to work
- added unit test in workload controller
- changed naming to resizeable and fixed lint
- nit
- addressed comments
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support dynamically sized (elastic) jobs
9 participants