Handle changing Job parallelism #467

Closed
2 of 3 tasks
alculquicondor opened this issue Dec 9, 2022 · 8 comments
Labels: kind/feature (Categorizes issue or PR as related to a new feature.)

@alculquicondor (Contributor) commented Dec 9, 2022

What would you like to be added:

Handle changes to a Job's parallelism. Before admission, this is simple: we only need to update the workload.

What if the workload is already admitted? If there is enough quota, we could directly update the workload, even though that might violate FIFO semantics. Perhaps we will need to use scheduling gates to hold pods.
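To make the scheduling-gates idea concrete, here is a minimal Go sketch, not an existing Kueue API: the extra pods of a scaled-up Job could be created with a gate and only released to kube-scheduler once the additional quota is admitted. The gate name is purely illustrative.

```go
// Illustrative sketch only: scheduling gates (the Pod Scheduling Readiness
// feature) could hold the new pods of a scaled-up Job until quota is
// confirmed. The gate name below is a placeholder, not an established constant.
package parallelism

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const admissionGate = "example.x-k8s.io/quota-pending" // hypothetical name

// gatedPod builds a pod that kube-scheduler will not bind until the
// scheduling gate is removed.
func gatedPod(name, image string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			SchedulingGates: []corev1.PodSchedulingGate{{Name: admissionGate}},
			Containers:      []corev1.Container{{Name: "worker", Image: image}},
		},
	}
}

// releasePod removes the gate once the extra quota has been admitted,
// letting the scheduler place the pod.
func releasePod(pod *corev1.Pod) {
	pod.Spec.SchedulingGates = nil
}
```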

Why is this needed:

Parallelism is a mutable field.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

alculquicondor added the kind/feature label on Dec 9, 2022
@alculquicondor (Contributor, Author)

We could have a short-term solution: handle scale-down properly and disallow scale-up via webhook.
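A minimal sketch of that short-term rule, assuming a validating webhook that sees the old and new Job objects; this is not Kueue's actual webhook code:

```go
// Sketch of the proposed short-term rule: allow scale-down (it only releases
// quota) but reject scale-up of a Job that is not suspended. Not Kueue's
// actual webhook implementation.
package parallelism

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
)

// int32OrDefault dereferences an optional int32 field.
func int32OrDefault(p *int32, def int32) int32 {
	if p != nil {
		return *p
	}
	return def
}

// validateParallelismUpdate returns an error for updates that increase
// parallelism while the Job is running.
func validateParallelismUpdate(oldJob, newJob *batchv1.Job) error {
	oldP := int32OrDefault(oldJob.Spec.Parallelism, 1)
	newP := int32OrDefault(newJob.Spec.Parallelism, 1)
	suspended := newJob.Spec.Suspend != nil && *newJob.Spec.Suspend
	if newP > oldP && !suspended {
		return fmt.Errorf("parallelism cannot be increased from %d to %d on a running Job", oldP, newP)
	}
	return nil
}
```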

@ahg-g (Contributor) commented Dec 15, 2022

Note that in the job controller, if parallelism changes, the job gets suspended and the workload object gets deleted, forcing the job to be re-queued.

if *job.Spec.Parallelism != wl.Spec.PodSets[0].Count {
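In other words, the reconciler compares the Job's current parallelism against the pod count recorded in the Workload; when they diverge, the Workload is treated as stale, so the Job is suspended and the Workload is deleted, which re-queues the Job. A condensed sketch of that comparison follows; the API version in the import is an assumption, not taken from the issue:

```go
// Condensed sketch of the check linked above, not the controller's exact
// code: a Workload is stale when its recorded pod count no longer matches
// the Job's parallelism.
package parallelism

import (
	batchv1 "k8s.io/api/batch/v1"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha2" // API version is an assumption
)

// workloadMatchesJob reports whether the Workload still reflects the Job's
// current parallelism (defaulting a nil Parallelism to 1).
func workloadMatchesJob(job *batchv1.Job, wl *kueue.Workload) bool {
	if len(wl.Spec.PodSets) == 0 {
		return false
	}
	parallelism := int32(1)
	if job.Spec.Parallelism != nil {
		parallelism = *job.Spec.Parallelism
	}
	return parallelism == wl.Spec.PodSets[0].Count
}
```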

@mwielgus (Contributor) commented Jan 4, 2023

/assign @mwielgus

@ahg-g (Contributor) commented Jan 4, 2023

This is already handled, what else do we want to do with this one?

@ahg-g (Contributor) commented Jan 4, 2023

I think we should consider this in the context of #77, not as a separate issue.

@alculquicondor (Contributor, Author)

I had an offline discussion with @mwielgus and we agreed that it's probably not worth doing anything in this space until scheduling readiness gates are GA.

So yes, closing in favor of #77.

@alculquicondor (Contributor, Author)

/close

@k8s-ci-robot (Contributor)

@alculquicondor: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
