WaitForPodsReady: a mode where jobs don't block the queue head #610
Comments
We would probably start by optimistically admitting every workload and then applying some kind of backoff when resources are unavailable. Should the backoff be per flavor? Note: I'm not expecting an answer... just dumping my current open questions :)
Hi, I think we can do two more things to help solve the problem:
What do you think? @alculquicondor @ahg-g
/assign
Hi @KunWuLuan,
I think what we need to do is investigate the effect of dropping the code in kueue/pkg/scheduler/scheduler.go (lines 179 to 189 at commit 9ca57c8) and continue from there.
Hi @trasc, I think you are right. 💯 Moreover, maybe we can add a switch that lets users choose whether to block admission while waiting for pods to become ready (kueue/pkg/scheduler/scheduler.go lines 178 to 189 at commit 9ca57c8). Then other jobs can continue being admitted until resources are exhausted. WDYT? @alculquicondor, if you have time, please join the discussion too; it would be a great help. Thank you very much. 😆 👍
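The proposed switch is sketched below in Go. The sketch assumes that the scheduler checks a single boolean before it admits the next workload from the head of the queue. The names `canAdmitNext`, `admittedWorkload`, and `blockAdmission` are hypothetical, and the actual Kueue code at the scheduler.go lines referenced above may be structured differently.

```go
package main

import "fmt"

// admittedWorkload is a hypothetical stand-in for a workload that already
// holds quota; podsReady mirrors whether all of its pods are ready.
type admittedWorkload struct {
	name      string
	podsReady bool
}

// canAdmitNext decides whether the scheduler may admit another workload from
// the head of the queue. blockAdmission is the proposed (hypothetical) switch:
// when true, admission waits until every admitted workload has its pods ready;
// when false, only the usual quota checks (not modelled here) apply.
func canAdmitNext(admitted []admittedWorkload, blockAdmission bool) bool {
	if !blockAdmission {
		return true
	}
	for _, w := range admitted {
		if !w.podsReady {
			return false
		}
	}
	return true
}

func main() {
	running := []admittedWorkload{{"job-a", true}, {"job-b", false}}
	fmt.Println(canAdmitNext(running, true))  // false: job-b is still starting
	fmt.Println(canAdmitNext(running, false)) // true: the queue head is not blocked
}
```

When the switch is off, workloads keep being admitted until quota runs out, as the comment describes. Workloads that never become ready would then need a separate timeout.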
@KunWuLuan thanks for your feedback. I currently have limited availability because I'm attending KubeCon. I'll get back to this thread next week. In general, though, this feature should be optional.
/close
@alculquicondor: Closing this issue.
What would you like to be added:
A mode of operation for WaitForPodsReady in which jobs don't block the head of the queue but are still suspended if their pods don't become ready within a timeout.
Why is this needed:
Blocking the queue until a Job is ready guarantees all-or-nothing scheduling, but it is slow at scale. Consider the case where a large number of jobs are waiting to be scheduled and a lot of resources suddenly become available (e.g., a large job finishes, releasing a significant amount of resources). Because each job must wait for the previous one to become ready before it is admitted, the freed capacity is filled one job at a time.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.