Make backoffBaseSeconds default consistent with timeout in waitForPodsReady #2215
Comments
SGTM
This depends on the cloud provider. Usually somewhere between 1-3 minutes. Within 5 minutes the nodes should be up and pods running unless there is limited availability of the requested resources.
I would put it to 1 minute and only for v0.7, where it's configurable.
I agree with both suggestions.
/trasc
/assign
What would you like to be cleaned:
Increase the default requeue backoff (backoffBaseSeconds) for waitForPodsReady to 5-10 minutes.
Why is this needed:
Currently, Kueue retries running a workload almost immediately after failing to start it for 5 minutes. If conditions were unfavorable for the workload for those 5 minutes (there was no space in the cluster, the cluster autoscaler failed to deliver nodes, the workload crashes immediately after start, etc.), they are unlikely to improve after just 10 or 20 seconds.
cc: @tenzen-y @mimowo
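To make the gap concrete, here is a minimal Go sketch of how the first few requeue delays compare for two base values. It assumes a plain exponential backoff of `base * 2^(n-1)` seconds with no jitter. That formula, the helper name `backoffSeconds`, and the example bases of 10 and 60 seconds are illustrative assumptions, not taken from the Kueue source.

```go
package main

import "fmt"

// backoffSeconds returns the delay in seconds before the n-th requeue,
// assuming a simple exponential backoff of base * 2^(n-1) without jitter.
// Hypothetical helper for illustration only; not Kueue's implementation.
func backoffSeconds(base, n int) int {
	if n < 1 {
		return 0
	}
	return base << (n - 1)
}

func main() {
	// Compare a short base with a 1-minute base over the first four requeues.
	for _, base := range []int{10, 60} {
		fmt.Printf("base=%ds:", base)
		for n := 1; n <= 4; n++ {
			fmt.Printf(" %ds", backoffSeconds(base, n))
		}
		fmt.Println()
	}
}
```

Under these assumptions, a 10-second base gives first retries after 10 and 20 seconds, both far shorter than the 5-minute timeout, while a 60-second base does not reach a comparable delay until the third or fourth requeue.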