This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

Limit the number of restarts under ExitCode restartPolicy #167

Closed
goyalankit opened this issue Oct 8, 2021 · 5 comments · Fixed by #168

Comments

@goyalankit
Contributor

In the current implementation, runPolicy.BackoffLimit applies only to the OnFailure and Always restart policies. Is there a reason it is not supported for the ExitCode restart policy? When a container is OOM-killed, the job typically exits with code 137, which is rightly treated as a retryable error. However, because the ExitCode policy is not covered by BackoffLimit, the job keeps restarting indefinitely. Could the behavior be to honor BackoffLimit if it is set, and otherwise keep retrying indefinitely?
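
To make the proposal concrete, here is a minimal sketch of the decision I have in mind. It is not the actual controller code: shouldRestart and isRetryableExitCode are hypothetical names, the rule that exit codes of 128 and above are retryable is an assumption for illustration, and a nil backoffLimit stands in for "BackoffLimit not set".

```go
package main

import "fmt"

// RestartPolicy mirrors the policy names discussed in this issue.
type RestartPolicy string

const RestartPolicyExitCode RestartPolicy = "ExitCode"

// isRetryableExitCode is a hypothetical helper. It assumes that exit codes
// of 128 and above (process killed by a signal, e.g. 137 = 128 + SIGKILL)
// are retryable and that lower codes are permanent failures.
func isRetryableExitCode(code int32) bool {
	return code >= 128
}

// shouldRestart sketches the proposed behavior for the ExitCode policy:
// retry only on retryable exit codes, and stop once the restart count
// reaches backoffLimit. A nil backoffLimit means "not set", which keeps
// the existing behavior of retrying indefinitely.
func shouldRestart(policy RestartPolicy, exitCode, restarts int32, backoffLimit *int32) bool {
	if policy != RestartPolicyExitCode || !isRetryableExitCode(exitCode) {
		return false
	}
	if backoffLimit == nil {
		return true
	}
	return restarts < *backoffLimit
}

func main() {
	limit := int32(3)
	fmt.Println(shouldRestart(RestartPolicyExitCode, 137, 2, &limit)) // true: under the limit
	fmt.Println(shouldRestart(RestartPolicyExitCode, 137, 3, &limit)) // false: limit reached
	fmt.Println(shouldRestart(RestartPolicyExitCode, 137, 100, nil))  // true: no limit set
	fmt.Println(shouldRestart(RestartPolicyExitCode, 1, 0, &limit))   // false: not retryable
}
```

With this shape, jobs that do not set BackoffLimit behave exactly as they do today, so the change would be backward compatible.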

I am happy to submit a PR if you think this is a reasonable change.

@gaocegege
Member

Hi @goyalankit . Thanks for the issue.

I think it makes sense, WDYT @kubeflow/wg-training-leads

@Jeffwan
Member

Jeffwan commented Oct 9, 2021

I think this is a reasonable improvement. @goyalankit Feel free to cut a PR and assign it to us for review.

@johnugeorge
Member

Thanks for this

@Jeffwan
Member

Jeffwan commented Oct 18, 2021

Let's keep it open to track cherry-pick

/reopen

@google-oss-robot

@Jeffwan: Reopened this issue.

In response to this:

Let's keep it open to track cherry-pick

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
