Update for "Retriable and non-retriable Pod failures for Jobs" #3646

mimowo
Contributor

@mimowo mimowo commented Nov 4, 2022

  • One-line PR description: Update to reflect decisions taken during the implementation phase

The changes compared to the initial plan for Beta:

  • do not introduce the ResourceExhausted condition (it was planned for pods killed by the OOM killer or for exceeding ephemeral storage limits)
  • do not add the DisruptionTarget condition in case of admission failures
  • rename the reason for the DisruptionTarget condition added by the kubelet from DeletionByKubelet to TerminationByKubelet
  • introduce a new metric called terminated_pods_tracking_finalizer_total instead of extending the existing job_pods_finished_total metric
  • do not implement the logic that would avoid overriding a pre-existing DisruptionTarget condition with status=True. When two components add the condition, which one ends up on the pod is arbitrary and makes no difference to the job controller. This also removes the need to share the logic for skipping the pod condition update
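
To illustrate the last point, here is a minimal Python sketch of why the override order should not matter. It is a simplified model, not the actual Go implementation: the helper names `set_pod_condition` and `matches_disruption_target` are hypothetical, the reason string `ExampleOtherComponent` is a placeholder, and the sketch assumes the job controller only inspects the condition's type and status.

```python
from typing import Dict, List

Condition = Dict[str, str]


def set_pod_condition(conditions: List[Condition], new: Condition) -> List[Condition]:
    """Add `new`, unconditionally replacing any existing condition of the same type.

    Hypothetical helper: it models the "no override protection" behaviour
    described in the PR, not the real Kubernetes code.
    """
    kept = [c for c in conditions if c["type"] != new["type"]]
    return kept + [new]


def matches_disruption_target(conditions: List[Condition]) -> bool:
    """Assumed job-controller check: only type and status are inspected."""
    return any(
        c["type"] == "DisruptionTarget" and c["status"] == "True"
        for c in conditions
    )


# Two components add the same condition type with different reasons.
kubelet_cond = {
    "type": "DisruptionTarget",
    "status": "True",
    "reason": "TerminationByKubelet",
}
other_cond = {
    "type": "DisruptionTarget",
    "status": "True",
    "reason": "ExampleOtherComponent",  # placeholder reason, for illustration only
}

# The surviving reason depends on which component wrote last...
kubelet_first = set_pod_condition(set_pod_condition([], kubelet_cond), other_cond)
kubelet_last = set_pod_condition(set_pod_condition([], other_cond), kubelet_cond)

# ...but the type/status check gives the same answer either way.
print(matches_disruption_target(kubelet_first), matches_disruption_target(kubelet_last))
```

Under these assumptions, both write orders leave exactly one DisruptionTarget condition with status=True on the pod, so a check based on type and status alone cannot tell them apart.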

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Nov 4, 2022
@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Nov 4, 2022
@mimowo mimowo marked this pull request as draft November 4, 2022 15:56
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 4, 2022
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch from 7ad0bb9 to 9ba3595 Compare November 7, 2022 07:55
@mimowo mimowo marked this pull request as ready for review November 7, 2022 07:55
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 7, 2022
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch 2 times, most recently from 8997606 to 47cb9de Compare November 7, 2022 08:10
@mimowo mimowo changed the title from Update on changes for Beta for "Retriable and non-retriable Pod failures for Jobs" to Update for "Retriable and non-retriable Pod failures for Jobs" Nov 7, 2022
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch 3 times, most recently from 5acc235 to 2ed5980 Compare November 7, 2022 08:43
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 7, 2022
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch 9 times, most recently from 3514b63 to b4c3029 Compare November 7, 2022 10:04
Member

@alculquicondor alculquicondor left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch from 948f00a to 5efca00 Compare November 8, 2022 10:24
@mimowo mimowo force-pushed the handling-pod-failures-beta-kubelet-update branch from 5efca00 to e3f3fae Compare November 8, 2022 10:33
Contributor

@soltysh soltysh left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7e98617 into kubernetes:master Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 8, 2022
@mimowo mimowo deleted the handling-pod-failures-beta-kubelet-update branch March 18, 2023 19:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.