
Merge pull request #37242 from mimowo/retriable-pod-failures-beta
Promote "Retriable and non-retriable pod failures for Jobs" to Beta
k8s-ci-robot authored Nov 18, 2022
2 parents b8fc810 + 1e4a160 commit bfe7bd6
Showing 4 changed files with 22 additions and 16 deletions.
8 changes: 6 additions & 2 deletions content/en/docs/concepts/workloads/controllers/job.md
@@ -290,6 +290,10 @@ starts a new Pod. This means that your application needs to handle the case whe
pod. In particular, it needs to handle temporary files, locks, incomplete output and the like
caused by previous runs.

+By default, each pod failure is counted towards the `.spec.backoffLimit` limit,
+see [pod backoff failure policy](#pod-backoff-failure-policy). However, you can
+customize handling of pod failures by setting the Job's [pod failure policy](#pod-failure-policy).
+
Note that even if you specify `.spec.parallelism = 1` and `.spec.completions = 1` and
`.spec.template.spec.restartPolicy = "Never"`, the same program may
sometimes be started twice.
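
For orientation, not part of the diff: the paragraph added above talks about `.spec.backoffLimit` counting every pod failure. A minimal sketch of such a Job, with illustrative name and image:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # illustrative name
spec:
  backoffLimit: 4              # by default, every pod failure counts toward this limit
  template:
    spec:
      restartPolicy: Never     # let the Job controller, not the kubelet, handle retries
      containers:
      - name: main
        image: busybox:1.36    # illustrative image
        command: ["sh", "-c", "exit 1"]   # always fails, so the backoff limit is exercised
```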
@@ -694,7 +698,7 @@ mismatch.

### Pod failure policy {#pod-failure-policy}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

{{< note >}}
You can only configure a Pod failure policy for a Job if you have the
@@ -703,7 +707,7 @@ enabled in your cluster. Additionally, it is recommended
to enable the `PodDisruptionConditions` feature gate in order to be able to detect and handle
Pod disruption conditions in the Pod failure policy (see also:
[Pod disruption conditions](/docs/concepts/workloads/pods/disruptions#pod-disruption-conditions)). Both feature gates are
-available in Kubernetes v1.25.
+available in Kubernetes {{< skew currentVersion >}}.
{{< /note >}}

A Pod failure policy, defined with the `.spec.podFailurePolicy` field, enables
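As an aside, not part of the diff: the truncated sentence above introduces `.spec.podFailurePolicy`. A minimal sketch of the field's shape under the beta API, showing the disruption-aware rule that the `PodDisruptionConditions` gate makes useful (rule values are illustrative):

```yaml
spec:
  podFailurePolicy:
    rules:
    - action: Ignore             # retry, but do not count the failure toward .spec.backoffLimit
      onPodConditions:
      - type: DisruptionTarget   # matches pods that failed because of a disruption
```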
14 changes: 11 additions & 3 deletions content/en/docs/concepts/workloads/pods/disruptions.md
@@ -229,12 +229,17 @@ can happen, according to:

## Pod disruption conditions {#pod-disruption-conditions}

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

{{< note >}}
-In order to use this behavior, you must enable the `PodDisruptionConditions`
+If you are using an older version of Kubernetes than {{< skew currentVersion >}}
+please refer to the corresponding version of the documentation.
+{{< /note >}}
+
+{{< note >}}
+In order to use this behavior, you must have the `PodDisruptionConditions`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
-in your cluster.
+enabled in your cluster.
{{< /note >}}

When enabled, a dedicated Pod `DisruptionTarget` [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) is added to indicate
@@ -254,6 +259,9 @@ indicates one of the following reasons for the Pod termination:
`DeletionByPodGC`
: Pod, that is bound to a no longer existing Node, is due to be deleted by [Pod garbage collection](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).

+`TerminationByKubelet`
+: Pod has been terminated by the kubelet, because of either {{<glossary_tooltip term_id="node-pressure-eviction" text="node pressure eviction">}} or the [graceful node shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown).
+
{{< note >}}
A Pod disruption might be interrupted. The control plane might re-attempt to
continue the disruption of the same Pod, but it is not guaranteed. As a result,
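For context, not part of the diff: when the feature is enabled, the `DisruptionTarget` condition appears in the Pod's status, roughly as below; the reason matches the new `TerminationByKubelet` entry above, while the message and timestamp are illustrative:

```yaml
status:
  conditions:
  - type: DisruptionTarget
    status: "True"
    reason: TerminationByKubelet                 # one of the documented disruption reasons
    message: Pod was terminated in response to imminent node shutdown.
    lastTransitionTime: "2022-11-18T10:00:00Z"   # illustrative timestamp
```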
6 changes: 4 additions & 2 deletions content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -112,7 +112,8 @@ For a reference to old feature gates that are removed, please refer to
| `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
| `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | |
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
-| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | - |
+| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
+| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
| `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
| `JobReadyPods` | `true` | Beta | 1.24 | |
| `KubeletCredentialProviders` | `false` | Alpha | 1.20 | 1.23 |
@@ -150,7 +151,8 @@ For a reference to old feature gates that are removed, please refer to
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
| `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
| `PodDeletionCost` | `true` | Beta | 1.22 | |
-| `PodDisruptionConditions` | `false` | Alpha | 1.25 | - |
+| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
+| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
| `PodHasNetworkCondition` | `false` | Alpha | 1.25 | |
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
| `ProbeTerminationGracePeriod` | `false` | Beta | 1.22 | 1.24 |
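As background, not part of the diff: on clusters older than v1.26, both gates had to be switched on explicitly on the relevant control plane components. One hedged sketch of doing so with a kubeadm `ClusterConfiguration`, assuming kubeadm is in use; the flag placement is the standard `--feature-gates` mechanism:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "JobPodFailurePolicy=true,PodDisruptionConditions=true"
controllerManager:
  extraArgs:
    feature-gates: "JobPodFailurePolicy=true,PodDisruptionConditions=true"
```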
10 changes: 1 addition & 9 deletions content/en/docs/tasks/job/pod-failure-policy.md
@@ -5,7 +5,7 @@ min-kubernetes-server-version: v1.25
weight: 60
---

-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.26" state="beta" >}}

<!-- overview -->

@@ -28,14 +28,6 @@ You should already be familiar with the basic use of [Job](/docs/concepts/worklo

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

-<!-- steps -->
-
-{{< note >}}
-As the features are in Alpha, prepare the Kubernetes cluster with the two
-[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
-enabled: `JobPodFailurePolicy` and `PodDisruptionConditions`.
-{{< /note >}}
-
## Using Pod failure policy to avoid unnecessary Pod retries

With the following example, you can learn how to use Pod failure policy to
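To make the task's goal concrete, an illustration rather than part of the diff: a Job whose pod failure policy fails the Job as soon as a container exits with a code marking a non-retriable software bug, sketched after the example that task page builds on (name, image, and exit code 42 are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example   # illustrative name
spec:
  completions: 12
  parallelism: 3
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob        # stop retrying as soon as the bug is detected
      onExitCodes:
        containerName: main
        operator: In
        values: [42]         # treated as a non-retriable failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash"]
        args: ["-c", "echo 'Hello world!' && sleep 5 && exit 42"]
```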
