Add count of ready Pods in Job status
1 parent 4e068bd, commit 0de12d1
Showing 3 changed files with 385 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
kep-number: 2879 | ||
beta: | ||
approver: "@ehashman" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,348 @@ | ||
# KEP-2879: Track ready Pods in Job status | ||
|
||
<!-- toc --> | ||
- [Release Signoff Checklist](#release-signoff-checklist) | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Design Details](#design-details) | ||
- [API](#api) | ||
- [Changes to the Job controller](#changes-to-the-job-controller) | ||
- [Test Plan](#test-plan) | ||
- [Graduation Criteria](#graduation-criteria) | ||
- [Alpha](#alpha) | ||
- [Beta](#beta) | ||
- [GA](#ga) | ||
- [Deprecation](#deprecation) | ||
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||
- [Version Skew Strategy](#version-skew-strategy) | ||
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) | ||
- [Feature Enablement and Rollback](#feature-enablement-and-rollback) | ||
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) | ||
- [Monitoring Requirements](#monitoring-requirements) | ||
- [Dependencies](#dependencies) | ||
- [Scalability](#scalability) | ||
- [Troubleshooting](#troubleshooting) | ||
- [Implementation History](#implementation-history) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
<!-- /toc --> | ||
|
||
## Release Signoff Checklist | ||
|
||
Items marked with (R) are required *prior to targeting to a milestone / release*. | ||
|
||
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) | ||
- [x] (R) KEP approvers have approved the KEP status as `implementable` | ||
- [x] (R) Design details are appropriately documented | ||
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) | ||
- [ ] e2e Tests for all Beta API Operations (endpoints) | ||
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free | ||
- [ ] (R) Graduation criteria is in place | ||
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [x] (R) Production readiness review completed | ||
- [x] (R) Production readiness review approved | ||
- [ ] "Implementation History" section is up-to-date for milestone | ||
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] | ||
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes | ||
|
||
[kubernetes.io]: https://kubernetes.io/ | ||
[kubernetes/enhancements]: https://git.k8s.io/enhancements | ||
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes | ||
[kubernetes/website]: https://git.k8s.io/website | ||
|
||
## Summary | ||
|
||
The Job status has a field `active` which counts the number of Job Pods that | ||
are in `Running` or `Pending` phases. In this KEP, we add a field `ready` that | ||
counts the number of Job Pods that have a `Ready` condition, with the same | ||
best effort guarantees as the existing `active` field. | ||
|
||
## Motivation | ||
|
||
Job Pods can remain in the `Pending` phase for a long time in clusters with | ||
tight resources and when image pulls take long. Since the `Job.status.active` | ||
field includes `Pending` Pods, this can give a false impression of progress | ||
to end users or other controllers. This is more important when the pods serve | ||
as workers and need to communicate among themselves. | ||
|
||
A separate `Job.status.ready` field can provide more information for users | ||
and controllers, reducing the need to listen to Pod updates themselves. | ||
|
||
Note that other workload APIs (such as ReplicaSet and StatefulSet) have a | ||
similar field: `.status.readyReplicas`. | ||
|
||
### Goals | ||
|
||
- Add the field `Job.status.ready` that keeps a count of Job Pods with the | ||
`Ready` condition. | ||
|
||
### Non-Goals | ||
|
||
- Provide strong guarantees for the accuracy of the count. Due to the | ||
asynchronous nature of Kubernetes, there can be more or fewer Pods currently | ||
ready than the count reports. | ||
|
||
## Proposal | ||
|
||
Add the field `.status.ready` to the Job API. The job controller updates the | ||
field based on the number of Pods that have the `Ready` condition. | ||
|
||
### Risks and Mitigations | ||
|
||
During upgrades, a cluster can have apiservers with version skew, or the | ||
administrator might decide to do a rollback. This can cause: | ||
|
||
- Loss of the new API field value | ||
|
||
This is acceptable for the first release. The value is only informative: the | ||
Kubernetes control plane doesn't use the value to influence behavior. | ||
|
||
- Repeated Job status updates. | ||
|
||
If one apiserver populates the value and another apiserver (running an older | ||
version) drops the field, the job controller might try to update the field | ||
again, potentially causing subsequent updates. This can be mitigated by only | ||
updating the field if the job controller is already updating the status due | ||
to changes in other fields. This check is only necessary in the first release. | ||
|
||
For both problems, in the first release, the API documentation can state that | ||
the field can remain at zero indefinitely even if pods have been Ready for a long | ||
time. | ||
|
||
## Design Details | ||
|
||
### API | ||
|
||
```golang | ||
type JobStatus struct { | ||
... | ||
Active int32 | ||
Ready int32 // new field | ||
Succeeded int32 | ||
Failed int32 | ||
} | ||
``` | ||
|
||
### Changes to the Job controller | ||
|
||
The Job controller already lists the Pods to populate the `active`, `succeeded` | ||
and `failed` fields. To count `ready` pods, the job controller will filter the | ||
pods that have the `Ready` condition. | ||
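
The counting step described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the `Pod` and `PodCondition` types are simplified, hypothetical stand-ins for the core API types, and the helper names `isPodReady` and `countReadyPods` are made up for this example. It assumes a Pod counts as ready when it carries a `Ready` condition whose status is `True`.

```go
package main

import "fmt"

// PodCondition is a simplified, hypothetical stand-in for a Pod condition.
type PodCondition struct {
	Type   string
	Status string
}

// Pod is a simplified, hypothetical stand-in for a Pod object.
type Pod struct {
	Name       string
	Phase      string
	Conditions []PodCondition
}

// isPodReady reports whether the Pod has a Ready condition with status True.
// Assumption: a Ready condition with any other status does not count.
func isPodReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// countReadyPods returns how many Pods in the list are ready.
func countReadyPods(pods []Pod) int32 {
	var ready int32
	for _, p := range pods {
		if isPodReady(p) {
			ready++
		}
	}
	return ready
}

func main() {
	pods := []Pod{
		{Name: "a", Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
		{Name: "b", Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "False"}}},
		{Name: "c", Phase: "Pending"},
	}
	fmt.Println("ready:", countReadyPods(pods))
}
```

Because the controller already lists the Job's Pods for the other counters, a filter of this kind adds only one pass over a list that is already in memory.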
|
||
In a first release, the Job controller counts the ready pods and updates the | ||
field if and only if: | ||
- The job controller is already updating other Job status fields. | ||
- The `JobReadyPods` feature gate is enabled. | ||
|
||
In the second release, the Job controller updates the field unconditionally. | ||
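
The two release behaviors can be sketched as a single decision function. This is an illustrative sketch under stated assumptions, not the controller's implementation: `JobStatus` here carries only the four counters, and `needsStatusUpdate`, `gateEnabled`, and `firstRelease` are hypothetical names. It also assumes that "unconditionally" in the second release means a change in the ready count alone is enough to trigger a status write while the feature gate is enabled.

```go
package main

import "fmt"

// JobStatus is a simplified, hypothetical stand-in holding only the counters.
type JobStatus struct {
	Active    int32
	Ready     int32
	Succeeded int32
	Failed    int32
}

// needsStatusUpdate decides whether the controller should write the Job status.
//   - A change in active, succeeded or failed always triggers a write.
//   - A change in ready alone triggers a write only when the feature gate is
//     enabled and the cluster is past the first release.
func needsStatusUpdate(old, computed JobStatus, gateEnabled, firstRelease bool) bool {
	otherChanged := old.Active != computed.Active ||
		old.Succeeded != computed.Succeeded ||
		old.Failed != computed.Failed
	if otherChanged {
		return true
	}
	if !gateEnabled || firstRelease {
		return false
	}
	return old.Ready != computed.Ready
}

func main() {
	old := JobStatus{Active: 2}
	computed := JobStatus{Active: 2, Ready: 1}

	// First release: a ready-only change does not cause an extra update.
	fmt.Println("first release:", needsStatusUpdate(old, computed, true, true))
	// Second release: the same change is written.
	fmt.Println("second release:", needsStatusUpdate(old, computed, true, false))
}
```

Piggybacking on existing writes in the first release avoids the repeated updates that could occur if an older apiserver keeps dropping the field.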
|
||
### Test Plan | ||
|
||
- Unit and integration tests covering: | ||
- Count of ready pods. | ||
- Not producing updates in the cases described in the design. | ||
- Verify passing existing E2E and conformance tests for Job. | ||
|
||
### Graduation Criteria | ||
|
||
#### Alpha | ||
|
||
This KEP proposes to skip this stage, for the following reasons: | ||
- The added calculation is trivial. | ||
- It is acceptable to report .status.ready as zero in the first release, as | ||
the value is only informative. | ||
|
||
#### Beta | ||
|
||
- Ability to completely disable the feature, through a feature gate. The feature | ||
gate is enabled by default. | ||
|
||
In a first release: | ||
|
||
- The job controller only fills the field if there are other Job status updates. | ||
- Unit and integration tests. | ||
|
||
In a second release: | ||
|
||
- The job controller fills the field whenever the number of ready Pods changes. | ||
The feature can still be disabled through the feature gate. | ||
|
||
#### GA | ||
|
||
- Every bug report is fixed. | ||
- The job controller ignores the feature gate. | ||
|
||
#### Deprecation | ||
|
||
N/A | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
No changes are required for existing clusters to use the enhancement. | ||
|
||
### Version Skew Strategy | ||
|
||
The feature doesn't affect nodes. | ||
|
||
In the first release, a version skew between apiservers might cause the new field | ||
to remain at zero even if there are Pods ready. | ||
|
||
## Production Readiness Review Questionnaire | ||
|
||
### Feature Enablement and Rollback | ||
|
||
###### How can this feature be enabled / disabled in a live cluster? | ||
|
||
- [x] Feature gate (also fill in values in `kep.yaml`) | ||
- Feature gate name: JobReadyPods | ||
- Components depending on the feature gate: kube-controller-manager | ||
- [ ] Other | ||
- Describe the mechanism: | ||
- Will enabling / disabling the feature require downtime of the control | ||
plane? | ||
- Will enabling / disabling the feature require downtime or reprovisioning | ||
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). | ||
|
||
###### Does enabling the feature change any default behavior? | ||
|
||
Yes, the Job controller might update the Job status more frequently to | ||
report ready pods. | ||
|
||
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? | ||
|
||
Yes, the loss of information is acceptable as the field is only informative. | ||
|
||
###### What happens if we reenable the feature if it was previously rolled back? | ||
|
||
The Job controller will start populating the field again. | ||
|
||
###### Are there any tests for feature enablement/disablement? | ||
|
||
Yes, at unit and integration level. | ||
|
||
### Rollout, Upgrade and Rollback Planning | ||
|
||
###### How can a rollout or rollback fail? Can it impact already running workloads? | ||
|
||
The field is only informative; it doesn't affect running workloads. | ||
|
||
###### What specific metrics should inform a rollback? | ||
|
||
N/A | ||
|
||
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? | ||
|
||
N/A | ||
|
||
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? | ||
|
||
No | ||
|
||
### Monitoring Requirements | ||
|
||
###### How can an operator determine if the feature is in use by workloads? | ||
|
||
The feature applies to all Jobs, unless the feature gate is disabled. | ||
|
||
###### How can someone using this feature know that it is working for their instance? | ||
|
||
- [x] API .status | ||
- Other field: `ready` | ||
|
||
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? | ||
|
||
The 99th percentile of Job status updates completes in under 1s when the | ||
controller is not creating new Pods or tracking finishing Pods. | ||
|
||
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? | ||
|
||
- [x] Metrics | ||
- Metric name: `job_sync_duration_seconds`, `job_sync_total`. | ||
|
||
###### Are there any missing metrics that would be useful to have to improve observability of this feature? | ||
|
||
No. | ||
|
||
### Dependencies | ||
|
||
###### Does this feature depend on any specific services running in the cluster? | ||
|
||
No. | ||
|
||
### Scalability | ||
|
||
###### Will enabling / using this feature result in any new API calls? | ||
|
||
|
||
- API: PUT Job/status | ||
|
||
Estimated throughput: at most one API call for each Job Pod reaching Ready | ||
condition. | ||
|
||
Originating component: job-controller | ||
|
||
###### Will enabling / using this feature result in introducing new API types? | ||
|
||
No. | ||
|
||
###### Will enabling / using this feature result in any new calls to the cloud provider? | ||
|
||
No. | ||
|
||
###### Will enabling / using this feature result in increasing size or count of the existing API objects? | ||
|
||
- API: Job/status | ||
|
||
Estimated increase in size: New field of less than 10B. | ||
|
||
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? | ||
|
||
No. | ||
|
||
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? | ||
|
||
No. | ||
|
||
### Troubleshooting | ||
|
||
###### How does this feature react if the API server and/or etcd is unavailable? | ||
|
||
No change from existing behavior of the Job controller. | ||
|
||
###### What are other known failure modes? | ||
|
||
- When the cluster has apiservers with skewed versions, the `Job.status.ready` | ||
field might remain at zero. | ||
|
||
###### What steps should be taken if SLOs are not being met to determine the problem? | ||
|
||
1. Check reachability between kube-controller-manager and apiserver. | ||
1. If the `job_sync_duration_seconds` is too high, check for the number | ||
of requests in apiserver coming from the kube-system/job-controller service | ||
account. Consider increasing the number of inflight requests for | ||
apiserver or tuning [API priority and fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/) | ||
to give more priority for the job-controller requests. | ||
1. If the steps above are insufficient, disable the `JobTrackingWithFinalizers` | ||
feature gate from apiserver and kube-controller-manager and [report an issue](https://github.com/kubernetes/kubernetes/issues). | ||
|
||
## Implementation History | ||
|
||
- 2021-08-19: Proposed KEP starting in beta status. | ||
|
||
## Drawbacks | ||
|
||
The only drawback is an increase in API calls. However, this is capped by | ||
the number of times a Pod flips ready status. This is usually once for each | ||
Pod created. | ||
|
||
## Alternatives | ||
|
||
- Add `Job.status.running`, counting Pods with `Running` phase. The `Running` | ||
phase doesn't take into account preparation work before the worker is ready | ||
to accept connections. The `Ready` condition is configurable through a | ||
readiness probe. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
title: Track ready Pods in Job status | ||
kep-number: 2879 | ||
authors: | ||
- "@alculquicondor" | ||
owning-sig: sig-apps | ||
participating-sigs: | ||
status: implementable | ||
creation-date: 2021-08-19 | ||
reviewers: | ||
- "@soltysh" | ||
- TBD API reviewer | ||
approvers: | ||
- "@soltysh" | ||
|
||
see-also: | ||
replaces: | ||
|
||
stage: beta | ||
|
||
latest-milestone: "v1.23" | ||
|
||
milestone: | ||
beta: "v1.23" | ||
stable: "v1.25" | ||
|
||
feature-gates: | ||
- name: JobReadyPods | ||
components: | ||
- kube-controller-manager | ||
disable-supported: true | ||
|
||
metrics: | ||
- job_sync_duration_seconds | ||
- job_sync_total |