From afbd037aa0f308298179932f705dcdd728459408 Mon Sep 17 00:00:00 2001
From: Aldo Culquicondor
Date: Thu, 19 Aug 2021 15:08:06 -0400
Subject: [PATCH] Add count of ready Pods in Job status

---
 keps/prod-readiness/sig-apps/2879.yaml        |   3 +
 .../2879-ready-pods-job-status/README.md      | 348 ++++++++++++++++++
 .../2879-ready-pods-job-status/kep.yaml       |  33 ++
 3 files changed, 384 insertions(+)
 create mode 100644 keps/prod-readiness/sig-apps/2879.yaml
 create mode 100644 keps/sig-apps/2879-ready-pods-job-status/README.md
 create mode 100644 keps/sig-apps/2879-ready-pods-job-status/kep.yaml

diff --git a/keps/prod-readiness/sig-apps/2879.yaml b/keps/prod-readiness/sig-apps/2879.yaml
new file mode 100644
index 000000000000..904a83bda922
--- /dev/null
+++ b/keps/prod-readiness/sig-apps/2879.yaml
@@ -0,0 +1,3 @@
+kep-number: 2879
+beta:
+  approver: "@ehashman"
diff --git a/keps/sig-apps/2879-ready-pods-job-status/README.md b/keps/sig-apps/2879-ready-pods-job-status/README.md
new file mode 100644
index 000000000000..d682eee248e8
--- /dev/null
+++ b/keps/sig-apps/2879-ready-pods-job-status/README.md
@@ -0,0 +1,348 @@
+# KEP-2879: Track ready Pods in Job status
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API](#api)
+  - [Changes to the Job controller](#changes-to-the-job-controller)
+  - [Test Plan](#test-plan)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
+    - [Deprecation](#deprecation)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+The Job status has a field `active` which counts the number of Job Pods that
+are in the `Running` or `Pending` phases. In this KEP, we add a field `ready`
+that counts the number of Job Pods that have a `Ready` condition, with the
+same best-effort guarantees as the existing `active` field.
+
+## Motivation
+
+Job Pods can remain in the `Pending` phase for a long time in clusters with
+tight resources or when image pulls take a long time. Since the
+`Job.status.active` field includes `Pending` Pods, this can give a false
+impression of progress to end users or other controllers. This is more
+important when the Pods serve as workers and need to communicate among
+themselves.
+
+A separate `Job.status.ready` field can provide more information to users
+and controllers, reducing their need to watch Pod updates themselves.
+
+Note that other workload APIs (such as ReplicaSet and StatefulSet) have a
+similar field: `.status.readyReplicas`.
+
+### Goals
+
+- Add the field `Job.status.ready` that keeps a count of Job Pods with the
+  `Ready` condition.
+
+### Non-Goals
+
+- Provide strong guarantees for the accuracy of the count. Due to the
+  asynchronous nature of Kubernetes, there can be more or fewer Pods
+  currently ready than the count reports.
+
+## Proposal
+
+Add the field `.status.ready` to the Job API. The Job controller updates the
+field based on the number of Pods that have the `Ready` condition.
+
+### Risks and Mitigations
+
+During upgrades, a cluster can have apiservers with version skew, or the
+administrator might decide to do a rollback. This can cause:
+
+- Loss of the new API field value.
+
+  This is acceptable for the first release. The value is only informative: the
+  Kubernetes control plane doesn't use the value to influence behavior.
+
+- Repeated Job status updates.
+
+  If one apiserver populates the value and another apiserver (running an older
+  version) drops the field, the Job controller might try to update the field
+  again, potentially causing repeated updates. This can be mitigated by only
+  updating the field when the Job controller is already updating the status
+  due to changes in other fields. This check is only necessary in the first
+  release.
+
+For both problems, in the first release, the API documentation can state that
+the field can remain at zero indefinitely even if Pods have been Ready for a
+long time.
+
+## Design Details
+
+### API
+
+```golang
+type JobStatus struct {
+    ...
+    Active    int32
+    Ready     int32 // new field
+    Succeeded int32
+    Failed    int32
+}
+```
+
+### Changes to the Job controller
+
+The Job controller already lists the Pods to populate the `active`,
+`succeeded` and `failed` fields. To count `ready` Pods, the Job controller
+will filter the Pods that have the `Ready` condition.
+
+In the first release, the Job controller counts the ready Pods and updates the
+field if and only if:
+- The Job controller is already updating other Job status fields.
+- The `JobReadyPods` feature gate is enabled.
+
+In the second release, the Job controller updates the field unconditionally.
+
+### Test Plan
+
+- Unit and integration tests covering:
+  - The count of ready Pods.
+  - Not producing extra status updates in the cases described in the design.
+- Verify that existing E2E and conformance tests for Job keep passing.
+
+### Graduation Criteria
+
+#### Alpha
+
+This KEP proposes to skip this stage, for the following reasons:
+- The added calculation is trivial.
+- It is acceptable to report `.status.ready` as zero in the first release, as
+  the value is only informative.
+
+#### Beta
+
+- Ability to completely disable the feature through a feature gate. The
+  feature gate is enabled by default.
+
+In the first release:
+
+- The Job controller only fills the field if there are other Job status updates.
+- Unit and integration tests.
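For illustration, the ready-Pod filtering described in [Changes to the Job controller](#changes-to-the-job-controller) amounts to the following sketch. The types here are simplified stand-ins for `k8s.io/api/core/v1`, not the actual implementation:

```go
package main

import "fmt"

// PodCondition is a simplified stand-in for corev1.PodCondition.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

// Pod is a simplified stand-in for corev1.Pod.
type Pod struct {
	Phase      string // "Pending", "Running", ...
	Conditions []PodCondition
}

// isPodReady reports whether the Pod has a Ready condition with status True.
func isPodReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			return true
		}
	}
	return false
}

// countReadyPods is the count the Job controller would store in status.ready.
func countReadyPods(pods []Pod) int32 {
	var ready int32
	for _, p := range pods {
		if isPodReady(p) {
			ready++
		}
	}
	return ready
}

func main() {
	pods := []Pod{
		{Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
		{Phase: "Running", Conditions: []PodCondition{{Type: "Ready", Status: "False"}}},
		{Phase: "Pending"}, // counted in active, but not in ready
	}
	fmt.Println(countReadyPods(pods)) // prints 1
}
```

Note that a Pod that is `Running` but not yet passing its readiness probe contributes to `active` but not to `ready`, which is exactly the gap this KEP makes visible.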
+
+In the second release:
+
+- The Job controller fills the field whenever the number of ready Pods
+  changes. The feature can still be disabled through the feature gate.
+
+#### GA
+
+- All reported bugs are fixed.
+- The Job controller ignores the feature gate.
+
+#### Deprecation
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+No changes are required for existing clusters to use the enhancement.
+
+### Version Skew Strategy
+
+The feature doesn't affect nodes.
+
+In the first release, a version skew between apiservers might cause the new
+field to remain at zero even if there are Pods ready.
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: JobReadyPods
+  - Components depending on the feature gate: kube-controller-manager
+- [ ] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+
+###### Does enabling the feature change any default behavior?
+
+Yes, the Job controller might update the Job status more frequently to
+report ready Pods.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes, the loss of information is acceptable, as the field is only informative.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+The Job controller will start populating the field again.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, at the unit and integration levels.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+The field is only informative; it doesn't affect running workloads.
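For illustration only, toggling the gate described under Feature Enablement uses the standard Kubernetes feature-gate flag on the controller manager (other flags elided; the exact invocation depends on how the control plane is deployed):

```shell
# Enable the feature (the default, per the Beta criteria above):
kube-controller-manager --feature-gates=JobReadyPods=true

# Disable it for rollback:
kube-controller-manager --feature-gates=JobReadyPods=false
```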
+
+###### What specific metrics should inform a rollback?
+
+N/A
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+N/A
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+The feature applies to all Jobs, unless the feature gate is disabled.
+
+###### How can someone using this feature know that it is working for their instance?
+
+- [x] API .status
+  - Other field: `ready`
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+The 99th percentile of Job status update latency stays below 1s when the
+controller doesn't create new Pods or track finishing Pods.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [x] Metrics
+  - Metric names: `job_sync_duration_seconds`, `job_sync_total`.
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+No.
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+- API: PUT Job/status
+
+  Estimated throughput: at most one API call for each Job Pod reaching the
+  Ready condition.
+
+  Originating component: job-controller
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+- API: Job/status
+
+  Estimated increase in size: new field of less than 10B.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+No change from the existing behavior of the Job controller.
+
+###### What are other known failure modes?
+
+- When the cluster has apiservers with skewed versions, `Job.status.ready`
+  might remain zero.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+1. Check reachability between kube-controller-manager and the apiserver.
+1. If `job_sync_duration_seconds` is too high, check the number of requests
+   in the apiserver coming from the kube-system/job-controller service
+   account. Consider increasing the number of inflight requests for the
+   apiserver or tuning [API priority and fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/)
+   to give more priority to the job-controller requests.
+1. If the steps above are insufficient, disable the `JobReadyPods`
+   feature gate in the apiserver and kube-controller-manager and [report an issue](https://github.com/kubernetes/kubernetes/issues).
+
+## Implementation History
+
+- 2021-08-19: Proposed KEP starting in beta status.
+
+## Drawbacks
+
+The only drawback is an increase in API calls. However, this is capped by
+the number of times a Pod flips Ready status, which is usually once for each
+Pod created.
+
+## Alternatives
+
+- Add `Job.status.running`, counting Pods in the `Running` phase. The
+  `Running` phase doesn't take into account preparation work done before the
+  worker is ready to accept connections. The `Ready` condition is
+  configurable through a readiness probe.
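As a usage sketch of the new field (a hypothetical Job named `example`; the output depends on the feature gate being enabled and on Pods actually reaching readiness):

```shell
kubectl get job example -o jsonpath='{.status.ready}'

# Or compare active vs. ready side by side (column names are illustrative):
kubectl get job example \
  -o custom-columns=ACTIVE:.status.active,READY:.status.ready
```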
diff --git a/keps/sig-apps/2879-ready-pods-job-status/kep.yaml b/keps/sig-apps/2879-ready-pods-job-status/kep.yaml
new file mode 100644
index 000000000000..ba5671cd06b9
--- /dev/null
+++ b/keps/sig-apps/2879-ready-pods-job-status/kep.yaml
@@ -0,0 +1,33 @@
+title: Track ready Pods in Job status
+kep-number: 2879
+authors:
+  - "@alculquicondor"
+owning-sig: sig-apps
+participating-sigs:
+status: implementable
+creation-date: 2021-08-19
+reviewers:
+  - "@soltysh"
+  - TBD API reviewer
+approvers:
+  - "@soltysh"
+
+see-also:
+replaces:
+
+stage: beta
+
+latest-milestone: "v1.23"
+
+milestone:
+  beta: "v1.23"
+  stable: "v1.25"
+
+feature-gates:
+  - name: JobReadyPods
+    components:
+      - kube-controller-manager
+disable-supported: true
+
+metrics:
+  - job_sync_duration_seconds
+  - job_sync_total
\ No newline at end of file