diff --git a/keps/sig-apps/2879-ready-pods-job-status/README.md b/keps/sig-apps/2879-ready-pods-job-status/README.md index 9cdbd5132e5..5e3a7ba6ac1 100644 --- a/keps/sig-apps/2879-ready-pods-job-status/README.md +++ b/keps/sig-apps/2879-ready-pods-job-status/README.md @@ -211,7 +211,9 @@ The Job controller will start populating the field again. ###### Are there any tests for feature enablement/disablement? -Yes, there are tests at unit and [integration] level. +Yes, there are unit tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/pkg/controller/job/job_controller_test.go#L236)). +There are also integration tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1364) and +[link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1517)). ### Rollout, Upgrade and Rollback Planning @@ -222,20 +224,20 @@ The field is only informative, it doesn't affect running workloads. ###### What specific metrics should inform a rollback? - An increase in `job_sync_duration_seconds`. -- A reduction in `job_sync_num`. +- A reduction in `job_syncs_total`. ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? -A manual test will be performed, as follows: +A manual test on Beta was performed, as follows: -1. Create a cluster in 1.23. -1. Upgrade to 1.24. -1. Create long running Job A, ensure that the ready field is populated. -1. Downgrade to 1.23. -1. Verify that ready field in Job A is not lost, but also not updated. -1. Create long running Job B, ensure that ready field is not populated. -1. Upgrade to 1.24. -1. Verify that Job A and B ready field is tracked again. +1. Create a cluster in 1.28 with the `JobReadyPods` disabled (`=false`). +2. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`. +3. Create long running Job A, ensure that the ready field is populated. +4. Simulate downgrade by modifying control-plane manifests to disable `JobReadyPods`. +5. Verify that ready field in Job A is cleaned up shortly after the startup of the Job controller completes. +6. Create long running Job B, ensure that ready field is not populated. +7. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`. +8. Verify that Job A and B ready field is tracked again shortly after the startup of the Job controller completes. ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? @@ -260,7 +262,7 @@ the controller doesn't create new Pods or tracks finishing Pods. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - [x] Metrics - - Metric name: `job_sync_duration_seconds`, `job_sync_total`. + - Metric name: `job_sync_duration_seconds`, `job_syncs_total`. - Components exposing the metric: `kube-controller-manager` ###### Are there any missing metrics that would be useful to have to improve observability of this feature? @@ -307,6 +309,10 @@ No. No. +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + +No. + ### Troubleshooting ###### How does this feature react if the API server and/or etcd is unavailable? @@ -315,8 +321,7 @@ No change from existing behavior of the Job controller. ###### What are other known failure modes? -- When the cluster has apiservers with skewed versions, the `Job.status.ready` - might remain zero. +No. ###### What steps should be taken if SLOs are not being met to determine the problem? diff --git a/keps/sig-apps/2879-ready-pods-job-status/kep.yaml b/keps/sig-apps/2879-ready-pods-job-status/kep.yaml index da6c6b58986..65ee91d96a5 100644 --- a/keps/sig-apps/2879-ready-pods-job-status/kep.yaml +++ b/keps/sig-apps/2879-ready-pods-job-status/kep.yaml @@ -33,4 +33,4 @@ disable-supported: true metrics: - job_sync_duration_seconds - - job_sync_total \ No newline at end of file + - job_syncs_total \ No newline at end of file