Skip to content

Commit

Permalink
PRR Remarks
Browse files Browse the repository at this point in the history
  • Loading branch information
mimowo committed Oct 2, 2023
1 parent 0e13432 commit fc9b1c8
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 15 deletions.
33 changes: 19 additions & 14 deletions keps/sig-apps/2879-ready-pods-job-status/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,9 @@ The Job controller will start populating the field again.

###### Are there any tests for feature enablement/disablement?

Yes, there are tests at unit and [integration] level.
Yes, there are unit tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/pkg/controller/job/job_controller_test.go#L236)).
There are also integration tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1364) and
[link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1517)).

### Rollout, Upgrade and Rollback Planning

Expand All @@ -222,20 +224,20 @@ The field is only informative, it doesn't affect running workloads.
###### What specific metrics should inform a rollback?

- An increase in `job_sync_duration_seconds`.
- A reduction in `job_sync_num`.
- A reduction in `job_syncs_total`.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

A manual test will be performed, as follows:
A manual test on Beta was performed, as follows:

1. Create a cluster in 1.23.
1. Upgrade to 1.24.
1. Create long running Job A, ensure that the ready field is populated.
1. Downgrade to 1.23.
1. Verify that ready field in Job A is not lost, but also not updated.
1. Create long running Job B, ensure that ready field is not populated.
1. Upgrade to 1.24.
1. Verify that Job A and B ready field is tracked again.
1. Create a cluster in 1.28 with the `JobReadyPods` disabled (`=false`).
2. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`.
3. Create long running Job A, ensure that the ready field is populated.
4. Simulate downgrade by modifying control-plane manifests to disable `JobReadyPods`.
5. Verify that ready field in Job A is cleaned up shortly after the startup of the Job controller completes.
6. Create long running Job B, ensure that ready field is not populated.
7. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`.
8. Verify that Job A and B ready field is tracked again shortly after the startup of the Job controller completes.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Expand All @@ -260,7 +262,7 @@ the controller doesn't create new Pods or tracks finishing Pods.
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [x] Metrics
- Metric name: `job_sync_duration_seconds`, `job_sync_total`.
- Metric name: `job_sync_duration_seconds`, `job_syncs_total`.
- Components exposing the metric: `kube-controller-manager`

###### Are there any missing metrics that would be useful to have to improve observability of this feature?
Expand Down Expand Up @@ -307,6 +309,10 @@ No.

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?
Expand All @@ -315,8 +321,7 @@ No change from existing behavior of the Job controller.

###### What are other known failure modes?

- When the cluster has apiservers with skewed versions, the `Job.status.ready`
might remain zero.
No.

###### What steps should be taken if SLOs are not being met to determine the problem?

Expand Down
2 changes: 1 addition & 1 deletion keps/sig-apps/2879-ready-pods-job-status/kep.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,4 +33,4 @@ disable-supported: true

metrics:
- job_sync_duration_seconds
- job_sync_total
- job_syncs_total

0 comments on commit fc9b1c8

Please sign in to comment.