diff --git a/keps/prod-readiness/sig-storage/3762.yaml b/keps/prod-readiness/sig-storage/3762.yaml index 3ff82dfa6b43..89bff47e511b 100644 --- a/keps/prod-readiness/sig-storage/3762.yaml +++ b/keps/prod-readiness/sig-storage/3762.yaml @@ -1,3 +1,5 @@ kep-number: 3762 alpha: approver: "@wojtek-t" +beta: + approver: "@wojtek-t" \ No newline at end of file diff --git a/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/README.md b/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/README.md index 2429054b4d91..c8e15f7f54c3 100644 --- a/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/README.md +++ b/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/README.md @@ -230,9 +230,8 @@ The "Design Details" section below is for the real nitty-gritty. --> -We need to update API server to support the newly proposed field and update PV controller to set a value of the new -timestamp field when a volume transitions to a different phase. Also, if the feature gate is disabled the value must be -re-set to `nil` when updating or creating a volume. +We need to update API server to support the newly proposed field and set a value of the new timestamp field when a volume +transitions to a different phase. The timestamp field must be set to current time also for newly created volumes. The value of the field is not intended for use by any other Kubernetes components at this point and should be used only as a convenience feature for cluster admins. Cluster admins should be able to list and sort PersistentVolumes based on @@ -273,10 +272,6 @@ Adding the field to a PV is reasonable only when the PV actually transitions its capture meaningful timestamp. Trying to do this at any other step than phase transition would capture a timestamp that would semantically incorrect and misleading. -Removing the value can be done more freely, but tradeoffs need to be considered. One alternative approach discussed was -removing the field values more aggressively, during each PV sync. As this might introduce performance issues and does -not add much value the removal should be done during PV validation instead. - ### Risks and Mitigations -| API server | KCM | Behavior | -|------------|-----|----------------------------------------------------------------------------------------------------------------------------| -| off | off | Existing Kubernetes behavior. | -| on | off| Existing Kubernetes behavior, only users can set `pvc.Status.LastPhaseTransitionTime`. | -| off | on | PV controller may try to change `pvc.Status.LastPhaseTransitionTime`, which will fail on the API server. | -| on | on | New behavior. +Version skew is not applicable, KCM was not changed in scope of this enhancement. + +| API server | KCM | Behavior | +|------------|-----|-----------------------| +| off | N/A | Existing Kubernetes behavior.| +| on | N/A | New behavior. | +| off | N/A | Existing Kubernetes behavior. | +| on | N/A | New behavior. | ## Production Readiness Review Questionnaire @@ -588,7 +565,7 @@ well as the [existing list] of feature gates. - [X] Feature gate (also fill in values in `kep.yaml`) - Feature gate name: PersistentVolumeLastPhaseTransitionTime - - Components depending on the feature gate: kube-apiserver, kube-controller-manager + - Components depending on the feature gate: kube-apiserver - [ ] Other - Describe the mechanism: - Will enabling / disabling the feature require downtime of the control @@ -618,32 +595,30 @@ feature. NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`. --> -Yes. This will result in the timestamp value being eventually set to `nil`. More details in -"Upgrade / Downgrade Strategy" section. +Yes, for PVs not updated while feature was enabled. However once the `LastPhaseTransitionTime` value is set, disabling +feature gate will not remove the value. + +More details in "Upgrade / Downgrade Strategy" section. ###### What happens if we reenable the feature if it was previously rolled back? -No issues expected. There are three cases that can occur for a PV: +No issues expected. There are two cases that can occur for a PV: 1. PV did not transition its phase when feature gate was enabled - the `LastPhaseTransitionTime` field was not added to the PV object so this is the same case as enabling the feature gate for the first time. - 2. PV did transition its phase but the `LastPhaseTransitionTime` *was not* reset to `nil` because the PV was not - validated when the feature was enabled - the timestamp value will be updated on next phase change as if the feature - gate was never disabled. - 2. PV did transition its phase but the `LastPhaseTransitionTime` *was* reset to `nil` because the PV was validated at - least once already while feature was enabled - the timestamp value will be updated on next phase change because - updates from `nil` are allowed. + + 2. PV did transition its phase when feature gate was enabled - the `LastPhaseTransitionTime` field is already set, + and it's timestamp value will be updated on next phase change. See "Upgrade / Downgrade Strategy" and "Notes/Constraints/Caveats" sections for more details. ###### Are there any tests for feature enablement/disablement? -Unit tests for enabling and disabling feature gate will be added when transitioning to beta. See "Graduation criteria" -section. +Unit tests for enabling and disabling feature gate will be added in alpha - see "Graduation criteria" section. The tests should focus on verifying correct handling of the new PV field in relation to feature gate state. Correct -handling means the values of the newly added field are added or updated when PV transitions its phase and cleared when -the feature is disabled. +handling means the values of the newly added field are added or updated when PV transitions its phase while feature gate +is enabled, and persisted if already set and feature gate is disabled. +Rollout should not fail and there should be no need for rollback as this enhancement only adds a new field. +Also, rollback in terms of removal of this new field is not possible, once a PV is updated with the new field it can not +be removed after disabling the feature. + ###### What specific metrics should inform a rollback? +No metrics are required. Not having the new field set after enabling this feature is a sufficient signal to indicate +that there is a problem. + ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? +Manual upgrade->downgrade->upgrade test was performed to verify correct behavior of the new field. If a downgrade is performed +there are two scenarios that can occur for each PV: + +1) Phase transitioned while feature was enabled - in this case the feature gets disabled and any updates to `LastPhaseTransitionTime` field must not be allowed. +2) Phase did not transition while feature was enabled - in this case the timestamp value must be persisted in `LastPhaseTransitionTime` field. + +After upgrading again the behavior has to match behavior as if the feature was turned on for the first time. + +The difference between feature enablement/disablement and downgrade/upgrade is that after downgrading to a version that +does not support `LastPhaseTransitionTime` field the data can not be accessed. Whereas only disabling the feature will +still show last the value that was set, if present. + ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? +No. + ### Monitoring Requirements +PV objects can be inspected for `LastPhaseTransitionTime` field. + ###### How can someone using this feature know that it is working for their instance? -- [ ] Events - - Event Reason: -- [ ] API .status - - Condition name: - - Other field: -- [ ] Other (treat as last resort) - - Details: +- [X] API .status + - Other field: `pv.Status.LastPhaseTransitionTime` ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? @@ -756,12 +749,9 @@ question. Pick one more of these and delete the rest. --> -- [ ] Metrics - - Metric name: - - [Optional] Aggregation method: - - Components exposing the metric: -- [ ] Other (treat as last resort) - - Details: +- [X] Other (treat as last resort) + - Details: To check correct functionality, inspect `LastPhaseTransitionTime` of a PV after binding it to a PVC. + Or simply create a PVC and check dynamically provisioned PV if it has a `LastPhaseTransitionTime` set to current time. ###### Are there any missing metrics that would be useful to have to improve observability of this feature? @@ -770,6 +760,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co implementation difficulties, etc.). --> +Due to the simple nature if this feature there's no need to add any metric. + ### Dependencies +No. + ### Scalability +No, the feature is implemented directly in API strategy for updating PVs. + ###### Will enabling / using this feature result in introducing new API types? +No. + ###### Will enabling / using this feature result in any new calls to the cloud provider? +No. + ###### Will enabling / using this feature result in increasing size or count of the existing API objects? +Yes, all PV objects will have an entirely new field. + ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? +No. + ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? +No. + ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? +No. + ### Troubleshooting +None, the feature is dependent only on API server and should not be affected by other failures. + ###### What steps should be taken if SLOs are not being met to determine the problem? +Users should inspect API server logs for errors in case PV objects are not updated properly. + ## Implementation History +No drawbacks discovered, enhancement only adds a new informative field. + ## Alternatives +Alternative solution is to update phase transition timestamp in PV controller/KCM. This would increase chances of having +a time skew between API audit logs and the timestamp. Updating phase transition timestamp in API strategy code is +therefore a better solution. + ## Infrastructure Needed (Optional) + +None. diff --git a/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/kep.yaml b/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/kep.yaml index a69740fdaf70..830dc3ce9fec 100644 --- a/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/kep.yaml +++ b/keps/sig-storage/3762-persistent-volume-last-phase-transition-time/kep.yaml @@ -6,7 +6,7 @@ owning-sig: sig-storage participating-sigs: status: implementable creation-date: 2023-01-20 -last-updated: 2023-06-01 +last-updated: 2023-12-06 reviewers: - "@jsafrane" approvers: @@ -16,12 +16,12 @@ see-also: replaces: # The target maturity stage in the current dev cycle for this KEP. -stage: alpha +stage: beta # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.28" +latest-milestone: "v1.29" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: @@ -35,7 +35,6 @@ feature-gates: - name: PersistentVolumeLastPhaseTransitionTime components: - kube-apiserver - - kube-controller-manager disable-supported: true # The following PRR answers are required at beta release