Clear beta graduation criteria.
cici37 committed May 9, 2023
1 parent eb59c8f commit ad1b439
Showing 3 changed files with 51 additions and 23 deletions.
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-api-machinery/3488.yaml
@@ -1,3 +1,5 @@
kep-number: 3488
alpha:
  approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"
68 changes: 47 additions & 21 deletions keps/sig-api-machinery/3488-cel-admission-control/README.md
@@ -2312,6 +2312,9 @@ in back-to-back releases.
If multiple admission policies require the same conversion, convert only once.
From @liggitt: "webhook code loops up one level, first accumulates all the validation webhooks we'll run, then converts to the versions needed by those webhooks then evaluates in parallel"
- authz check to the specific resource referenced in the policy's paramKind. ([comment](https://github.com/kubernetes/kubernetes/pull/113314#discussion_r1013135860))
+- complete support for accessing namespace metadata
+- complete type checking for CRDs
+- add a controlled rollout strategy to support future changes to CEL libraries, functions, and variables

### Upgrade / Downgrade Strategy

@@ -2387,15 +2390,9 @@ well as the [existing list] of feature gates.
[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
-->

-- [ ] Feature gate (also fill in values in `kep.yaml`)
-  - Feature gate name: CELValidatingAdmission
+- [X] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: ValidatingAdmissionPolicy
  - Components depending on the feature gate: kube-apiserver
-- [ ] Other
-  - Describe the mechanism:
-  - Will enabling / disabling the feature require downtime of the control
-    plane?
-  - Will enabling / disabling the feature require downtime or reprovisioning
-    of a node?
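The snippet below is a minimal sketch of enabling the feature at beta. It assumes a kubeadm-style static Pod manifest for the kube-apiserver and assumes the beta API is served as `admissionregistration.k8s.io/v1beta1`. The file path and image tag are illustrative only. Two things are set: the `ValidatingAdmissionPolicy` feature gate, and the beta API version through `--runtime-config`. The second flag is needed because new beta APIs are not served by default:

```yaml
# Excerpt of a kube-apiserver static Pod manifest (kubeadm layout assumed,
# e.g. /etc/kubernetes/manifests/kube-apiserver.yaml). Only the flags relevant
# to this KEP are shown; all existing flags stay as they are.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0  # version assumed
    command:
    - kube-apiserver
    # Feature gate named in this KEP.
    - --feature-gates=ValidatingAdmissionPolicy=true
    # Serve the beta API version (assumed to be v1beta1).
    - --runtime-config=admissionregistration.k8s.io/v1beta1=true
```

If the manifest already sets `--feature-gates`, add `ValidatingAdmissionPolicy=true` to the existing comma-separated list rather than adding a second flag. Setting either value back to `false` disables the feature again.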

###### Does enabling the feature change any default behavior?

@@ -2506,6 +2503,9 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->
+The following metrics can be used to see whether the feature is in use:
+- `validating_admission_policy/check_total`
+- `validating_admission_policy/definition_total`

###### How can someone using this feature know that it is working for their instance?

@@ -2518,13 +2518,10 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- Metrics such as `validating_admission_policy/check_total` show how many validations have been applied in total.
+- Audit mode can be used to inspect the resulting audit events, following [this documentation](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/#audit-annotations).
+- `ValidatingAdmissionPolicy.Status` can be used to check whether type checking was performed as expected.
+- Users can also verify that an admission request is rejected, or that a warning is returned, as expected for the configured `validationActions`.
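The sketch below shows a policy and binding that make the feature observable without rejecting anything. The object names, the replica threshold, and the `high-replica-count` annotation key are all hypothetical. The field names follow the `admissionregistration.k8s.io/v1beta1` API as assumed here. Because the binding uses `validationActions: [Warn, Audit]`, a failing request is still admitted. The client receives a warning, and, assuming API server audit logging is enabled, the failure is recorded in the audit event. The `auditAnnotations` entry attaches the replica count to that event:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "replica-limit.example.com"        # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= 5"
    # Message returned to the client (as a warning, given the binding below).
    messageExpression: "'spec.replicas is ' + string(object.spec.replicas) + ', limit is 5'"
  auditAnnotations:
  - key: "high-replica-count"              # hypothetical annotation key
    valueExpression: "'replicas: ' + string(object.spec.replicas)"
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "replica-limit-binding.example.com"
spec:
  policyName: "replica-limit.example.com"
  # Warn the client and record failures in audit events; never deny.
  validationActions: [Warn, Audit]
```

After the policy is created, `kubectl get validatingadmissionpolicy replica-limit.example.com -o yaml` shows its `status`. Any type-checking problems are expected under `status.typeChecking.expressionWarnings` (field path assumed for `v1beta1`).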

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -2542,26 +2539,28 @@ high level (needs more precise definitions) those may be things like:
These goals will help you determine what you need to measure (SLIs) in the next
question.
-->
+No impact on admission request latency when no ValidatingAdmissionPolicy objects are present.

+Performance will need to be measured and optimized when ValidatingAdmissionPolicies are in use.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one or more of these and delete the rest.
-->

-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+- [ ] The metrics below can be used:
+  - `validating_admission_policy/check_total`
+  - `validating_admission_policy/definition_total`
+  - `validating_admission_policy/check_duration_seconds`

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
+No. We are open to input.

### Dependencies

@@ -2585,6 +2584,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:
-->
+No.

### Scalability

@@ -2612,6 +2612,7 @@ Focusing mostly on:
- periodic API calls to reconcile state (e.g. periodic fetching state,
heartbeats, leader election, etc.)
-->
+Yes. The feature introduces new API resources (in the existing `admissionregistration.k8s.io` API group), which the kube-apiserver reads in order to enforce policies.

###### Will enabling / using this feature result in introducing new API types?

@@ -2621,6 +2622,7 @@ Describe them, providing:
- Supported number of objects per cluster
- Supported number of objects per namespace (for namespace-scoped objects)
-->
+Yes. This feature introduces two new kinds, ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding, as described in [this doc](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/).
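The sketch below shows how the two kinds fit together. The policy carries the CEL checks and the resources they match. The binding selects where the policy applies and what happens when a check fails. The API version `admissionregistration.k8s.io/v1beta1`, the object names, and the `environment: prod` label are assumptions for illustration:

```yaml
# Policy: what to check (hypothetical example).
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "require-team-label.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   [""]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["configmaps"]
  validations:
  - expression: "has(object.metadata.labels) && 'team' in object.metadata.labels"
    message: "configmaps must carry a 'team' label"
---
# Binding: where the policy applies and how failures are handled.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "require-team-label-binding.example.com"
spec:
  policyName: "require-team-label.example.com"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: prod
```

Keeping the check and its scope in separate objects lets one policy be bound several times, for example with `Deny` in some namespaces and `Warn` in others.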

###### Will enabling / using this feature result in any new calls to the cloud provider?

@@ -2629,6 +2631,7 @@ Describe them, providing:
- Which API(s):
- Estimated increase:
-->
+No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

@@ -2638,6 +2641,7 @@ Describe them, providing:
- Estimated increase in size: (e.g., new annotation of size 32B)
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->
+No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

@@ -2649,6 +2653,7 @@ Think about adding additional work or introducing new steps in between

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
-->
+The existing admission request latency might be affected when the feature is used. We expect this to be negligible and will measure it before GA.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Expand All @@ -2661,6 +2666,20 @@ This through this both in small and large cases, again with respect to the

[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
-->
+We don't expect it to. In particular, compared with the existing way of achieving the same goal (validating admission webhooks), using this feature will not noticeably increase resource usage.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+<!--
+Focus not just on happy cases, but primarily on more pathological cases
+(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+If any of the resources can be exhausted, how is this mitigated with the existing limits
+(e.g. pods per node) or new limits added by this KEP?
+
+Are there any tests that were run/should be run to understand performance characteristics better
+and validate the declared limits?
+-->
+No.

### Troubleshooting

@@ -2676,6 +2695,7 @@ details). For now, we leave it here.
-->

###### How does this feature react if the API server and/or etcd is unavailable?
+Same as without this feature.

###### What are other known failure modes?

@@ -2691,9 +2711,15 @@ For each of them, fill in the following information by copying the below templat
Not required until feature graduated to beta.
- Testing: Are there any tests for failure mode? If not, describe why.
-->
+N/A

###### What steps should be taken if SLOs are not being met to determine the problem?

+- If the performance impact is not tolerable, the feature can be disabled by disabling the API or by setting the feature gate to false.
+- Run the validations separately to identify which rule is slow.
+- Remove the problematic rules, or update them to meet the requirement.
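Here is one way to run a single validation separately, sketched under assumptions. Copy the suspect expression into its own policy and bind it with only the `Audit` action. Its cost can then be observed on its own without affecting whether requests are admitted. One example is `validating_admission_policy/check_duration_seconds`, assuming that metric can be broken down per policy. All names and the expression itself are hypothetical:

```yaml
# Hypothetical: one expression copied out of a larger policy so it can be
# observed in isolation.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: "suspect-rule.example.com"
spec:
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  # The single rule under investigation (example expression).
  - expression: "object.spec.template.spec.containers.all(c, c.image.startsWith('registry.example.com/'))"
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "suspect-rule-binding.example.com"
spec:
  policyName: "suspect-rule.example.com"
  # Audit only: failures are recorded in audit events; requests are not denied.
  validationActions: [Audit]
```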


## Implementation History

<!--
4 changes: 2 additions & 2 deletions keps/sig-api-machinery/3488-cel-admission-control/kep.yaml
@@ -33,12 +33,12 @@ see-also:
- "/keps/sig-api-machinery/2876-crd-validation-expression-language"

# The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
-latest-milestone: "v1.27"
+latest-milestone: "v1.28"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone: