Skip to content

Commit

Permalink
Merge pull request #3554 from cici37/cel-admission-control
Browse files Browse the repository at this point in the history
KEP-3488: Add PRR and test section for Cel admission control
  • Loading branch information
k8s-ci-robot authored Oct 5, 2022
2 parents eed6b07 + 130f0b3 commit 4c20efa
Show file tree
Hide file tree
Showing 3 changed files with 43 additions and 6 deletions.
3 changes: 3 additions & 0 deletions keps/prod-readiness/sig-api-machinery/3488.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
kep-number: 3488
alpha:
approver: "@johnbelamaric"
44 changes: 39 additions & 5 deletions keps/sig-api-machinery/3488-cel-admission-control/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
Expand Down Expand Up @@ -1821,6 +1822,7 @@ to implement this enhancement.
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->
N/A

##### Unit tests

Expand Down Expand Up @@ -1854,8 +1856,16 @@ For Alpha, describe what tests will be added to ensure proper quality of the enh
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
https://storage.googleapis.com/k8s-triage/index.html
-->
In the first alpha phase, the integration tests are expected to be added for:
- The behavior with feature gate and API turned on/off and mix match
- The happy path with everything configured and validation proceeded successfully
- Validation with different enforcement policies
- Validation with different failure policies
- Validation with different Match Criteria
- Validation violations for different reasons including type checking failures, misconfiguration, failed validation, etc and formatted messages
- Singleton policy
- Validation limit check including cost limit, max policy binding limit, max length, etc.

- <test>: <link to test coverage>

##### e2e tests

Expand All @@ -1868,8 +1878,8 @@ https://storage.googleapis.com/k8s-triage/index.html

We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

- <test>: <link to test coverage>
We will test the edge cases mostly in integration test and unit test.
We may add e2e test for spot check of the feature presence.

### Graduation Criteria

Expand Down Expand Up @@ -1934,6 +1944,10 @@ in back-to-back releases.
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag
-->
#### Alpha

- Feature implemented behind a feature flag
- Ensure proper tests are in place.

### Upgrade / Downgrade Strategy

Expand All @@ -1948,6 +1962,8 @@ enhancement:
- What changes (in invocations, configurations, API use, etc.) is an existing
cluster required to make on upgrade, in order to make use of the enhancement?
-->
In alpha, no changes are required to maintain previous behavior. And the feature gate
and the required API could be turned on to make use of the enhancement.

### Version Skew Strategy

Expand All @@ -1963,6 +1979,7 @@ enhancement:
- Will any other components on the node change? For example, changes to CSI,
CRI or CNI may require updating that component before the kubelet.
-->
N/A

## Production Readiness Review Questionnaire

Expand Down Expand Up @@ -2007,8 +2024,8 @@ well as the [existing list] of feature gates.
-->

- [ ] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name:
- Components depending on the feature gate:
- Feature gate name: CelValidatingAdmissionExtensibility
- Components depending on the feature gate: kube-apiserver
- [ ] Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control
Expand All @@ -2022,6 +2039,7 @@ well as the [existing list] of feature gates.
Any change of default behavior may be surprising to users or break existing
automations, so be extremely careful here.
-->
No, default behavior is the same.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Expand All @@ -2035,8 +2053,10 @@ feature.

NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-->
Yes, disabling the feature will result in validation expressions being ignored.

###### What happens if we reenable the feature if it was previously rolled back?
The validatingAdmissionPolicy will be enforced again.

###### Are there any tests for feature enablement/disablement?

Expand All @@ -2052,6 +2072,7 @@ feature gate after having objects written with the new field) are also critical.
You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->
Unit test and integration test will be introduced in alpha implementation.

### Rollout, Upgrade and Rollback Planning

Expand All @@ -2070,13 +2091,24 @@ feature flags will be enabled on some API servers and not others during the
rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->
The existing workload could potentially fail the validation and cause unexpected failures if the validation is misconfigured.

While rollout, the cluster administrator could configure the feature with `enforcement: [Warn, Audit]` or similar,
and wait until being comfortable to switch the enforcement level to `Deny`. In this way it will minimize the effect on the running workloads.

Note that if the request was in while the feature is off, when the feature is turned back on,
the validation policy might prevent it from editing if validation rules applied.

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->
On a cluster that has not yet opted into ValidatingAdmissionPolicy, non-zero counts for either of the following metrics mean
the feature is not working as expected:
- cel_admission_validation_total
- cel_admission_validation_errors

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Expand All @@ -2085,12 +2117,14 @@ Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
Upgrade and rollback will be tested before the feature goes to Beta.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->
No.

### Monitoring Requirements

Expand Down
2 changes: 1 addition & 1 deletion keps/sig-api-machinery/3488-cel-admission-control/kep.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ authors:
owning-sig: sig-api-machinery
participating-sigs:
- sig-auth
status: provisional
status: implementable
creation-date: 2022-09-02
reviewers:
- "@TristonianJones"
Expand Down

0 comments on commit 4c20efa

Please sign in to comment.