diff --git a/keps/sig-api-machinery/customresources-pruning-and-defaulting.md b/keps/sig-api-machinery/customresources-pruning-and-defaulting.md index f27a97aa44e..96f9e8d3dc2 100644 --- a/keps/sig-api-machinery/customresources-pruning-and-defaulting.md +++ b/keps/sig-api-machinery/customresources-pruning-and-defaulting.md @@ -1,6 +1,6 @@ --- kep-number: 24 -title: Defaulting and Pruning for Custom Resources +title: Pruning for Custom Resources status: provisional authors: - "@sttts" @@ -17,32 +17,28 @@ reviewers: - "@apelisse" approvers: - "@deads2k" - - "@lavalamp - substituted by @fedebongio while @lavalamp is out" + - "@lavalamp" editor: name: "@sttts" creation-date: 2018-07-31 last-updated: 2018-07-31 --- -# Defaulting and Pruning for Custom Resources +# Pruning for Custom Resources ## Table of Contents -* [Defaulting and Pruning for Custom Resources](#defaulting-and-pruning-for-custom-resources) +* [Pruning for Custom Resources](#pruning-for-custom-resources) * [Table of Contents](#table-of-contents) * [Overview](#overview) * [Goals](#goals) * [Non-Goals](#non-goals) * [Motivation](#motivation) * [Pruning](#pruning) - * [Defaulting](#defaulting) * [Mixing of Schema and Value Validation](#mixing-schema-and-value-validation) * [Formal Proposal – following pruning option 1]() * [Types and Formats](#types-and-formats) * [Polymorphic Fields](#polymorphic-fields) - * [Defaulting](#restrictions-on-defaults) * [Excluding values from Pruning](#excluding-values-from-pruning) - * [Related topis](#related-topics) - * [Ratcheting (slightly related)](#ratcheting-(slightly-related)) * [References](#references) * [Alternatives Considered](#alternatives-considered) @@ -54,18 +50,16 @@ Without consistency of data in etcd, objects can suddenly render unaccessible on CustomResources are persisted as JSON blobs today (with the exception of `ObjectMeta` which is pruned since Kubernetes 1.11), i.e. we do not drop unknown fields, with the described consequences. This proposal is about adding a decoding step named "pruning" into the decoder from JSON to `unstructured.Unstructured` inside the apiextensions-apiserver. This pruning step will drop fields not specified in the OpenAPI validation spec, leading to the same persistence semantics as for native types. -Algorithmically deeply related to validation and pruning is "defaulting", i.e. the application of the OpenAPI "default" values. Defaulting is done after decoding for native types, before conversion. This proposal suggests to add defaulting just after pruning to the CustomResource JSON decoder. - ## Goals * Prune unknown fields from CustomResources silently. Unknown means not specified in the OpenAPI validation spec. -* Allow to opt-out of pruning via the OpenAPI validation spec for a subtree of the JSON objects. -* Apply OpenAPI validation schema defaults to CustomResources. -* Have simple semantics for pruning and defaulting. +* Allow to opt-out of pruning via the OpenAPI validation spec for a whole subtree of the JSON objects. +* Have simple semantics for pruning. +* Be extensible to defaulting at a later point. ## Non-Goals * Add a strict mode to the REST API which rejects objects with unknown fields. -* Propose or decide anything about ratcheting. [We discuss the topic](#ratcheting-(slightly-related)) because it is related, but this needs a follow-up KEP. +* Propose or decide anything about defaulting. ## Motivation ### Pruning @@ -132,60 +126,6 @@ From these options: **Here we propose not to follow 2 and 3 due to ambiguity of 2 and the danger of user mistakes of 3. Both 1 and 4 lead to the same pruning behaviour and only differ in whether the user has to re-specify property keys or whether they are automatically derived.** -### Restrictions on Defaults - -The OpenAPI validation schema allows to define defaults via the `default` field. For example: - -```json -{"properties": {"a": {"default": {"x": 42}}}} -``` - -An empty object `{}` would be defaulted to `{"a":{"x":42}}` - - -Obviously, the default values should validate. But OpenAPI validation schemata do not enforce that. Also one default in a branch of `anyOf`, `allOf`, `oneOf` might force another validation run onto another branch of these quantors. - -Even worse: forcing onto another branch might make more defaults applicable. In other words, we would need to run validation with defaulting until it converges. This is a hint that the semantics of defaulting in full OpenAPI schemata are not well defined either. - -**We propose to allow defaults only outside of `anyOf`, `allOf`, `oneOf`, `not`.** - -Without that, we would get ambiguous defaulting semantics: - -#### Example 4: - -> Assume ->```json ->{ -> "anyOf":[ -> {"default": 1}, -> {"default": 2} -> ] ->} ->``` -> -> Which branch should be chosen while applying the defaults. - -#### Example 5: - -> Assume -> ->```json ->{ -> "anyOf":[ -> {"properties": {"a": {"pattern": "1"}, "b": {"type": "string"}, "d": {"default": 1}}}, -> {"properties": {"a": {"pattern": "2"}, "c": {"type": "string"}, "d": {"default": 2}}} -> ] ->} ->``` -> -> Take an object `{"a": "1", "b":"foo", "c":"bar"}`. Assuming we follow pruning option 1, it is pruned to `{"a": "1", "b":"foo", "c":"bar"}` and to default to `{"a":"1", "b": "foo", "c":"bar", "d": 1}`. -> -> But what about the object `{"a": "12", "b":"foo", "c":"bar"}`? Both patterns apply. With pruning option 1 we keep `"b"` and `"c"`. But what about the default? Should we default to `1` or `2`? - -Clearly, we can make the algorithm deterministic by choosing one, e.g. the first matching branch. This choice is arbitrary, hard to understand, and gets even more confusing with even more nested `allOf`, `anyOf` or even `not`. - -We can later become less restrictive about the allowed positions of default values in the scheme. I.e. the restriction of only allowing default outside of `allOf`, `anyOf`, `oneOf` and `not` is a starting point with a way forward if necessary. - ### Mixing of Schema and Value Validation The OpenAPI validation schema mixes the actual structural schema validation (which is usually done by the Golang struct JSON decoding for native types) @@ -193,7 +133,7 @@ and the value validation (usually done in the validation step of the API server For CRDs we cannot distinguish both. This was first noticed by @lavalamp in https://github.com/kubernetes/kubernetes/pull/64907#issuecomment-397015030. -#### Example 6 +#### Example 4 > Assume > @@ -210,53 +150,32 @@ For CRDs we cannot distinguish both. This was first noticed by @lavalamp in http **Remark:** the line between type and format is blurry. Format is optional to be processed by tools, and the value is open ended. In Kube we have some formats (like `date`) which correspond to custom JSON unmarshallers in the native types. That format would be verified during decoding, although technically a date is just a string. So we might want to replicate that behaviour at some point. -For pruning and defaulting we have to apply the OpenAPI validation schema inside of the decoder step (the left box inside apiextensions-apiserver in the figure above). This is considerably earlier than for native types. This has a number of implications: +For pruning (and possibly later defaulting) we have to apply the OpenAPI validation schema inside of the decoder step (the left box inside apiextensions-apiserver in the figure above). This is considerably earlier than for native types. This has a number of implications: * We have to apply full OpenAPI value validation (e.g. regular expressions, propositional evaluation) during decoding. * Our generic registry applies validation after defaulting. We need the other way around for CRDs. -* Our generic registry expects defaulting to always succeed (no error result type). CRD validation needed by defaulting can fail. +* Our generic registry expects defaulting to always succeed (no error result type). CRD validation needed by OpenAPI defaulting would be able to fail. -To avoid these and to get sane semantics, **we propose to split the CRD validation into two steps:** -1. During decoding we validate using a skeleton schema which lacks value validations and propositional logic (`anyOf`, `allOf`, `oneOf`, `not`). -2. During defaulting we validate using the skeleton schema and apply the resulting defaulter. -During standard generic registry validation phase we validate using the full validation schema. +To avoid these and to get sane semantics, **we propose to split the CRD validation into two top-level steps:** -Because the first step succeeds, also the second step succeeds. So we don’t need error handling during defaulting. The third step will catch wrong types of the used defaults. +1. During decoding we validate using a skeleton schema which lacks value validations and propositional logic (`anyOf`, `allOf`, `oneOf`, `not`). +2. During standard generic registry validation phase we validate using the full validation schema. -Note that defaulting – like for native types – will be done -1. before conversion of the incoming API objects and of objects read from storage and -2. after mutating admission webhooks (compare [k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go](https://github.com/kubernetes/kubernetes/blob/e4c219df42c77ecb8f0588197072bef81bca7429/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/mutating/dispatcher.go#L152)). +Step 1 is extensible to defaulting based on the skeleton validation result. Step 2 would then catch wrong types of the used defaults. ## Formal Proposal – following pruning option 1 -The examples in the motivational section show that the semantics of full OpenAPI validation schemata are not trivial in respect to pruning and defaulting. To simplify the algorithms and to enforce "sane" schemata which allow to split schema and value validation, we propose to derive a skeleton schema from the full user-given OpenAPI validation schema +The examples in the motivational section show that the semantics of full OpenAPI validation schemata are not trivial in respect to pruning. To simplify the algorithms and to enforce "sane" schemata which allow to split schema and value validation, we propose to derive a skeleton schema from the full user-given OpenAPI validation schema * which does not contain value validations -* which is complete enough for pruning and defaulting. +* which is complete enough for pruning (and possibly later defaulting). -**Remark:** we have two options to define and implement pruning and defaulting: +**Remark:** we have two options to define and implement pruning: 1. directly on the full OpenAPI validation schema with two custom algorithms doing parallel recursion over the schema and the input object. -2. via an intermediate representation (the skeleton schema) and using go-openapi pruning and defaulting. Both algorithms are ten-liners based on the go-openapi/validate output. With the skeleton schema the go-openapi pruning algorithm coincides with our pruning option 1. +2. via an intermediate representation (the skeleton schema) and using go-openapi pruning. Both algorithms are ten-liners based on the go-openapi/validate output. With the skeleton schema the go-openapi pruning algorithm coincides with our pruning option 1. Both routes lead to the same algorithm: the intermediate representation of the skeleton schema makes merging of OpenAPI validation schema constraints explicit, while the custom algorithms would hide that in its recursion code. Moreover, the main reason for the intermediate schema: the custom algorithm would have to replicate a lot of the validation logic of go-openapi, e.g. the semantics of `properties`, `additionalPropoerties`, `patternProperties` and the same for items of an array. With route 2 we get all of this for free. -### Example 7: - -> Assume -> ->```json ->{ -> "patternProperties": { -> "^foo-.*": { "properties": {"a":{}}, "c": {"default": 1} }, -> "^bar-.*": { "properties": {"b":{}}, "c": {"default": 2} } -> } ->} ->``` -> ->An object `{"foo-42": {"a": 1, "b": 2}, "bar-42": {"a": 1", "b": 2}}` will be pruned to `{"foo-42": {"a": 1}, "bar-42": {"b": 2}}` and defaulted to `{"foo-42": {"a": 1, "c": 1}, "bar-42": {"b": 2, "c": 2}}`. -> ->Note that the default and the pruning only depends on the object keys, not on values. - ### Definition: skeleton schema > For a given OpenAPI validation schema `s` the skeleton schema `skel(s)` is derived by @@ -267,8 +186,7 @@ Moreover, the main reason for the intermediate schema: the custom algorithm woul > * `additionalItems`, > * `properties`, > * `patternProperties`, -> * `additionalProperties`, -> * `default`, +> * `additionalProperties` > giving `drop(s)`, > 2. then merging `s_i` into it’s containing object. > @@ -284,7 +202,6 @@ Moreover, the main reason for the intermediate schema: the custom algorithm woul > "patternProperties": { k_i: merge(v_i1, …, v_ik) for keys appearing in x_ij with values x_ij }, > "additionalProperties": merge(p_1, …, p_i), > for all x_i with defined additionalProperties p_i -> "default": first default value of `x_i` or fail >} >``` @@ -295,7 +212,6 @@ The skeleton schema especially lacks: The skeleton schema `skel(s)` puts * less or equal constraints on `type` than `s`. * every field specified by a `properties` key in `s` is specified in `skel(s)`. -* if there are no default values under `allOf`, `anyOf`, `oneOf`, `not` in `s`, for each JSON path the default matches in `s` and `skel(s)`. The computation of `skel(s)` is `O(size(s))` and `size(skel(s)) <= size(s)`. @@ -303,11 +219,11 @@ Note that a field might be constrained by the `type` construct in the full OpenA **Property:** if the OpenAPI validation schema applies to an object, so does its skeleton schema. -Pruning and defaulting will be implemented based on the skeleton schema of the specified OpenAPI validation schema. +Pruning will be implemented based on the skeleton schema of the specified OpenAPI validation schema. Optionally for debugging, the skeleton schema derived from the validation schema by the apiextensions-apiserver can be stored in the CRD status. -### Example 8 +#### Example 5 >Assume > @@ -345,7 +261,7 @@ In native types, we have custom unmarshallers for date / timestamps. In OpenAPI ### Polymorphic Fields -#### Example 9 +#### Example 6 > Assume > @@ -372,32 +288,9 @@ Hence, we have to add the following restriction to OpenAPI validation schemata: **Restriction 1 on Object Polymorphism:** reject CRD OpenAPI validation schemata `s` with `skel(s)` having `properties` and one of `additionalProperties` or `patternProperties` being defined for the same JSON path. -### Defaulting - -**Restriction 2:** Defaults may only be defined outside of `anyOf`, `allOf`, `oneOf` and `not`. All other defaults are rejected in CRD validation. - -**Consequence:** defaults are not conditional, but static. I.e. they don’t depend on the validity of other parts of the input object. - -#### Example 11 - -> In a OpenAPI validation schema, one can write: -> ->```json ->{ -> "anyOf": [ -> { "properties": {"a": {"pattern": "1"}, "b": {"default": "1"}}}, -> { "properties": {"a": {"pattern": "2"}, "b": {"default": "2"}}} -> ] ->} ->``` -> -> This says to default `b` to the same value as `a`. This cannot be expressed with a skeleton schema. In this sense, the defaults are static. - -Pruning and defaulting will be applied after every unstructured JSON decoding and before conversion. - ### Excluding values from Pruning -There are cases where parts of an object are verbatim JSON, i.e. without any applied schema and especially without a complete specification which allows to apply pruning. E.g. the [`JSONSchemaProps` types in apiextensions-apiserver contain fields of type JSON](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1/types_jsonschema.go#L28) for defaults, examples and enums. These should never be pruned (and they are not pruned while decoding using JSON marshalling). +There are cases where parts of an object are verbatim JSON, i.e. without any applied schema and especially without a complete specification which allows to apply pruning. Hence, we need a mechanism to express that in the OpenAPI validation schema (compare https://github.com/kubernetes/kubernetes/pull/64558#issuecomment-403564033). @@ -416,18 +309,16 @@ Note that we lose expressiveness of the existing `format` strings: we either app **Raw JSON Option 2:** add an extension property, e.g. ```json -{"properties": {"x": {"x-no-pruning": true}}} +{"properties": {"x": {"x-kubernetes-no-pruning": true}}} ``` This would not lead to a loss in expressivity. It is formulated negatively intentionally to have `false` as the default with `omitempty`. -As OpenAPI extensions are prefixed with `x-kubernetes` elsewhere, we probably should name it `x-kubernetes-no-pruning: true`. - -**We propose to follow option with the `x-kubernetes` prefix** because it is more explicit, does not reduce expressivity and does not mangle with the already very vague `format` field definition. +**We propose to follow the second option with `x-kubernetes-no-prune`, because it is more explicit, does not reduce expressivity and does not mangle with the already very vague `format` field definition.** ### Nested x-no-pruning -**Question:** should we support nested `x-no-pruning`, i.e. disabling pruning for a sub-object, but re-enable it something deep inside of it? E.g. +**Question:** should we support nested `x-kubernetes-no-prune`, i.e. disabling pruning for a sub-object, but re-enable it something deep inside of it? E.g. ```json { @@ -460,7 +351,9 @@ If we do, the object } ``` -is pruned to `{"x":{"b": 2, "y":{"z": 42}}}`. +would be pruned to `{"x":{"b": 2, "y":{"z": 42}}}`. + +**We propose to disallow nesting of `x-kubernetes-no-prune` and to disallow setting it to false, i.e. `x-kubernetes-no-prune: false`.** We can add nesting later if necessary. ## Opt-in and Opt-out of Pruning on CRD Level @@ -482,25 +375,9 @@ For `apiextensions.k8s.io/v1` we will change the default to `true` and forbid `f When [CRD conversion](https://github.com/mbohlool/community/blob/master/contributors/design-proposals/api-machinery/customresource-conversion-webhook.md) is implemented before this KEP, we will add the pruning field to `type CustomResourceDefinitionVersion`, in analogy to subresources and `additionalPrinterColumns`. -## Related Topics - -### Ratcheting (slightly related) - -In https://github.com/kubernetes/kubernetes/pull/64907 a ratcheting mechanism during validation is introduced: if a specific proposition during validation is not fulfilled in an object stored in etcd, it is not required either during updates of that object. - -Ratcheting does not work well with arbitrary propositional OpenAPI validation schemata because propositions in negative positions (under `not` or `oneOf`) that are weakened via ratcheting will make the whole validation stricter instead of weaker (compare https://github.com/kubernetes/kubernetes/pull/64907#issuecomment-396526377). - -We have multiple options to restrict the use of ratcheting (which could be the property of certain OpenAPI validation schemata constructs): -1. Allow ratcheting only in the skeleton schema. Everything in the skeleton schema is positive. -2. Disallow ratcheting under `not` and under `oneOf`, but allow it everywhere else. -3. Disallow ratcheting in negative positions. This implies that it is disallowed under `oneOf`, but is allowed e.g. under `not: {not: ….}`. -The first is the most strict one, the last is the most relaxed one. On the other hand, 1 is the simplest to reason about. - -**Question:** which of the options do we want? - ## References -* WIP implementation PR https://github.com/kubernetes/kubernetes/pull/64558 +* Pruning implementation PR https://github.com/kubernetes/kubernetes/pull/64558 * [OpenAPI v3 specification](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md) * [JSON Schema](http://json-schema.org/) * [pruning algorithm in go-openapi](https://github.com/go-openapi/validate/blob/master/post/prune.go)