-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4153: Declarative Validation Proposal and Design #4163
KEP-4153: Declarative Validation Proposal and Design #4163
Conversation
| 'dns1123subdomain' | metadata name and generateName | | ||
| 'dns1123label' | metadata name and generateName | | ||
| 'dns1035label' | Scoped names and keys | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, I think we should offer nicknames for these formats. Maybe we don't need a nickname for all of them, but at least the most popular name format (I believe most names in kubernetes use dns1123subdomain) we should offer a simpler name. Maybe "io.k8s.identifier" or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I like that they are named after exactly what they are. RFC numbers are unambiguous. Though if there are k8s-isms about our implementation them it would probably be good at least to namespace them i.e. io.k8s.dns1123subdomain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly, we should just get rid of 1035 and normalize everything to 1123 :)
- IP and CIDR library: | ||
- `isIP(string) bool` | ||
- `ip(string) IP` | ||
- `IP.is4() bool` | ||
- `IP.is6() bool` | ||
- `IP.isLoopback() bool` | ||
- `isCIDR(string) bool` | ||
- `cidr(string) CIDR` | ||
- `CIDR.overlaps(CIDR) bool` | ||
- `CIDR.containsIP(IP) bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start! A few questions:
- How would IP comparison work? Some places in Kubernetes I believe we already trim whitespace before validating IP for example.
- Could we trim the IP out of a larger string like an IP:Port pair?
- In addition to the proposed CIDR functions, could we have
CIDR.containsCIDR(CIDR)
?
Can we provide a path for projects to define their own helpers? I can think of several Gateway API helpers I'd like to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to audit a bunch of hand-written code and brainstorm the least-painful CEL equivalents and for things that are just TOO painful, derive new helper functions. See my comment above. As a non-CEL expert, I don't know all of what's viable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin CEL team would be happy to accept a package for working with CIDR masks. We try to remain compatible with ZetaSQL, so if there's a subset of functions from there which would be desired here, we can iterate on a CEL-idiomatic way of exposing them. If you need more than what's listed, we can also support that too, but would like to align where possible.
In general, even for other packages, I can consult on what is general enough to include in CEL and what should probably be K8s-specific.
- UTF8 Library (for CRD validation, ManagedFields) | ||
- IsValid | ||
- IsPrint | ||
- IsControl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to be used with strings or individual chars? I worry that this function list will expand over time. Is this possible to achieve using just regex? How much worse is the regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTF8 spec is so crisp here that the performance would be better with special functions. I wouldn't use regex here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ManagedFIelds we need to test that all characters in the entire string are printable. The Go unicode.IsPrint
seems to very specifically specify all printable characters (source).
For CRDs we test if the whole string is valid, and whether the whole string uses printable, non-control chars:
https://github.com/kubernetes/kubernetes/blob/12dc19d46fb97cbbfeb1e12b8a10ff7ae73d9515/staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/validation/validation.go#L271-L283
- IsPrint | ||
- IsControl | ||
- Selectors | ||
- `Matches(LabelSelector)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of supporting selectors, and I agree native types will need it. I just wanted to point out that declaring LabelSelectors (a complex object type) in CEL will require using CEL struct initialization in a more complex way than we've supported before, so this will be quite a bit of work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always thought it would be fine to map selector -> CEL, but that you might not want to mix the two so you could yourself room for optimizing via indexing with special operators.
- `APIVersion` (GroupVersion) | ||
- Quantity | ||
- Qualified name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more detail here? Pods resources will need this, so I think it's going to be important, but I haven't spent the time to break down exactly what is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change "qualified name" to something less confusing. maybePrefixedName
or labelKey
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added links to the explicit Go validations being referred to. These are terms used all over k8s validation, so I was planning on simply exposing most of the functions from apimachinery/pkg/util/validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know "qualified name" is used within the codebase, but it's a terrible name (and it's my fault).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair disclosure- I'd helped with the early proposals for this feature, so I'm biased, but I'm very supportive of this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alexzielenski!
|
||
Declarative validation will benefit Kubernetes maintainers: | ||
|
||
- It will make it easier to develop, maintain and review APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this may true for the simplest of validations, we've had significant trouble with this in Gateway API. It is very easy for CEL rules to get out of hand quickly. In Gateway API we're using CEL rules primarily because we don't want to maintain a validating webhook, but if we had the choice between CEL and built-in validations written and tested with go, I'm pretty sure we'd choose go.
Before we go all-in on this, it would be very helpful to see how we can make CEL validation rules more manageable to write and maintain. Can we define them somewhere else? Could they span multiple lines? In general, trying to fit each validation rule into a single line within API spec has not been a good experience so far for development, maintainability, or review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is critical feedback. I agree that this KEP should be blocked from GA, maybe even from beta, by positive feedback from some preponderance of use-cases (e.g. converting "core" validation to use it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree putting all of the validation, error, reason, etc. in one line is not good UX for writing or reviewing, which is counter to our goal.
I think at the very least we should have the ability to split the annotation for the rule, reason, fieldPath into separate lines. We can add this to the code generator. I will add that to our list of requirements for adding x-kubernetes-validations marker comments to the code generator.
Another QoL improvement which would assist in expression writing might be to support variables as we do for ValidatingAdmissionPolicy. I think that is out of scope for this KEP since it would need to be added to CRD Validation Rules (KEP will just use the same API exposed)
I also think referring to constants from code is something to be explored, but seems challenging.
TLDR: We can and should do easy improvements ahead of time. Many more advanced UX improvements to experience over time into the code-generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we had a tool to help authoring/testing/maintaining these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tools help, but ultimately humans have to read and review them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the specifics, Rob. As we collectivel learn how to do this, you are again paying the trail-blazer tax :)
A few things to think about:
-
We don't need to block the KEP on solving this, just acknowledging that it needs to be solved.
-
There are going to be "fairly trivial" validations that are easy to read and comprehend. There are going to be "fairly complex" validations that would benefit from being written as multi-line expressions (maybe with comments!) which seems like a k8s problem rather than intrinsic to CEL. And there will be some "very complex" validations which do not easily distill into a single expression. For this last set, we need to consider those specific cases. It may be that there are idioms or capabilities missing in CEL that we can help define. It may be that there are library functions missing. It may be that some validations just DON'T fit very well.
-
Our near-to-medium term goal is not 100% elimination of hand-written validation. Let's take wins where we can get them and iterate on how to fit the hard-cases.
-
I want to believe that it's in everyone's best interest to make CEL work well here, so I will assume that reasonable questions will be met with reasonable consideration until otherwise demonstrated :) Let's not assume that everything is fully baked and we have no degrees of freedom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we create "The International Obfuscated CEL Code Contest" (see https://www.ioccc.org/ for those not familiar with it). 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely agree with everything @thockin said, not trying to block this KEP at all, just trying to find ways we can improve the overall experience here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott Thanks for sharing the detailed notes about your experience. It's useful in how I consider which sorts of tools the CEL team should consider building in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is critical feedback. I agree that this KEP should be blocked from GA, maybe even from beta, by positive feedback from some preponderance of use-cases (e.g. converting "core" validation to use it).
I share reservations about whether CEL is actually easier than golang, particular for feature-gated fields, tests that require oldSelf, and cross field references. I agree this does not block alpha. I think it should prevent beta (on by default).
- No change to error message structure for clients. Difference to clients is | ||
limited to minimal changes to error detail strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this new approach have a way to access common + predefined error messages like we already have for go validations? In Gateway API we've found it pretty challenging to fit both the message and rule in a single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned elsewhere multi-line expressions. I suspect errors will need some more robust support too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-CEL validations we will be using the field.Error
types you linked with the predefined error messages. For CEL validations the errors currently have to be duplicated in the CEL rule.
This sounds like something that could be added to the code generator if not supported directly by CEL Validation Rules, so if it becomes a problem we can add it later without too much friction
- IP and CIDR library: | ||
- `isIP(string) bool` | ||
- `ip(string) IP` | ||
- `IP.is4() bool` | ||
- `IP.is6() bool` | ||
- `IP.isLoopback() bool` | ||
- `isCIDR(string) bool` | ||
- `cidr(string) CIDR` | ||
- `CIDR.overlaps(CIDR) bool` | ||
- `CIDR.containsIP(IP) bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start! A few questions:
- How would IP comparison work? Some places in Kubernetes I believe we already trim whitespace before validating IP for example.
- Could we trim the IP out of a larger string like an IP:Port pair?
- In addition to the proposed CIDR functions, could we have
CIDR.containsCIDR(CIDR)
?
Can we provide a path for projects to define their own helpers? I can think of several Gateway API helpers I'd like to have.
- Vast majority (95%+) of hand written validation are replaced with declarative | ||
validation. The remaining hand written rules will primarily be rules that _should | ||
not_ be published as part of the API, usually because server side state is | ||
involved in the validation decision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we ensure this migration is safe? We tried our best with Gateway API and still made at least one key mistake in the process. We thought we had thorough test coverage but unfortunately still missed things. I'd imagine it will be very difficult to safely translate the wide set of existing Kubernetes validations to CEL without making some mistakes along the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to establish conventions for testing. Does a type which ONLY uses declarative validation need a validation test? Can we provide a streamlined way of writing those tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can expect to have an automated process that catches 100% of everything. We are moving slowly and carefully with this transition.
Automated tests will be as thorough as possible. This includes coverage of existing validation unit tests, as well as instrumenting ALL current upstream tests to use declarative validation to shake out differences.
Until GA we will compare the validation output from the new backend to the Go backend and log the diff (# of differences tracked in a metric)
We won't promote to GA until we are confident that the number of differences is near zero. To inform that confidence we might be able to look at the above mentioned metric at the cloud operator level, and look at frequency of user submitted bug reports from the differences reported in their logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexzielenski please let the CEL team know how we can support you in this effort. We'd be happy to find a way to offer more standardized tooling around testing. If it happens to be K8s-specific, that is fine too, but if there's a way to generalize a part of the problem so all CEL users could benefit that would be great.
76d76b9
to
a10bd48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nitpicking. This is a LOT of detail, maybe too much.
I think the way to go about this is to iterate - go thru validation code and do all the easy ones (like numeric comparisons, reuquiredness, etc). Then pick some of the simpler formats and go do all of those. Then crank up the complexity. At any point in time, the codebase WORKS, but as we figure things out, we might evolve the expressions we use.
```go | ||
// staging/src/k8s.io/api/core/v1/types.go | ||
|
||
// +validationRule="!self.hostNetwork || self.containers.all(c, c.containerPort.all(cp, cp.hostPort == cp.containerPort))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be explicit about CEL (so if any OTHER language arises later, we can have room?
e.g.
+validationRule=cel("...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a needed. In the API where we have CEL we've been careful to arrange stanzas such that we could add a language field later if needed. We should future proof go tags in some way to allow for this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a similar vein, I think the final tag should have a prefix (like kubebuilder:
), so we can confidently fail closed on unrecognized ones.
- IP and CIDR library: | ||
- `isIP(string) bool` | ||
- `ip(string) IP` | ||
- `IP.is4() bool` | ||
- `IP.is6() bool` | ||
- `IP.isLoopback() bool` | ||
- `isCIDR(string) bool` | ||
- `cidr(string) CIDR` | ||
- `CIDR.overlaps(CIDR) bool` | ||
- `CIDR.containsIP(IP) bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to audit a bunch of hand-written code and brainstorm the least-painful CEL equivalents and for things that are just TOO painful, derive new helper functions. See my comment above. As a non-CEL expert, I don't know all of what's viable.
- `APIVersion` (GroupVersion) | ||
- Quantity | ||
- Qualified name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change "qualified name" to something less confusing. maybePrefixedName
or labelKey
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm overall really happy with this and I think we should merge and iterate. I don't see any fundamental objections?
| string format | `+format={format name}` | `format` | Invalid | | ||
| size limits | `+min{Length,Properties,Items}`, `+max{Length,Properties,Items}` | `min{Length,Properties,Items}`, `max{Length,Properties,Items}` | TooMany, TooLong | | ||
| numeric limits | `+minimum`, `+maximum`, `+exclusiveMinimum`, `+exclusiveMaximum` | `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum` | Invalid | | ||
| required fields | `+optional` (exists today) | `required` | Required | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. When I say "formalize" I really mean "write down how people express the difference". Today we only have +optional and +default is pretty new and not widely used. Before it becomes the norm, let's just write down (API convention doc or something) which one (or both) should be used.
| 'dns1123subdomain' | metadata name and generateName | | ||
| 'dns1123label' | metadata name and generateName | | ||
| 'dns1035label' | Scoped names and keys | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly, we should just get rid of 1035 and normalize everything to 1123 :)
the hand written validation rules. | ||
- It will enable clients to perform validation of native types earlier in | ||
development worflows ("shift-left"), such as at with a Git pre-submit linter. | ||
- It will improve API composition. In particular CRDs that embed native types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, since we have historical baggage, we may not be able to do "obvious" things like I suggested above. I recently changed ONE field to stop defaulting in *.PodSpec and instead only in Pod.PodSpec, and I broke things (doing a diff of old saved state with new dry-run results - the field was marked absent).
You'll want to squash, I think :) |
b982996
to
d273093
Compare
179e8ba
to
a43d249
Compare
I think this looks pretty good, @jpbetz ? |
02c09d5
to
a777079
Compare
Have not had time to review all of this in detail, but no objections from me based on what I've been able to review. Thanks for driving this forward! |
- "@jpbetz" | ||
owning-sig: sig-api-machinery | ||
status: implementable | ||
creation-date: 2023-08-20 | ||
reviewers: | ||
- apelisse | ||
- jpbetz | ||
- thockin | ||
- robscott | ||
approvers: | ||
- thockin | ||
- jpbetz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little strange to have an author as a reviewer and approver.
c4f22ba
to
53183bf
Compare
LGTM |
LGTM for me too. |
c9fda45
to
3dc5beb
Compare
Co-authored-by: Joe Betz <[email protected]>
3dc5beb
to
3801dda
Compare
/approve Leaving lgtm with @deads2k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a big change. I don't see anything that makes me think we can't start. I think it's possible that we'll find additional beta criteria as the alpha is implemented.
```go | ||
// staging/src/k8s.io/api/core/v1/types.go | ||
|
||
// +validationRule="!self.hostNetwork || self.containers.all(c, c.containerPort.all(cp, cp.hostPort == cp.containerPort))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a similar vein, I think the final tag should have a prefix (like kubebuilder:
), so we can confidently fail closed on unrecognized ones.
|
||
Declarative validation will benefit Kubernetes maintainers: | ||
|
||
- It will make it easier to develop, maintain and review APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is critical feedback. I agree that this KEP should be blocked from GA, maybe even from beta, by positive feedback from some preponderance of use-cases (e.g. converting "core" validation to use it).
I share reservations about whether CEL is actually easier than golang, particular for feature-gated fields, tests that require oldSelf, and cross field references. I agree this does not block alpha. I think it should prevent beta (on by default).
- Enablement/Disablement tests | ||
- Initial e2e tests completed and enabled | ||
|
||
#### Beta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think beta will need to address feedback like: https://github.com/kubernetes/enhancements/pull/4163/files#r1329051141
| string format | `+format={format name}` | `format` | Invalid | | ||
| size limits | `+min{Length,Properties,Items}`, `+max{Length,Properties,Items}` | `min{Length,Properties,Items}`, `max{Length,Properties,Items}` | TooMany, TooLong | | ||
| numeric limits | `+minimum`, `+maximum`, `+exclusiveMinimum`, `+exclusiveMaximum` | `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum` | Invalid | | ||
| required fields | `+optional` (exists today) | `required` | Required | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user perspective varies depending on whether you're the writer or the reader:
- writer: defaulted fields are optional, I don't have to specify
- reader: default fields are required, they will always have values.
fwiw, I personally find that defaulted fields as required matches my expectations, but allowing exactly one of required, optional, default maybe the clearest.
|
||
##### Mitigation: Avoid Conversion to Internal Type | ||
|
||
Requests are received as the versioned type, so it should be feasible to avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me when defaulting would happen if this was done. It's also not clear to me how we ensure that validation for different versions stays in sync
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alexzielenski, deads2k, jpbetz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/sig api-machinery
/assign @jpbetz @apelisse