Admission Webhook to GA #736

mbohlool · 2019-01-28T17:01:22Z

Enhancement proposal to promote Admission Webhooks to GA.

@deads2k @liggitt @sttts @erictune @caesarxuchao @roycaihw @fedebongio @lavalamp

mbohlool · 2019-01-28T17:28:48Z

I think that any plan to graduate to GA needs to address the existent problems that have been identified around admission plugin ordering. Notably kubernetes/kubernetes#64333 which prevents mutating admission plugins from manipulating pods.
/hold

I agree and setup a follow up meeting for that. But for the holding part, the KEP process encourages to merge the KEP as soon as possible and iterate on it. The status of the KEP would be provisional until we answer these follow up questions.

liggitt · 2019-01-28T17:39:00Z

But for the holding part, the KEP process encourages to merge the KEP as soon as possible and iterate on it. The status of the KEP would be provisional until we answer these follow up questions.

The intent is to capture iterative consensus. A spot in the doc listing open questions would temper the "this is the agreed-on list for GA" impression.

mbohlool · 2019-01-28T17:51:27Z

The intent is to capture iterative consensus. A spot in the doc listing open questions would temper the "this is the agreed-on list for GA" impression.

Added open questions to the Graduation section.

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

mbohlool · 2019-01-30T05:55:30Z

@liggitt @deads2k Answered or applied your comments. PTAL.

deads2k · 2019-01-30T13:09:33Z

@deads2k look at number 6 in this document: https://github.com/kubernetes/enhancements/blob/master/keps/0000-kep-template.md

Merging early works, but only if we strip out all bits with comments. We did that with Chao's storage migration one and it worked well. In this case, we're up against a deadline.

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

liggitt · 2019-01-30T16:20:08Z

I don't want this to run afoul of a paperwork deadline, but I also don't want to merge something that needs edits. I'm really not sure what to do paperwork-wise. @liggitt advice? These are fairly non-contentious updates on the path to stability, it would be a real shame if paperwork prevented us from moving forward.

deadline for merging and marking implementable got extended to monday because the originally documented requirements were unclear. let's work to resolve the open questions today/tomorrow

lavalamp · 2019-01-30T19:08:22Z

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

+* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
+* Pass existing objects to delete admission webhooks
+* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem
+* pass OperationOption (such as CreateOption/DeleteOption) to the webhook


Add: Webhook Configuration should not require users/admins to understand every possible way a user can write an object (e.g., you must hook both apps/deployment and extensions/deployment).

I remember I discussed this early on with @liggitt and he recommended we do not make this item a requirement for GA. Let's add this item in a follow up PR if we wanted to do it.

We discussed this on Wednesday, and I think it's worth attempting for GA.

All the pros are for webhook authors and are significant:

you could write/register a simpler webhook for a single version of a resource

you don't have to worry about a security-related webhook missing a new version of a resource you thought you were protecting

All the cons are for the apiserver authors

performance cost of converting up to 2n times in order to dispatch to n webhooks (where today we convert zero times between webhooks)

complexity of some of those conversions possibly being done by CRD conversion webhooks

...

I'm ok adding it as a planned item, but there are some interesting edges to work through before this bit would be implementable. If it proves untenable (either because of complexity or performance), I'd be comfortable GAing without it.

lavalamp · 2019-01-30T19:09:52Z

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

+* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
+* Pass existing objects to delete admission webhooks
+* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem
+* pass OperationOption (such as CreateOption/DeleteOption) to the webhook


Add: define whether or not admission plugins are allowed to modify "user set" fields; also, do the actions of prior webhooks count as "user set"?

define whether or not admission plugins are allowed to modify "user set" fields

yes, much of their power comes from this

do the actions of prior webhooks count as "user set"

yes

As this is the behaviour today (Jordan's yeses) I assume we don't have an action item in this comment unless @lavalamp disagrees :)

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

lavalamp · 2019-01-30T21:53:55Z

keps/sig-api-machinery/00xx-admission-webhooks-to-ga.md

+* port configuration: Add a configuration field to change the service port from the default 443.
+* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
+* Pass existing objects to delete admission webhooks
+* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem


I know we talked this to death on the call but I'm increasingly unsatisfied with this.

Imagine a linear acyclic chain of webhooks where each makes a decision based on the previous one, which I think is a reasonable thing for a webhook to do. The optimal ordering requires a single pass. The pessimal ordering (run them in the opposite order) requires N passes.

There are many arrangements of webhooks that need more than 2 passes if we consider cycles.

We can deliberately construct pathological webhook sets requiring obscene numbers of passes no matter what order they are run in. I think we can mostly exclude these from consideration as it should be hard to accidentally produce one.

So, I'd like to see:

If there is an ordering for a set of webhooks such that they quiesce in 1 or 2 passes, is there anything we can do to actually run them in that order and not the order that takes N passes?

If there's a cycle that needs 3 or more passes, how do users realize that they're not getting the correct final state?

Can we collect any evidence supporting (or refuting) my assumption that most webhook sets are not pathological?

Let me re-emphasize that we need to let webhooks opt-out of the double run, and probably default off in the beta. E.g., @yliaog wasn't confident that the Istio sidecar webhook would behave well if double-called.

The below is probably not a good idea, but maybe it will help someone think of one that is: One way of finding the optimal order for things that have an optimal order: allow webhooks to return in the AdmissionReview object the set of fields they examined before mutating the fields they did. We can tell the set of fields they changed by looking at what they did. With these pieces of information, we can form a graph with all webhooks that need to be run a second time, and then choose the order to run them in by topologically sorting. Unfortunately, this is onerous from the webhook developer's perspective. It also doesn't solve pathological cases at all, but I think that's a non-goal--I'm pretty sure we can construct webhook sets which encode NP complete problems.

I think the current proposal ("just re-call") is both:

the simplest thing that could work (covers the reported issues)

required as an option even in the presence of more intelligent/complex optimizations

Justification for 2: the apiserver computing an order requires knowledge of the inputs and outputs to the webhook. In a future world where a webhook could express its inputs and outputs, it would always need a way to express "I pay attention to all fields, and have the potential to modify all fields" (for example, a programmable opa-based admission webhook), and if there were two webhooks that both inspect and potentially modify the same fields, the possibility of needing to re-call is present.

Unfortunately, this is onerous from the webhook developer's perspective.

I think anything sufficiently expressive to be useful is going to be painfully verbose, and that most webhook configs would either opt out entirely or choose to just get unconditionally re-called.

Let me re-emphasize that we need to let webhooks opt-out of the double run, and probably default off in the beta. E.g., @yliaog wasn't confident that the Istio sidecar webhook would behave well if double-called.

Agreed. I'd expect a way for the webhook config to indicate it is idempotent / safe to re-call, defaulted to false in v1beta1, and possibly undefaulted and required in v1. I would leave the number of potential re-calls unspecified to allow for future optimizations.

I'd recommend waiting for more evidence of a need for apiserver-driven sorting before pursuing it, and don't see that as a blocker to GA.

... a way to express "I pay attention to all fields ...
I think anything sufficiently expressive to be useful is going to be painfully verbose, and that most webhook configs would either opt out entirely or choose to just get unconditionally re-called.

I think you misread; in the meeting I was talking about the configuration object but just now I was talking about the admission review response. I.e. the webhook writes down the fields it actually did look at; there's never a need to say "everything" unless it literally did look at everything. (Imagine: an unstructured object that logs every Get request.) This doesn't help the first pass but would help subsequent passes.

I agree multiple passes (hopefully calling fewer webhooks every pass) is likely to be an ingredient in a good solution :/

I noted in the KEP that this feature would be opt-in. We can continue discussing this and document the result in this KEP using a follow up PR.

mbohlool · 2019-02-01T05:04:17Z

@apelisse @deads2k @lavalamp @liggitt PTAL

liggitt · 2019-02-01T06:39:02Z

/lgtm

still provisional, but at a good spot for iterating

follow up PR #806 addresses:

include "convert to requested-version" capability in "planned for GA" list (Admission Webhook to GA #736 (comment))
resolve unset vs * for scope field (Admission Webhook to GA #736 (comment))
mention the service port capability will be added to APIService and audit webhook structs as well (Admission Webhook to GA #736 (comment))
explicitly indicates API/behavior for conversion and re-call features are still in progress and will be updated/approved in the KEP before implementation
marks implementable

to be resolved:

API/behavior design for "convert to webhook-requested version" (proposed in update webhook admission proposal with "convert to webhook-requested version" #802)
API/behavior design for "mutating plugin ordering" re-call
- detail opt-in mechanism for re-call (Admission Webhook to GA #736 (comment))
- make a decision on simple re-call strategy as a first step (Admission Webhook to GA #736 (comment))

to be added to existing documentation:

clarify admission's ability to set "user set" fields (Admission Webhook to GA #736 (comment))

lavalamp · 2019-02-01T17:46:16Z

/lgtm
/approve

This is getting big, let's lock in progress.

k8s-ci-robot · 2019-02-01T17:46:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, liggitt, mbohlool

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/sig-api-machinery/OWNERS~~ [lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Admission Webhook to GA

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. labels Jan 28, 2019

k8s-ci-robot requested review from jdumars and lavalamp January 28, 2019 17:01

mbohlool mentioned this pull request Jan 28, 2019

Admission Webhook Path to GA KEP kubernetes/community#3157

Closed

mbohlool force-pushed the gandalf branch from c40be7c to e48af3b Compare January 28, 2019 17:12

mbohlool force-pushed the gandalf branch 2 times, most recently from 5ed473c to 892b412 Compare January 28, 2019 17:49

mbohlool force-pushed the gandalf branch from 892b412 to 7813f69 Compare January 28, 2019 17:52

This was referenced Jan 28, 2019

Extensibility features - Path to GA kubernetes/kubernetes#73185

Closed

forward delete options to admission webhooks kubernetes/kubernetes#68873

Closed

mbohlool force-pushed the gandalf branch 4 times, most recently from 970fa55 to 9f530a0 Compare January 29, 2019 22:27

deads2k reviewed Jan 29, 2019

View reviewed changes