Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admission Webhook to GA #736

Merged
merged 3 commits into from
Feb 1, 2019
Merged

Admission Webhook to GA #736

merged 3 commits into from
Feb 1, 2019

Conversation

mbohlool
Copy link
Contributor

Enhancement proposal to promote Admission Webhooks to GA.

@deads2k @liggitt @sttts @erictune @caesarxuchao @roycaihw @fedebongio @lavalamp

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. labels Jan 28, 2019
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/pm size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 28, 2019
@mbohlool
Copy link
Contributor Author

@deads2k wrote:

I think that any plan to graduate to GA needs to address the existent problems that have been identified around admission plugin ordering. Notably kubernetes/kubernetes#64333 which prevents mutating admission plugins from manipulating pods.
/hold

I agree and setup a follow up meeting for that. But for the holding part, the KEP process encourages to merge the KEP as soon as possible and iterate on it. The status of the KEP would be provisional until we answer these follow up questions.

@liggitt
Copy link
Member

liggitt commented Jan 28, 2019

But for the holding part, the KEP process encourages to merge the KEP as soon as possible and iterate on it. The status of the KEP would be provisional until we answer these follow up questions.

The intent is to capture iterative consensus. A spot in the doc listing open questions would temper the "this is the agreed-on list for GA" impression.

@mbohlool mbohlool force-pushed the gandalf branch 2 times, most recently from 5ed473c to 892b412 Compare January 28, 2019 17:49
@mbohlool
Copy link
Contributor Author

The intent is to capture iterative consensus. A spot in the doc listing open questions would temper the "this is the agreed-on list for GA" impression.

Added open questions to the Graduation section.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/blocked-paths Indicates that a PR should not merge because it touches files in blocked paths. label Jan 30, 2019
@mbohlool
Copy link
Contributor Author

@liggitt @deads2k Answered or applied your comments. PTAL.

@deads2k
Copy link
Contributor

deads2k commented Jan 30, 2019

@deads2k look at number 6 in this document: https://github.com/kubernetes/enhancements/blob/master/keps/0000-kep-template.md

Merging early works, but only if we strip out all bits with comments. We did that with Chao's storage migration one and it worked well. In this case, we're up against a deadline.

@liggitt
Copy link
Member

liggitt commented Jan 30, 2019

I don't want this to run afoul of a paperwork deadline, but I also don't want to merge something that needs edits. I'm really not sure what to do paperwork-wise. @liggitt advice? These are fairly non-contentious updates on the path to stability, it would be a real shame if paperwork prevented us from moving forward.

deadline for merging and marking implementable got extended to monday because the originally documented requirements were unclear. let's work to resolve the open questions today/tomorrow

* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
* Pass existing objects to delete admission webhooks
* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem
* pass OperationOption (such as CreateOption/DeleteOption) to the webhook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: Webhook Configuration should not require users/admins to understand every possible way a user can write an object (e.g., you must hook both apps/deployment and extensions/deployment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember I discussed this early on with @liggitt and he recommended we do not make this item a requirement for GA. Let's add this item in a follow up PR if we wanted to do it.

Copy link
Member

@liggitt liggitt Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on Wednesday, and I think it's worth attempting for GA.

All the pros are for webhook authors and are significant:

  • you could write/register a simpler webhook for a single version of a resource
  • you don't have to worry about a security-related webhook missing a new version of a resource you thought you were protecting

All the cons are for the apiserver authors

  • performance cost of converting up to 2n times in order to dispatch to n webhooks (where today we convert zero times between webhooks)
  • complexity of some of those conversions possibly being done by CRD conversion webhooks
  • ...

I'm ok adding it as a planned item, but there are some interesting edges to work through before this bit would be implementable. If it proves untenable (either because of complexity or performance), I'd be comfortable GAing without it.

* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
* Pass existing objects to delete admission webhooks
* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem
* pass OperationOption (such as CreateOption/DeleteOption) to the webhook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add: define whether or not admission plugins are allowed to modify "user set" fields; also, do the actions of prior webhooks count as "user set"?

Copy link
Member

@liggitt liggitt Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define whether or not admission plugins are allowed to modify "user set" fields

yes, much of their power comes from this

do the actions of prior webhooks count as "user set"

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is the behaviour today (Jordan's yeses) I assume we don't have an action item in this comment unless @lavalamp disagrees :)

* port configuration: Add a configuration field to change the service port from the default 443.
* AdmissionReview version selection: Add a API field to set the AdmissionReview versions a webhook can understand.
* Pass existing objects to delete admission webhooks
* re-run mutating plugins if any webhook changed object to fix the plugin ordering problem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we talked this to death on the call but I'm increasingly unsatisfied with this.

  • Imagine a linear acyclic chain of webhooks where each makes a decision based on the previous one, which I think is a reasonable thing for a webhook to do. The optimal ordering requires a single pass. The pessimal ordering (run them in the opposite order) requires N passes.
  • There are many arrangements of webhooks that need more than 2 passes if we consider cycles.
  • We can deliberately construct pathological webhook sets requiring obscene numbers of passes no matter what order they are run in. I think we can mostly exclude these from consideration as it should be hard to accidentally produce one.

So, I'd like to see:

  • If there is an ordering for a set of webhooks such that they quiesce in 1 or 2 passes, is there anything we can do to actually run them in that order and not the order that takes N passes?
  • If there's a cycle that needs 3 or more passes, how do users realize that they're not getting the correct final state?
  • Can we collect any evidence supporting (or refuting) my assumption that most webhook sets are not pathological?
  • Let me re-emphasize that we need to let webhooks opt-out of the double run, and probably default off in the beta. E.g., @yliaog wasn't confident that the Istio sidecar webhook would behave well if double-called.

The below is probably not a good idea, but maybe it will help someone think of one that is: One way of finding the optimal order for things that have an optimal order: allow webhooks to return in the AdmissionReview object the set of fields they examined before mutating the fields they did. We can tell the set of fields they changed by looking at what they did. With these pieces of information, we can form a graph with all webhooks that need to be run a second time, and then choose the order to run them in by topologically sorting. Unfortunately, this is onerous from the webhook developer's perspective. It also doesn't solve pathological cases at all, but I think that's a non-goal--I'm pretty sure we can construct webhook sets which encode NP complete problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current proposal ("just re-call") is both:

  1. the simplest thing that could work (covers the reported issues)
  2. required as an option even in the presence of more intelligent/complex optimizations

Justification for 2: the apiserver computing an order requires knowledge of the inputs and outputs to the webhook. In a future world where a webhook could express its inputs and outputs, it would always need a way to express "I pay attention to all fields, and have the potential to modify all fields" (for example, a programmable opa-based admission webhook), and if there were two webhooks that both inspect and potentially modify the same fields, the possibility of needing to re-call is present.

Unfortunately, this is onerous from the webhook developer's perspective.

I think anything sufficiently expressive to be useful is going to be painfully verbose, and that most webhook configs would either opt out entirely or choose to just get unconditionally re-called.

Let me re-emphasize that we need to let webhooks opt-out of the double run, and probably default off in the beta. E.g., @yliaog wasn't confident that the Istio sidecar webhook would behave well if double-called.

Agreed. I'd expect a way for the webhook config to indicate it is idempotent / safe to re-call, defaulted to false in v1beta1, and possibly undefaulted and required in v1. I would leave the number of potential re-calls unspecified to allow for future optimizations.

I'd recommend waiting for more evidence of a need for apiserver-driven sorting before pursuing it, and don't see that as a blocker to GA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... a way to express "I pay attention to all fields ...
I think anything sufficiently expressive to be useful is going to be painfully verbose, and that most webhook configs would either opt out entirely or choose to just get unconditionally re-called.

I think you misread; in the meeting I was talking about the configuration object but just now I was talking about the admission review response. I.e. the webhook writes down the fields it actually did look at; there's never a need to say "everything" unless it literally did look at everything. (Imagine: an unstructured object that logs every Get request.) This doesn't help the first pass but would help subsequent passes.

I agree multiple passes (hopefully calling fewer webhooks every pass) is likely to be an ingredient in a good solution :/

Copy link
Contributor Author

@mbohlool mbohlool Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted in the KEP that this feature would be opt-in. We can continue discussing this and document the result in this KEP using a follow up PR.

@mbohlool
Copy link
Contributor Author

mbohlool commented Feb 1, 2019

@apelisse @deads2k @lavalamp @liggitt PTAL

@liggitt
Copy link
Member

liggitt commented Feb 1, 2019

/lgtm

still provisional, but at a good spot for iterating

follow up PR #806 addresses:

to be resolved:

to be added to existing documentation:

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 1, 2019
@lavalamp
Copy link
Member

lavalamp commented Feb 1, 2019

/lgtm
/approve

This is getting big, let's lock in progress.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, liggitt, mbohlool

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit 46941fc into kubernetes:master Feb 1, 2019
derekwaynecarr pushed a commit to derekwaynecarr/enhancements that referenced this pull request Feb 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants