Pluggable access control #114

Closed
evankanderson opened this issue Feb 27, 2020 · 28 comments
Labels: kind/feature, kind/user-story, lifecycle/rotten

Comments

@evankanderson
Contributor

What would you like to be added:

A standardized mechanism to call an external service to authorize request forwarding to a selected destination (backend).

Why is this needed:

User Story

  • As an application developer, I want to extract authentication and authorization requirements from an application to a common infrastructure component. The authentication and authorization function should have access to request headers and optionally the request body up to some limit.
  • As a cluster operator, I want authorization to be an external call (akin in spirit to Kubernetes' ValidatingWebhooks) rather than a proxying service, because that lets me control the following:
    • Ability to separately measure latency/error rate
    • Ability to fail-open as well as fail-closed on error
    • Reduced complexity of authorization webhook
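
A minimal sketch of the fail-open versus fail-closed choice described above, assuming the external call is abstracted as a function that returns either a definite answer or an error. The names `FailurePolicy`, `CheckFunc`, `Decide`, and the three stub checks are hypothetical and are not part of any proposed API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// FailurePolicy says what to do when the authorizer errors or times out.
// Hypothetical type, for illustration only.
type FailurePolicy int

const (
	FailClosed FailurePolicy = iota // deny when no definite answer is available
	FailOpen                        // allow when no definite answer is available
)

// CheckFunc stands in for the external authorization call.
type CheckFunc func(ctx context.Context) (bool, error)

// Always returns a stub check that gives a definite answer immediately.
func Always(allowed bool) CheckFunc {
	return func(context.Context) (bool, error) { return allowed, nil }
}

// Failing returns a stub check that always errors.
func Failing() CheckFunc {
	return func(context.Context) (bool, error) {
		return false, errors.New("authorizer unavailable")
	}
}

// Hanging returns a stub check that blocks until the deadline expires.
func Hanging() CheckFunc {
	return func(ctx context.Context) (bool, error) {
		<-ctx.Done()
		return false, ctx.Err()
	}
}

// Decide runs check with a deadline of timeoutMs milliseconds. A definite
// answer is returned as-is; on error or timeout the policy decides.
func Decide(check CheckFunc, timeoutMs int, policy FailurePolicy) bool {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	type result struct {
		allowed bool
		err     error
	}
	done := make(chan result, 1) // buffered so the goroutine never leaks
	go func() {
		allowed, err := check(ctx)
		done <- result{allowed, err}
	}()

	select {
	case r := <-done:
		if r.err == nil {
			return r.allowed
		}
	case <-ctx.Done():
	}
	return policy == FailOpen
}

func main() {
	fmt.Println(Decide(Always(false), 100, FailOpen)) // a definite deny is never overridden
	fmt.Println(Decide(Failing(), 100, FailOpen))
	fmt.Println(Decide(Failing(), 100, FailClosed))
	fmt.Println(Decide(Hanging(), 10, FailClosed))
}
```

Keeping the policy outside the authorizer is what lets the operator, rather than the webhook author, choose the failure behaviour.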

Usage examples

  • Cloud Foundry UAA for OAuth token verification would be one example, as would a SAML implementation.
  • If TLS information is provided in the request, this might provide an implementation for #93 (TLS: enforce validation policy for an application).
  • Access control could also be used to implement a rate-limiting mechanism or load-shedding mechanism to prevent overloading backend services like a SQL database.
  • Application services could also provide validation on other header fields; Knative event delivery (CloudEvents over HTTP) might want to be able to limit delivery to only certain event types (CE-Type header).
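
As a rough illustration of the last example, an authorization service that only admits certain CloudEvents types could look like the sketch below. It assumes binary-mode CloudEvents over HTTP (event type carried in the `ce-type` header) and assumes a convention where HTTP 200 means allow and 403 means deny; `AllowEvent`, `Handler`, and the event type names are made up for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AllowEvent reports whether the request's CloudEvents type is on the allow-list.
// A missing header yields the empty type, which is denied unless listed.
func AllowEvent(h http.Header, allowedTypes map[string]bool) bool {
	return allowedTypes[h.Get("Ce-Type")]
}

// Handler wraps AllowEvent as an HTTP authorization endpoint.
// Assumed convention: 200 allows the request, 403 denies it.
func Handler(allowedTypes map[string]bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if AllowEvent(r.Header, allowedTypes) {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "event type not allowed", http.StatusForbidden)
	}
}

func main() {
	h := Handler(map[string]bool{"com.example.order.created": true})
	for _, ceType := range []string{"com.example.order.created", "com.example.order.deleted"} {
		req := httptest.NewRequest(http.MethodPost, "/check", nil)
		req.Header.Set("Ce-Type", ceType)
		rec := httptest.NewRecorder()
		h(rec, req)
		fmt.Println(ceType, rec.Code)
	}
}
```

The check only needs request headers, which fits the "headers but no body" mode of the user story.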
@evankanderson evankanderson added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 27, 2020
@evankanderson
Contributor Author

/kind user-story

@mikehelmick
@tcnghia
@yolocs

@k8s-ci-robot k8s-ci-robot added the kind/user-story Categorizes an issue as capturing a user story label Feb 27, 2020
@bowei
Contributor

bowei commented Apr 2, 2020

/assign

@evankanderson
Contributor Author

A quick Envoy-compatible design follows; it supports combining multiple authentication hooks with at-least-one semantics. Note that it may be advisable to duplicate the gRPC definitions and pun the protos between the two, to allow the Envoy protos and Ingressv2 to evolve independently.

Add a property spec.authorization with the following golang type:

type Authorization struct {
	// URL indicates the address of the service used to perform authorization checks. Required.
	// There are two protocols supported:
	// - grpc: A service.auth.v2.CheckRequest
	//   In this mode, the access controlled resource is denoted by attributes.destination labels
	//   https://www.envoyproxy.io/docs/envoy/v1.13.1/api-v2/service/auth/v2/external_auth.proto#envoy-api-msg-service-auth-v2-checkrequest
	// - https: An HTTP request with all the headers but no body content.
	//   In this mode, the access controlled resource must be denoted by additional path or query-string arguments
	//   Equivalent to a matcher of `{prefix: ""}` in envoy configuration:
	//   https://www.envoyproxy.io/docs/envoy/v1.13.1/api-v2/config/filter/http/ext_authz/v2/ext_authz.proto#envoy-api-msg-config-filter-http-ext-authz-v2-authorizationrequest
	URL string `json:"url"`

	// TimeoutMs is the maximum time, in milliseconds, that the authorization call may take.
	// Defaults to 100 if unset.
	TimeoutMs int `json:"timeoutMs,omitempty"`
}

// Define a type for the list to allow adding helper methods to `spec.authorization`
type Authorizations []Authorization
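
The helper methods mentioned in the comment above might look something like the following sketch. The method names `Timeout` and `Validate` are hypothetical, and the rules they enforce (only `grpc` and `https` schemes, a non-empty host, a 100 ms default) are read off the field comments rather than taken from any agreed spec. The types are redeclared so the example compiles on its own.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

type Authorization struct {
	URL       string `json:"url"`
	TimeoutMs int    `json:"timeoutMs"`
}

type Authorizations []Authorization

const defaultTimeoutMs = 100

// Timeout returns the configured timeout, using the 100 ms default
// when TimeoutMs is zero or negative.
func (a Authorization) Timeout() time.Duration {
	ms := a.TimeoutMs
	if ms <= 0 {
		ms = defaultTimeoutMs
	}
	return time.Duration(ms) * time.Millisecond
}

// Validate checks that the URL parses, uses one of the two supported
// schemes, names a host, and that the timeout is not negative.
func (a Authorization) Validate() error {
	u, err := url.Parse(a.URL)
	if err != nil {
		return fmt.Errorf("invalid url %q: %w", a.URL, err)
	}
	if u.Scheme != "grpc" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q in %q", u.Scheme, a.URL)
	}
	if u.Host == "" {
		return fmt.Errorf("missing host in %q", a.URL)
	}
	if a.TimeoutMs < 0 {
		return fmt.Errorf("negative timeoutMs %d", a.TimeoutMs)
	}
	return nil
}

// Validate checks every entry and reports the index of the first bad one.
func (as Authorizations) Validate() error {
	for i, a := range as {
		if err := a.Validate(); err != nil {
			return fmt.Errorf("authorization[%d]: %w", i, err)
		}
	}
	return nil
}

func main() {
	as := Authorizations{
		{URL: "grpc://opa.policy-central.svc.cluster.local/"},
		{URL: "https://external-auth.mycorp.com/foobar", TimeoutMs: 500},
	}
	fmt.Println(as.Validate())
	for _, a := range as {
		fmt.Println(a.URL, a.Timeout())
	}
}
```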

This would render in yaml as:

...
spec:
  authorization:
  - url: grpc://opa.policy-central.svc.cluster.local/
  - url: https://external-auth.mycorp.com/foobar
    timeoutMs: 500
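
One possible reading of the at-least-one semantic for a list like this is sketched below: the request is admitted as soon as any listed authorizer gives a definite allow, and authorizer errors are collected but do not count as an allow. `AtLeastOne`, `Check`, and `ErrUnavailable` are hypothetical names, the `Authorization` type is a trimmed copy, and the actual network call is left as a pluggable function.

```go
package main

import (
	"errors"
	"fmt"
)

// Authorization is a trimmed copy of the proposed type.
type Authorization struct {
	URL       string
	TimeoutMs int
}

// ErrUnavailable is a placeholder error for an unreachable authorizer.
var ErrUnavailable = errors.New("authorizer unavailable")

// Check performs one authorization call; the transport is out of scope here.
type Check func(a Authorization) (bool, error)

// AtLeastOne tries the authorizers in order and stops at the first allow.
// It returns the errors seen before the decision was reached.
func AtLeastOne(auths []Authorization, check Check) (bool, []error) {
	var errs []error
	for _, a := range auths {
		allowed, err := check(a)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", a.URL, err))
			continue
		}
		if allowed {
			return true, errs
		}
	}
	return false, errs
}

func main() {
	auths := []Authorization{
		{URL: "grpc://opa.policy-central.svc.cluster.local/"},
		{URL: "https://external-auth.mycorp.com/foobar", TimeoutMs: 500},
	}
	// Fake check: the first authorizer is down, the second allows.
	fake := func(a Authorization) (bool, error) {
		if a.TimeoutMs == 0 {
			return false, ErrUnavailable
		}
		return true, nil
	}
	allowed, errs := AtLeastOne(auths, fake)
	fmt.Println(allowed, len(errs))
}
```

Whether an empty list should allow or deny is a separate policy question; this sketch denies.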

@hbagdi
Contributor

hbagdi commented Jun 11, 2020

Please take a look at the extensible points that the following document discusses: https://docs.google.com/document/d/1SkTb6ECuiQiayvumdsZ3sxDcXD6cYs-DKB_C81Ajvh4/edit

It seems like such an extension would be a custom extension that can be supported via actions.

@jmprusi

jmprusi commented Jun 15, 2020

Pluggable access control looks like standard functionality that many "custom" ingress implementations already provide (most of them expose Envoy's extAuthz configuration), so perhaps it could be part of extended (at least). If a feature like this is custom, we will end up with a lot of custom implementations again, defeating the purpose of this work.

If we keep the authentication fields simple enough, they could be implemented by multiple ingresses.

I would add to this proposal an extensionRef allowing the controller to add some metadata to the authorization/authentication request based on an external CRD:

  authorization:
  - url: grpc://opa.policy-central.svc.cluster.local/
    timeoutMs: 500
    extensionRef:
      - apiVersion: APIgateway
        kind: AuthorizationCRD
        name: example

Also, wouldn't this make more sense implemented as a filter?

@evankanderson
Contributor Author

@hbagdi

Hi, I had a chance to read the extension points document in this comment.

Unfortunately, it looks like the answer suggested in that doc is that access control is a custom extensionRef, which means that it will have all the same problems that the v1beta1 Ingress had -- inconsistent implementations and a lowest-common-denominator that's below the utility level for many applications. I realize that authn and authz is a hard problem (much harder than traffic splitting), but I'm hoping that we can solve them the same way that the kubernetes API server did -- by an extensible hook mechanism, rather than needing to build all the details into the core application.

At first glance, @jmprusi 's suggestion seems like it makes sense, though I'm not quite sure what the extensionRef would be pointing to here. (The minimum needed is probably the authz URL; you can stuff an identifier of the specific rule to be enforced into the URL, but an extension could provide extra metadata about what properties of the request need to be forwarded.)

@jmprusi

jmprusi commented Jul 9, 2020

At first glance, @jmprusi 's suggestion seems like it makes sense, though I'm not quite sure what the extensionRef would be pointing to here.

I think there are some situations where you want to add additional metadata to the authorization requests that isn't available in the original request. (In this case, the extensionRef can even point to a config map with some key/values that could include, for example, tenant, environment, tier, etc.)

(The minimum needed is probably the authz URL; you can stuff an identifier of the specific rule to be enforced into the URL, but an extension could provide extra metadata about what properties of the request need to be forwarded.)

That's a good point too! :)

I agree that this can be solved in multiple ways: extensionRef, a key/value list, or even by adding some headers.
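
To make the "adding some headers" option concrete, here is a small sketch that copies the original request headers and attaches key/value metadata (for example, values read from a config map) under a reserved prefix. The prefix `X-Authz-Meta-` and the function `WithMetadata` are invented for this example; stripping any client-supplied headers under that prefix is an extra precaution assumed here, not something discussed in the thread.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// metaPrefix is a hypothetical reserved header prefix for controller metadata.
const metaPrefix = "X-Authz-Meta-"

// WithMetadata returns a copy of orig with the metadata attached as
// prefixed headers. Headers already using the prefix are dropped first,
// so a client cannot spoof controller-provided values.
func WithMetadata(orig http.Header, meta map[string]string) http.Header {
	out := orig.Clone()
	if out == nil {
		out = http.Header{}
	}
	for k := range out {
		if strings.HasPrefix(http.CanonicalHeaderKey(k), metaPrefix) {
			delete(out, k)
		}
	}
	for k, v := range meta {
		out.Set(metaPrefix+k, v)
	}
	return out
}

func main() {
	orig := http.Header{}
	orig.Set("Authorization", "Bearer token")
	orig.Set("X-Authz-Meta-Tenant", "spoofed")

	out := WithMetadata(orig, map[string]string{"tenant": "acme", "tier": "gold"})
	fmt.Println(out.Get("Authorization"))
	fmt.Println(out.Get("X-Authz-Meta-Tenant"))
	fmt.Println(out.Get("X-Authz-Meta-Tier"))
}
```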

@hbagdi
Contributor

hbagdi commented Jul 9, 2020

Great points here @jmprusi and @evankanderson.
Please take a look at the following proposal:
https://docs.google.com/document/d/1SkTb6ECuiQiayvumdsZ3sxDcXD6cYs-DKB_C81Ajvh4/edit
Look for the section "Option 2: Extending custom functionality" which shows how such an extension point would work in the API.

The overall theme for this API is to allow for custom implementations to extend when required and then get convergence over time so that these custom features can be pulled into extended and then ultimately into the core API.
This is explained in the Conformance section at https://kubernetes-sigs.github.io/service-apis/concepts/.

Please let us know how you feel about the above proposal.

@hbagdi
Contributor

hbagdi commented Jul 9, 2020

/assign

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 6, 2020
@evankanderson
Contributor Author

evankanderson commented Nov 7, 2020

(sorry, missed the follow-up until now for some reason)

I agree that having a general extension mechanism is important, and the "actions API" seems like a sketch that could eventually provide lots of functionality that API gateways provide.

What I'm specifically interested in, though, is a standard way to provide data-plane access control along the lines of Google's BeyondCorp design -- a way to take the access control decision out of the application and put it in the infrastructure layer. This seems like a well-understood pattern, though platforms like Google's IAP mostly tie you to a specific identity platform. Amazon's API gateway does support pluggable logic, though.

@evankanderson
Contributor Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 7, 2020
@jmprusi

jmprusi commented Nov 9, 2020

Perhaps, to get this into the extended support spec, we could start with a basic approach like the one AWS Lambda authorizers take:

A request parameter-based Lambda authorizer (also called a REQUEST authorizer) receives the caller's identity in a combination of headers, query string parameters, stageVariables, and $context variables.

This could help cover some initial use cases, provide an initial reference implementation, and be "simple" enough to implement in different data planes.

@evankanderson
Contributor Author

Would a POC for one or more ingress implementations help move the needle here? If so, which ones?

@jpeach
Contributor

jpeach commented Nov 9, 2020

There's a question of what the API looks like, and a question of what the implementation looks like for integrating this stuff. There's no common protocol for external auth; various proxies all do different things. We could, for example, spec that the external auth server has to support the Envoy gRPC protocol, but that just means few proxies will implement it.

@jmprusi

jmprusi commented Nov 10, 2020

There's a question of what the API looks like, and a question of what the implementation looks like for integrating this stuff. There's no common protocol for external auth; various proxies all do different things. We could, for example, spec that the external auth server has to support the Envoy gRPC protocol, but that just means few proxies will implement it.

Yes, going with extAuthz gRPC could be difficult for other proxies, but Envoy can use a REST endpoint for extAuthz, and some of those proxies can be extended via Lua, Wasm, or native extensions...
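
For a sense of how small the proxy side of a REST-style check could be, here is a sketch of a client that forwards the original headers with no body and treats HTTP 200 as allow and anything else as deny. The use of GET, the function names `CheckHTTP` and `Allowed`, and the token values are assumptions made for this example; a real data plane would follow whatever protocol ends up being specified.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// Allowed maps the authorizer's status code to a decision.
// Assumed convention: only 200 allows the request.
func Allowed(status int) bool { return status == http.StatusOK }

// CheckHTTP sends the original request headers, without a body, to authURL.
func CheckHTTP(ctx context.Context, client *http.Client, authURL string, orig http.Header) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, authURL, nil)
	if err != nil {
		return false, err
	}
	if orig != nil {
		req.Header = orig.Clone()
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	return Allowed(resp.StatusCode), nil
}

func main() {
	// Stand-in authorizer that accepts a single hard-coded bearer token.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "Bearer good" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusForbidden)
	}))
	defer srv.Close()

	for _, token := range []string{"Bearer good", "Bearer bad"} {
		ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
		h := http.Header{}
		h.Set("Authorization", token)
		ok, err := CheckHTTP(ctx, srv.Client(), srv.URL, h)
		cancel()
		fmt.Println(token, ok, err)
	}
}
```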

As @evankanderson proposed, it seems to be a good idea to try to define a list of proxies and make some PoCs :)

@robscott
Member

Hey @jmprusi and @evankanderson, thanks for raising this issue! We followed back up on this today during our community meeting and are still very interested in finding a solution here. Would either of you happen to have time to work on some PoCs for this?

@jmprusi

jmprusi commented Jan 14, 2021

Hi @robscott, I did work on an early PoC around this:

Luckily, I can shift my working focus onto this right now, so I will start a design/proposal document and improve those PoCs. Any help/pointers are very welcome! :)

Also, I don't know if the recording of the community meeting is available yet, but I can't seem to find it... can you point me to it?

Thanks

--- Update

The recording: https://www.youtube.com/watch?v=k6I-nL9RinE&list=PL69nYSiGNLP2E8vmnqo5MwPOY25sDWIxb&index=3

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2021
@evankanderson
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2021
@hbagdi hbagdi removed their assignment Apr 23, 2021
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 22, 2021
@evankanderson
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@jpeach
Contributor

jpeach commented Jul 27, 2021

Policy attachment #736 might be a more flexible approach for access control and authentication in general.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 25, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 24, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


9 participants