Users should only be able to create TriggerTemplates for resources they are permissioned to use #77

Closed
pmorie opened this issue Aug 21, 2019 · 33 comments

@pmorie

pmorie commented Aug 21, 2019

In the current design, users would be able to use the spec.resourceTemplates field to escalate their degree of write permission within the cluster; users of TriggerTemplate are able to create any resource that the service account responsible for creating resources in a TriggerTemplate has permission to create. For example, if a user does not have access to create Pods, but the service account servicing TriggerTemplate does, a user could put a Pod into spec.resourceTemplates and have it created.

We should protect against this form of privilege escalation by having a validating webhook that uses information from the authorizer (this is part of the payload webhooks are passed) to make SAR checks that establish that the user creating or updating a TriggerTemplate has permission to create those resources in the cluster.
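
To make the proposal concrete, here is a minimal Go sketch (using client-go and the admission/v1 types) of the kind of check such a webhook could perform: replay the identity carried in the AdmissionRequest in a SubjectAccessReview for each resource the template would stamp out. The helper name canUserCreate and all of the wiring are illustrative assumptions, not Tekton's actual code.

```go
// Hypothetical helper for a validating webhook on TriggerTemplates: replay
// the requesting user's identity (from the AdmissionRequest) in a
// SubjectAccessReview to ask whether *they* may create the templated resource.
package main

import (
	"context"
	"fmt"

	admissionv1 "k8s.io/api/admission/v1"
	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/kubernetes"
)

func canUserCreate(ctx context.Context, kc kubernetes.Interface,
	req *admissionv1.AdmissionRequest, gvr schema.GroupVersionResource) (bool, error) {
	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			// Identity of whoever is creating/updating the TriggerTemplate,
			// as forwarded to the webhook by the apiserver (Extra omitted).
			User:   req.UserInfo.Username,
			Groups: req.UserInfo.Groups,
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: req.Namespace,
				Verb:      "create",
				Group:     gvr.Group,
				Version:   gvr.Version,
				Resource:  gvr.Resource,
			},
		},
	}
	resp, err := kc.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, fmt.Errorf("SubjectAccessReview failed: %w", err)
	}
	return resp.Status.Allowed, nil
}
```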

@bobcatfish
Collaborator

@pmorie what's a SAR check? (not a term I'm familiar with!)

@bobcatfish
Collaborator

@pmorie this is a super interesting point - was discussing this a bit with @imjasonh and @dibyom - this issue must apply to pipelines as well? Since a user can specify any service account they'd like in a Run definition, which will ultimately be provided to the arbitrary containers the user supplies 🤔

@imjasonh also told me SAR might stand for SubjectAccessReview? I'm not sure how that works but I agree with you about the validating webhook (maybe via something like gatekeeper?)

It feels to me a bit out of scope for Triggers or Pipelines tho - what do you think? I could see having a guide to a best practices setup that points people towards setting this kind of thing up (maybe even tools/scripts/docs to help them get this right), but I'm not sure it should be a feature of the projects themselves?

@vtereso

vtereso commented Aug 28, 2019

@pmorie I'm not too familiar with the distinction between users and service accounts. This says:

Kubernetes does not have objects which represent normal user accounts

Could you further explain how a user would have access to such a service account where it would be an escalation?

@bobcatfish
Collaborator

After investigating and thinking about this a bit more, I think the TL;DR is:

afaik @pmorie is pointing out that in our current design, if a user can create an EventListener (with a serviceaccount that can make PipelineRuns, read TriggerBindings, etc.) and can create Pipelines + Tasks, they'll then additionally have the ability to:

  1. Create resources like PipelineRuns
  2. Creating a PipelineRun means they can create a pod
  3. Being able to create a pod means they can do basically anything they want within the cluster, including accessing any existing service account in the cluster

So basically, for example, you might intend for a user to only be able to run Pipeline Foo in a cluster in response to a webhook, but you may not realize you've actually given them the ability to do nearly anything they want in the cluster.

I'm not sure triggers or pipelines should solve this directly, but we might want to provide some guidance (and maybe even best-practice setup Tasks like the one @akihikokuroda added!) for people to set up secure clusters, e.g. something like:

  1. Restrict who can create Pipelines and Tasks <-- maybe this solves like 90% of the problem?
  2. Use SubjectAccessReview to express stuff like: in order for a user to be allowed to make an EventListener, they must also be allowed to create a PipelineRun (and if you want to create a PipelineRun, you need to be allowed to create a pod) - see the sketch after this list
  3. Use something like gatekeeper to express more complex stuff, e.g. serviceaccount blah can only be referenced in pods that are created by user baz
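
To make idea 2 concrete, a rough Go sketch of the "chained" check, continuing the earlier snippet (same imports, and reusing the hypothetical canUserCreate helper); the resource list and version strings are illustrative:

```go
// Sketch of idea 2: refuse EventListener creation unless the requesting user
// could also create the resources the listener will transitively create.
// Builds on the hypothetical canUserCreate helper sketched above.
var transitivelyCreated = []schema.GroupVersionResource{
	{Group: "tekton.dev", Version: "v1alpha1", Resource: "pipelineruns"},
	{Group: "", Version: "v1", Resource: "pods"},
}

func validateEventListenerCreate(ctx context.Context, kc kubernetes.Interface,
	req *admissionv1.AdmissionRequest) error {
	for _, gvr := range transitivelyCreated {
		ok, err := canUserCreate(ctx, kc, req, gvr)
		if err != nil {
			return err
		}
		if !ok {
			return fmt.Errorf("user %q cannot create %s, so cannot create an EventListener",
				req.UserInfo.Username, gvr.Resource)
		}
	}
	return nil
}
```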

@bobcatfish
Collaborator

haha i am bad at TL;DRs XD

@vtereso

vtereso commented Sep 5, 2019

Thought about this a bit more, but please correct me if I am wrong.

Since Triggers only allow you to create the resources specified in the template, it is the service account's permissions that determine what can be created, so there is no direct permissions issue. The caveat here is that reconcilers might perform auxiliary actions on the resources that get generated. For example, the service account may have permission to create a PipelineRun, which actually makes pods (via the reconciler's service account), and that could be a transitive escalation of privilege. However, even if a pod was created this way, wouldn't the permissions of the service account used to create the original PipelineRun similarly restrict the scope of what could be done? In any case, is there any programmatic way to make a determination on what prospective reconcilers would do?

@vtereso

vtereso commented Sep 5, 2019

To this point, maybe pipelines should require the service account to be able to create pods if it doesn't do so already? A sketch of such a check is below.
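
If that check were wanted, one way to implement it (purely a sketch with the same client-go imports as the earlier snippets, not an existing pipelines feature) is to issue a SubjectAccessReview as the service account's synthetic username:

```go
// Sketch: ask the apiserver whether a ServiceAccount may create Pods in its
// namespace, using the SA's synthetic "system:serviceaccount:..." username.
func saCanCreatePods(ctx context.Context, kc kubernetes.Interface, ns, name string) (bool, error) {
	sar := &authorizationv1.SubjectAccessReview{
		Spec: authorizationv1.SubjectAccessReviewSpec{
			User:   fmt.Sprintf("system:serviceaccount:%s:%s", ns, name),
			Groups: []string{"system:serviceaccounts", "system:serviceaccounts:" + ns},
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: ns,
				Verb:      "create",
				Resource:  "pods", // core group, so Group is left empty
			},
		},
	}
	resp, err := kc.AuthorizationV1().SubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}
```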

@gabemontero
Contributor

gabemontero commented Feb 5, 2020

Thought about this a bit more, but please correct me if I am wrong.

Since Triggers only allow you to create the resources specified in the template, it is the service account's permissions that determine what can be created, so there is no direct permissions issue. The caveat here is that reconcilers might perform auxiliary actions on the resources that get generated. For example, the service account may have permission to create a PipelineRun, which actually makes pods (via the reconciler's service account), and that could be a transitive escalation of privilege. However, even if a pod was created this way, wouldn't the permissions of the service account used to create the original PipelineRun similarly restrict the scope of what could be done?

Hey @vtereso

First, given it was several months ago, if the question you asked here has already been answered (with the same conclusion as me), apologies for missing that and the added noise here :-)

I believe the answer to your question is "no", in that the associated service account's permissions do not act as a sufficient shield.

My explanation as to "why" is tl;dr ... but I'll risk it in the hope of fully explaining the rationale, as well as laying out all the assumption details in case there are disconnects. I'll also preface this by stating I've in the past worked on

"a controller that creates objects on behalf of users with varying permissions in a k8s cluster"

  • so, if we agree that the Tekton Triggers controller can also be a controller that will be creating objects for users with varying permissions in a k8s cluster
  • and with the Trigger API type exposing the entire Container type in the Step type, thus allowing privileged containers to be set
  • you need something to distinguish which users can employ TriggerTemplates with cluster-admin-type capabilities, if those exist in the cluster
  • that is where k8s SARs come in ... a controller with cluster-admin privileges can run a SAR for the object types in the raw field, supplying the identity of the non-cluster-admin user attempting the create, to see if they are allowed to instantiate objects defined in the TriggerResourceTemplate that require cluster-admin privileges to create (see the sketch after this list)
  • where if they are not allowed, of course the request is denied
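
As a rough illustration of the SAR-on-raw-field idea (not the actual controller code): decode each raw template entry just far enough to learn its GroupVersionKind, map that to a resource, and feed it to the SAR. The pluralization below is naive and assumed; a real implementation would use a discovery-backed RESTMapper.

```go
// Sketch: derive the GroupVersionResource to check from a raw
// TriggerResourceTemplate entry. Naive kind->resource pluralization;
// use a RESTMapper in real code.
import (
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func gvrForRawTemplate(raw []byte) (schema.GroupVersionResource, error) {
	u := &unstructured.Unstructured{}
	if err := u.UnmarshalJSON(raw); err != nil {
		return schema.GroupVersionResource{}, err
	}
	gvk := u.GroupVersionKind()
	resource := strings.ToLower(gvk.Kind) + "s" // e.g. "PipelineRun" -> "pipelineruns"
	return gvk.GroupVersion().WithResource(resource), nil
}
```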

Now, you could just make the restriction that no "cluster admin level" creations are allowed at all, and restrict the trigger controller in that way.

But I could see cluster admins wanting to leverage tekton plumbing, including triggers, to do things like manipulate test clusters. What if they maintain cluster configuration in a git repo, and trigger cluster changes via PRs against the git repo, and then have those PRs trigger Tekton Pipelines to validate the config change?

Now, admittedly I'm still new to tekton triggers, but I see in the current sink code access to:

  • http request headers, which could be used to identify who is making the request
  • github styled webhook payloads that include github users (who are presumably creating PRs)
  • or use of auth enabled PipelineResources like GitResource or StorageResource in the dynamic file resources proposal

It seems the opportunity will exist to, at least on an opt-in basis, gather the identity of the end requester and perform authorization checks.
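
For instance (purely a sketch: the payload field is GitHub's real "sender.login", but the mapping to a cluster identity is hypothetical and deployment-specific):

```go
// Sketch: opt-in requester identification in the sink. GitHub webhook
// payloads carry the acting user under "sender.login"; mapToClusterUser is a
// hypothetical, deployment-specific translation to a username that a
// SubjectAccessReview would understand.
import (
	"encoding/json"
	"errors"
)

type githubPayload struct {
	Sender struct {
		Login string `json:"login"`
	} `json:"sender"`
}

func requesterFromGitHub(body []byte) (string, error) {
	var p githubPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return "", err
	}
	if p.Sender.Login == "" {
		return "", errors.New("no sender in webhook payload")
	}
	return mapToClusterUser(p.Sender.Login), nil
}

// mapToClusterUser stands in for whatever mapping a cluster admin configures
// (e.g. a ConfigMap from GitHub logins to cluster usernames).
func mapToClusterUser(login string) string { return "github:" + login }
```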

My hope is to craft a PR in the short term that would at least show what an implementation could look like. Then once everyone is comfortable with that logic, the community could decide how to connect the requesting user impersonation/authorization dots end to end and stage the changes in.

Thoughts?

@dibyom @vdemeester @pmorie FYI

@gabemontero
Contributor

fyi I updated my prior comment #77 (comment) as I've looked at the code more and understand that

  • there is validation to restrict the type instantiated from the TriggerTemplate to the various Tekton types
  • clarifying that, with the Step exposing the Container API, it is possible to create privileged containers

@gabemontero
Contributor

per sync call today I'm assigning to myself

will update this with the WIP PR when I have it available

@gabemontero
Contributor

oops .... @wlynch @dibyom I don't have permission to assign this to myself ;-)

can one of you all do the honors?

@gabemontero
Contributor

/assign @gabemontero

@gabemontero
Contributor

check that @wlynch @dibyom ... just could not do it from the assignee portion of the issue ;-)

@gabemontero
Contributor

gabemontero commented Feb 18, 2020

Intermediate update:

  • convinced myself (and I believe @pmorie as well though we are reconvening later this week to talk some more) that all the admission webhooks/controllers/servers and SARs belong in tektoncd/pipelines
  • any updates in this repo, if any, would be around a more granular auth flow out of the event sink
  • I've got a commit/branch in my fork where I'm tinkering with some ideas along these lines, though I'm not ready to pull the trigger on a PR yet

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 13, 2020
@tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dibyom
Member

dibyom commented Aug 14, 2020

we've made some progress here with more granular RBAC. Do you think it makes sense to keep this closed, or should we re-open, @gabemontero?

@gabemontero
Contributor

/remove-lifecycle rotten
/reopen

@dibyom yeah, the more granular RBAC is at minimum a first step, but I think we should track this issue in conjunction with @khrm's trigger CRD work; I'm curious whether, as that reaches fruition, the serviceAccount associated with the trigger will be optional or not, along with how the trigger CRD work reconciles with #705

@tekton-robot

@gabemontero: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

@dibyom yeah, the more granular RBAC is at minimum a first step, but I think we should track this issue in conjunction with @khrm's trigger CRD work; I'm curious whether, as that reaches fruition, the serviceAccount associated with the trigger will be optional or not, along with how the trigger CRD work reconciles with #705

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot reopened this Aug 16, 2020
tekton-robot removed the lifecycle/rotten label Aug 16, 2020
@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/stale label Nov 14, 2020
@gabemontero
Contributor

/remove-lifecycle stale

I still need to revisit where @khrm is wrt the trigger CRD / multi-tenant event listener work etc. per #77 (comment)

tekton-robot removed the lifecycle/stale label Nov 14, 2020
@gabemontero
Contributor

@dibyom @khrm - to help illustrate my point about the optional nature of the EVT sa ref field and multi-tenant event listeners, I've opened an experimental PR #836 to explain where things need to go so that we sufficiently close the gaps originally noted in this issue.

UPDATE: we had a centralized discussion on this in today's WG call ... the skinny:

  • per @dibyom, the pluggable interceptor rework should reduce the permission scope of the EventListener
  • EL permissions in general are on @khrm's radar wrt multi-tenant EL
  • we also discussed a few tweaks to the first-pass approach I took in "if eventlistenertrigger sa not set, assume user means default sa" (#836)
  • But in general, we'll revisit once the interceptor rework is done, and there is something more tangible wrt multi-tenant EL to examine

@dibyom @khrm please comment as needed if my summary missed something

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/stale label Mar 2, 2021
@imjasonh
Member

imjasonh commented Mar 2, 2021

/remove-lifecycle stale

tekton-robot removed the lifecycle/stale label Mar 2, 2021
dibyom mentioned this issue Apr 22, 2021
@dibyom
Member

dibyom commented May 12, 2021

But in general, we'll revisit once the interceptor rework is done, and there is something more tangible wrt multi-tenant EL to examine

@gabemontero Interceptors now run in separate services. Is there anything else we need to do for this issue?

(Somewhat related: #1077 (comment))

@gabemontero
Contributor

But in general, we'll revisit once the interceptor rework is done, and there is something more tangible wrt multi-tenant EL to examine

@gabemontero Interceptors now run in separate services. Is there anything else we need to do for this issue?

(Somewhat related: #1077 (comment))

Yeah, at this point, between what I see in https://github.com/tektoncd/triggers/blob/61544a05c554f7d18ab9e0c5b655de3105230cc7/examples/rbac.yaml, the existing restriction that templates only allow creating tekton.dev objects, the existing ability I added last year to specify the SA used when those tekton.dev objects are created from the template spec, and interceptors now being separated, I'm good with closing this.

If multi-tenant EL ever gets to the point where that rbac file changes such that creation in multiple namespaces is possible, we can revisit this concern in that context at that time.

IMO I'm good with no longer tracking this here.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/stale label Oct 15, 2021
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot added the lifecycle/rotten label and removed the lifecycle/stale label Nov 14, 2021
@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
