
adds policy to dynamic audit kep #2407

Merged
merged 1 commit into kubernetes:master from audit-policy
Jul 31, 2018

Conversation

pbarker
Contributor

@pbarker pbarker commented Jul 23, 2018

Adds ability to configure policy dynamically in the AuditConfiguration resource.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 23, 2018
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Jul 23, 2018
@timothysc
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 25, 2018
@timothysc
Member

/cc @kubernetes/sig-auth-pr-reviews

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 25, 2018
@@ -156,6 +164,9 @@ As a Kubernetes extension developer, I will be able to provide drop in extension
#### Story 3
As a cluster admin, I will be able to configure multiple audit policies and webhook endpoints to provide independent auditing facilities.

#### Story 4
As a Kubernetes developer, I will be able to quickly turn up the audit level in a certain area to debug my application.

### Implementation Details/Notes/Constraints
Member


Can you add a section about how you plan on implementing this? I imagine something like: generate the full audit event for every request, and filter it down (drop the event, or drop the request/response body) based on policy prior to sending.

The most expensive portions of the audit pipeline are probably serializing the request and response bodies (this may no longer be true with recent JSON improvements) [1], and duplicating the audit event before sending it [2]. These may need to be addressed before this feature can progress.

[1] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152
[2] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/plugin/pkg/audit/buffered/buffered.go#L269-L271
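
For illustration only, a minimal sketch of the "generate the full event, then filter per policy" approach described above; all types and names here are hypothetical stand-ins, not the real k8s.io/apiserver types:

```go
// Hypothetical sketch of generating a full audit event and pruning it
// per policy level before sending.
package auditsketch

// Level mirrors the audit policy levels.
type Level int

const (
	LevelNone Level = iota
	LevelMetadata
	LevelRequest
	LevelRequestResponse
)

// Event stands in for a fully populated audit event.
type Event struct {
	AuditID        string
	Verb           string
	RequestObject  []byte // serialized request body
	ResponseObject []byte // serialized response body
}

// filterForLevel prunes a full event down to what a policy level permits;
// returning nil drops the event entirely.
func filterForLevel(ev Event, lvl Level) *Event {
	switch lvl {
	case LevelNone:
		return nil // drop the event
	case LevelMetadata:
		ev.RequestObject, ev.ResponseObject = nil, nil // keep metadata only
	case LevelRequest:
		ev.ResponseObject = nil // keep request body, drop response body
	}
	return &ev // LevelRequestResponse keeps everything
}
```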

@tallclair
Member

/cc @x13n

@k8s-ci-robot k8s-ci-robot requested a review from x13n July 26, 2018 19:28
@pbarker pbarker force-pushed the audit-policy branch 4 times, most recently from 68b61d1 to 1172947 Compare July 30, 2018 14:00
@pbarker
Contributor Author

pbarker commented Jul 30, 2018

@tallclair updated with details on implementation and performance. PTAL

Member

@tallclair tallclair left a comment


Mostly nits. I think I'm happy with the overall direction, and just want to iron out a few details.

@@ -53,14 +53,13 @@ We want to allow the advanced auditing features to be dynamically configured. Fo

## Motivation

- The advanced auditing features are a powerful tool, yet difficult to configure. The configuration requires deep insight into the deployment mechanism of choice and often takes many iterations to configure properly requiring a restart of the apiserver each time. Moreover, the ability to install addon tools that configure and enhance audting is hindered by the overhead in configuration. Such tools frequently run on the cluster requiring future knowledge of how to reach them when the cluster is live. These tools could enhance the security and conformance of the cluster and its applications.
+ The advanced auditing features are a powerful tool, yet difficult to configure. The configuration requires deep insight into the deployment mechanism of choice and often takes many iterations to configure properly requiring a restart of the apiserver each time. Moreover, the ability to install addon tools that configure and enhance auditing is hindered by the overhead in configuration. Such tools frequently run on the cluster requiring future knowledge of how to reach them when the cluster is live. These tools could enhance the security and conformance of the cluster and its applications.
Member


nit: I like to break up long lines in proposals, because it is easier to tell what changed in GitHub, and easier to comment on a specific sentence. The exact length doesn't matter, but I usually go for ~100 chars.

@@ -80,7 +79,10 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ Policy Policy
Member


Should we make this optional, and default to the hardcoded policy? Or do you think it's better to force webhooks to be explicit about the data they want?

Contributor Author


I've been wondering this myself; I think it could be handy to have it default, as long as we feel that would be clear enough to the user.

Member


I think it makes sense. The default policy is a bit complicated - not just "log everything @request-response", so not requiring users to come up with their own version or copy that into the config here would simplify the setup quite a bit.

If you agree, make this a pointer, and update the comment to explain that.


Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Configuration Changes
Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.
Member


Any actions to the audit configuration objects will be hard coded to log at the level=RequestResponse to the previous backend and the new backend.

Does this still apply? I think the static backends should just use the static policy. I think in most cases that would include webhook configuration, but I don't think it needs to be hardcoded. I can see an argument for applying it to the dynamic webhooks, but what if (for example) I'm only trying to log things in a specific namespace?

In either case, please clarify what "previous backend and the new backend" refers to (should it be all backends?)

Contributor Author


Yeah, because of how we mitigated the privilege escalation scenario with the static policy, I think we're fine to remove this.


Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Configuration Changes
Member


Is the whole configuration mutable, or are there any parts that are immutable?

Contributor Author


I think the whole thing should be mutable and just reload on changes; I'll add a piece on that.

### Implementation Details/Notes/Constraints

#### Feature Gating
Introduction of dynamic policy requires changes to the current audit pipeline. Care must be taken that these changes are properly gated and do not affect the stability of the current features as they progress to GA. A new decorated handler, `withDynamicAudit`, will be provisioned similar to the [existing handlers](https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/filters/audit.go#L41). Another conditional clause will be added where the handlers are [provisioned](https://github.com/kubernetes/apiserver/blob/master/pkg/server/config.go#L536), allowing for proper feature gating.
Member


nit: "... and do not affect the stability or performance of the current features ..."

Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.

- Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. This needs further discussion with the participating sigs.
+ #### Aggregated Servers
+ Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles.
Member


note: this means that there will be duplicate events at each stage. This isn't necessarily wrong, but it should be documented. The receiving servers should not expect to key off { Audit-ID x Stage }
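To make that caveat concrete, here is a hypothetical receiver-side dedup key: `{ Audit-ID x Stage }` alone can collide when both the aggregator and the aggregated server emit an event for the same request, so this sketch adds an assumed emitter field.

```go
package auditsketch

// eventKey is a hypothetical deduplication key for a receiving backend.
type eventKey struct {
	auditID string
	stage   string // e.g. RequestReceived, ResponseComplete
	emitter string // assumed: identity of the apiserver that emitted the event
}

var seen = map[eventKey]bool{}

// isDuplicate reports whether an event with this key was already received.
func isDuplicate(k eventKey) bool {
	if seen[k] {
		return true
	}
	seen[k] = true
	return false
}
```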

@@ -173,6 +193,11 @@ This does open up the attack surface of the audit mechanisms. Having them strict

As a mitigation strategy, policy configured through a static file on the api server will not be accessible through the API. This file ensures that an escalation attack cannot tamper with a root configuration, but works independently of any dynamically configured objects.

#### Leaked Resources
A user is granted access to create audit policies, and inadvertently exposes secret resources.
Member


To make the risk clearer, prefer:

A user with permissions to create audit policies effectively has read access to the entire cluster (including all secrets data).

@@ -182,6 +207,10 @@ It may also be reasonable to provide a dynamic auth configuration from secrets,

This needs further discussion.

#### Performance

These changes will likely have an O(n) performance impact on the api server per policy. A `DeepCopy` of the event will be required for each backend. Also, the request/response object would now be serialized on every [request](https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152). Benchmark testing will be required to understand the scope of the impact and what optimizations may be required. This impact is gated by opt-in feature flags, which largely mitigates the concern.
Member


This impact is gated by opt-in feature flags, which largely mitigates the concern.

I disagree with this statement. I think it unblocks the alpha implementation & lets us start benchmarking, but it is still a concern that needs to be addressed (or dismissed with data) before going to beta.
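
A sketch of the per-request cost under discussion, using hypothetical stand-in types: each dynamic backend applies its policy and receives its own deep copy, which is where the O(n) factor and the extra serialization come from.

```go
package auditsketch

type event struct{ payload []byte }

// deepCopy duplicates the event so each sink owns its data.
func (e *event) deepCopy() *event {
	cp := *e
	cp.payload = append([]byte(nil), e.payload...)
	return &cp
}

type backend struct {
	filter func(*event) *event // policy: drop the event or strip bodies
	send   func(*event)        // asynchronous sink
}

// fanOut is linear in the number of configured dynamic backends.
func fanOut(ev *event, backends []backend) {
	for _, b := range backends {
		if filtered := b.filter(ev); filtered != nil {
			b.send(filtered.deepCopy()) // copy so concurrent sinks can't race
		}
	}
}
```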

* It could prove difficult to understand as the policies themselves are fairly complex
* The use of CRDs would be difficult to bound

The dynamic policy feature is gated by runtime flags. This still provides the cluster provisioner a means to limit audit logging to the single runtime object if needed.
Member


feature gates aren't intended as long-term configuration options. If this is a desirable knob, we should add a non-feature gate flag for it.

Contributor Author


agreed, added a piece on the `--audit-dynamic-configuration` flag

- last-updated: 2018-07-13
- status: provisional
+ last-updated: 2018-07-30
+ status: implementable
Member


Please either revert this or close #2417

Contributor Author


closed

@pbarker pbarker force-pushed the audit-policy branch 2 times, most recently from 70f53b5 to 2fe1fb7 Compare July 31, 2018 16:20
@@ -80,7 +85,11 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ // if undefined it will default to the runtime flag provided policy if available
Member


prefer: s/runtime flag provided/statically configured cluster/

@@ -80,7 +85,11 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ // if undefined it will default to the runtime flag provided policy if available
Member


"if available" - what if it isn't?

Contributor Author


updated

Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
Member


nit: grammar in the last sentence. I think you're missing "so"

Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
Member


This will duplicate ...

Do we want to provide an option to not duplicate it? E.g. if you're logging at metadata, there's not much reason to duplicate those events with the extension server (unless you're debugging an issue with that server). I'm not sure how we'd implement that though - do api servers know whether they're extension or aggregator servers?

Contributor Author

@pbarker pbarker Jul 31, 2018


do api servers know whether they're extension or aggregator servers?

not sure on that one; I could see this being a useful addition, but maybe something we track as a later feature?

@tallclair
Member

Should we change the resource name from ClusterAuditConfiguration to AuditConfiguration? I think you went with Cluster with the intention of adding a namespaced config later.

@pbarker pbarker force-pushed the audit-policy branch 3 times, most recently from 74b2b5f to a99a9ae Compare July 31, 2018 18:34
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 31, 2018
@pbarker
Contributor Author

pbarker commented Jul 31, 2018

@tallclair went ahead and renamed to AuditConfiguration to simplify things

@tallclair
Member

/lgtm
/hold

I'm looking into the approval process before letting this merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2018
@k8s-ci-robot k8s-ci-robot assigned tallclair and unassigned tallclair Jul 31, 2018
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2018
@tallclair
Member

/milestone v1.12
/status approved-for-milestone

Member

@caesarxuchao caesarxuchao left a comment


@pbarker though you pinged me on Slack, I think Tim is the right person to approve this change :)

I asked a couple of questions for my own understanding.


#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate
objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated
Member


I don't follow what "objects" means here, could you elaborate? Do you mean the Policy object?

Contributor Author


Yeah, separate audit configuration objects; I'll update to reflect that.

servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level
logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same
configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers
so they should not expect to key off `{ Audit-ID x Stage }`.
Member


Sorry I'm not following this last sentence, could you elaborate?

Contributor Author


This just implies that in some scenarios the event may occur in both servers, so be cognizant of that when developing features.

Member

@caesarxuchao caesarxuchao Jul 31, 2018


Are all the events sent to all the servers, and is it up to the servers to filter the events?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2018
@pbarker
Contributor Author

pbarker commented Jul 31, 2018

Thank you @caesarxuchao! @tallclair and I just wanted input from API machinery before merging. I updated the doc based on your feedback.

@tallclair
Member

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 31, 2018
@tallclair
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2018
@k8s-ci-robot k8s-ci-robot merged commit fe56bd9 into kubernetes:master Jul 31, 2018
calebamiles pushed a commit to calebamiles/community that referenced this pull request Sep 5, 2018