adds policy to dynamic audit kep #2407
Conversation
/ok-to-test

/cc @kubernetes/sig-auth-pr-reviews
@@ -156,6 +164,9 @@ As a Kubernetes extension developer, I will be able to provide drop in extension

#### Story 3
As a cluster admin, I will be able to configure multiple audit policies and webhook endpoints to provide independent auditing facilities.

#### Story 4
As a Kubernetes developer, I will be able to quickly turn up the audit level on a certain area to debug my application.

### Implementation Details/Notes/Constraints
Can you add a section about how you plan on implementing this? I imagine something like: generate the full audit event for every request, and filter it down (drop the event, or drop the request/response body) based on policy prior to sending.
The most expensive portions of the audit pipeline are probably serializing the request and response body (may no longer be true with recent JSON improvements) [1], and duplicating the audit event before sending it [2]. These may need to be addressed before this feature can progress.
[1] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152
[2] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/plugin/pkg/audit/buffered/buffered.go#L269-L271
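The generate-then-filter approach suggested above can be sketched roughly as follows. This is a hypothetical illustration: the level names mirror the audit policy API, but the trimmed-down `Event` struct and the `filterForLevel` helper are stand-ins, not the real apiserver types.

```go
package main

import "fmt"

// Level mirrors the audit policy levels (None < Metadata < Request <
// RequestResponse); the rest of this sketch is illustrative only.
type Level int

const (
	LevelNone Level = iota
	LevelMetadata
	LevelRequest
	LevelRequestResponse
)

// Event is a trimmed-down stand-in for the audit Event type.
type Event struct {
	Verb           string
	RequestObject  string // serialized request body
	ResponseObject string // serialized response body
}

// filterForLevel returns a copy of the full event reduced to what the
// given policy level allows, or nil when the event should be dropped.
func filterForLevel(ev Event, lvl Level) *Event {
	if lvl == LevelNone {
		return nil // drop the event entirely
	}
	out := ev // copy before mutating, so other backends still see the full event
	if lvl < LevelRequest {
		out.RequestObject = ""
	}
	if lvl < LevelRequestResponse {
		out.ResponseObject = ""
	}
	return &out
}

func main() {
	full := Event{Verb: "update", RequestObject: "{...}", ResponseObject: "{...}"}
	for _, lvl := range []Level{LevelNone, LevelMetadata, LevelRequestResponse} {
		fmt.Printf("%d -> %+v\n", lvl, filterForLevel(full, lvl))
	}
}
```

The full event is generated once per request; each backend's policy then decides how much of it survives, which is what makes the serialization and duplication costs in [1] and [2] the hot spots.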
/cc @x13n
Force-pushed 68b61d1 to 1172947
@tallclair updated with details on implementation and performance. PTAL
Mostly nits. I think I'm happy with the overall direction, and just want to iron out a few details.
@@ -53,14 +53,13 @@ We want to allow the advanced auditing features to be dynamically configured. Fo

## Motivation

The advanced auditing features are a powerful tool, yet difficult to configure. The configuration requires deep insight into the deployment mechanism of choice and often takes many iterations to configure properly requiring a restart of the apiserver each time. Moreover, the ability to install addon tools that configure and enhance auditing is hindered by the overhead in configuration. Such tools frequently run on the cluster requiring future knowledge of how to reach them when the cluster is live. These tools could enhance the security and conformance of the cluster and its applications.
nit: I like to break up long lines in proposals, because it is easier to tell what changed in GitHub, and easier to comment on a specific sentence. The exact length doesn't matter, but I usually go for ~100 chars.
@@ -80,7 +79,10 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

// Policy is the current audit v1beta1 Policy object
Policy Policy
Should we make this optional, and default to the hardcoded policy? Or do you think it's better to force webhooks to be explicit about the data they want?
I've been wondering this myself. I think it could be handy to have it default, as long as we feel that would be clear enough to the user.
I think it makes sense. The default policy is a bit complicated - not just "log everything @request-response", so not requiring users to come up with their own version or copy that into the config here would simplify the setup quite a bit.
If you agree, make this a pointer, and update the comment to explain that.
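The pointer-with-default shape being suggested might look like the sketch below. Everything here is an illustrative stand-in (the trimmed `Policy`, `AuditConfiguration`, `defaultPolicy`, and `effectivePolicy` names are hypothetical, not the KEP's actual API).

```go
package main

import "fmt"

// Policy is a trimmed stand-in for the audit v1beta1 Policy object.
type Policy struct {
	Rules []string
}

// AuditConfiguration sketches the dynamic config object: a nil Policy
// means "use the cluster's default policy".
type AuditConfiguration struct {
	Policy *Policy
}

// defaultPolicy stands in for the hardcoded default discussed in the
// thread (which is more involved than "log everything @request-response").
var defaultPolicy = Policy{Rules: []string{"default-rule"}}

// effectivePolicy resolves the policy a backend should use: the one
// supplied in the config when present, otherwise the default.
func effectivePolicy(cfg AuditConfiguration) Policy {
	if cfg.Policy != nil {
		return *cfg.Policy
	}
	return defaultPolicy
}

func main() {
	// Omitted policy falls back to the default.
	fmt.Println(effectivePolicy(AuditConfiguration{}))
	// An explicit policy wins.
	custom := &Policy{Rules: []string{"custom"}}
	fmt.Println(effectivePolicy(AuditConfiguration{Policy: custom}))
}
```

Making the field a pointer lets the API distinguish "not set" from "set to an empty policy", which is what allows the defaulting behavior to be expressed cleanly.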
Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Configuration Changes
Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.
> Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend.
Does this still apply? I think the static backends should just use the static policy. I think in most cases that would include webhook configuration, but I don't think it needs to be hardcoded. I can see an argument for including it to the dynamic webhooks, but what if (for example) I'm only trying to log things in a specific namespace?
In either case, please clarify what "previous backend and the new backend" refers to (should it be all backends?)
Yeah, because of how we mitigated the privilege escalation scenario with the static policy, I think we're fine to remove this.
#### Configuration Changes
Is the whole configuration mutable, or are there any parts that are immutable?
I think the whole thing should be mutable and just reload on changes. I'll add a piece on that.
### Implementation Details/Notes/Constraints

#### Feature Gating
Introduction of dynamic policy requires changes to the current audit pipeline. Care must be taken that these changes are properly gated and do not affect the stability of the current features as they progress to GA. A new decorated handler will be provisioned similar to the [existing handlers](https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/filters/audit.go#L41) called `withDynamicAudit`. Another conditional clause will be added where the handlers are [provisioned](https://github.com/kubernetes/apiserver/blob/master/pkg/server/config.go#L536) allowing for the proper feature gating.
nit: "... and do not affect the stability or performance of the current features ..."
Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles.
note: this means that there will be duplicate events at each stage. This isn't necessarily wrong, but it should be documented. The receiving servers should not expect to key off { Audit-ID x Stage }
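The collision is easy to demonstrate: once the aggregator and an aggregated server both emit an event for the same request, two events land on the same `{ Audit-ID x Stage }` key. A minimal sketch (all types here are illustrative, not the real audit API):

```go
package main

import "fmt"

// Event is a minimal stand-in for an audit event with its identifying
// fields plus which apiserver emitted it.
type Event struct {
	AuditID string
	Stage   string
	Server  string
}

// key is the naive identity a receiver might be tempted to use.
type key struct{ AuditID, Stage string }

func main() {
	// One request, proxied by the aggregator to an extension server:
	// both emit an event with the same Audit-ID and stage.
	events := []Event{
		{AuditID: "abc-123", Stage: "ResponseComplete", Server: "aggregator"},
		{AuditID: "abc-123", Stage: "ResponseComplete", Server: "extension"},
	}
	seen := map[key]int{}
	for _, e := range events {
		seen[key{e.AuditID, e.Stage}]++
	}
	// Both events collide on the same key, so receivers must include the
	// emitting server (or another discriminator) in their identity key.
	fmt.Println(seen[key{"abc-123", "ResponseComplete"}]) // 2: a collision
}
```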
@@ -173,6 +193,11 @@ This does open up the attack surface of the audit mechanisms. Having them strict

As a mitigation strategy policy configured through a static file on the api server will not be accessible through the api. This file ensures that an escalation attack cannot tamper with a root configuration, but works independently of any dynamically configured objects.

#### Leaked Resources
A user is granted access to create audit policies, and inadvertently exposes secret resources.
To make the risk clearer, prefer:
A user with permissions to create audit policies effectively has read access to the entire cluster (including all secrets data).
@@ -182,6 +207,10 @@ It may also be reasonable to provide a dynamic auth configuration from secrets,

This needs further discussion.

#### Performance

These changes will likely have an O(n) performance impact on the api server per policy. A `DeepCopy` of the event will be required for each backend. Also, the request/response object would now be serialized on every [request](https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152). Benchmark testing will be required to understand the scope of the impact and what optimizations may be required. This impact is gated by opt-in feature flags, which largely mitigates the concern.
> This impact is gated by opt-in feature flags, which largely mitigates the concern.

I disagree with this statement. I think it unblocks the alpha implementation & lets us start benchmarking, but it is still a concern that needs to be addressed (or dismissed with data) before going to beta.
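The per-backend `DeepCopy` cost under discussion can be sketched as a fan-out like the one below. This is a hypothetical illustration of where the O(n) term comes from, not the real apiserver code; the `Event`, `Backend`, and `fanOut` names are stand-ins.

```go
package main

import "fmt"

// Event is a stand-in for the audit Event type.
type Event struct {
	Verb    string
	Payload []byte
}

// DeepCopy mimics the generated deep-copy: each backend gets its own
// copy so one backend's policy filtering cannot mutate another's event.
func (e *Event) DeepCopy() *Event {
	out := *e
	out.Payload = append([]byte(nil), e.Payload...)
	return &out
}

// Backend is the sink interface each configured backend implements.
type Backend interface{ ProcessEvent(*Event) }

// fanOut is where the per-policy cost accumulates: one DeepCopy per
// registered backend, on every audited request.
func fanOut(ev *Event, backends []Backend) {
	for _, b := range backends {
		b.ProcessEvent(ev.DeepCopy())
	}
}

// countingBackend records the events it receives, for demonstration.
type countingBackend struct{ copies []*Event }

func (c *countingBackend) ProcessEvent(e *Event) { c.copies = append(c.copies, e) }

func main() {
	b1, b2 := &countingBackend{}, &countingBackend{}
	ev := &Event{Verb: "create", Payload: []byte("body")}
	fanOut(ev, []Backend{b1, b2})
	// Each backend received its own distinct copy of the same event.
	fmt.Println(len(b1.copies), len(b2.copies), b1.copies[0] != b2.copies[0])
}
```

A benchmark over this shape (varying backend count and payload size) is roughly what would be needed to confirm or dismiss the concern with data before beta.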
* It could prove difficult to understand as the policies themselves are fairly complex
* The use of CRDs would be difficult to bound

The dynamic policy feature is gated by runtime flags. This still provides the cluster provisioner a means to limit audit logging to the single runtime object if needed.
feature gates aren't intended as long-term configuration options. If this is a desirable knob, we should add a non-feature gate flag for it.
agreed, added a piece on the `--audit-dynamic-configuration` flag
last-updated: 2018-07-13
status: provisional
last-updated: 2018-07-30
status: implementable
Please either revert this or close #2417
closed
Force-pushed 70f53b5 to 2fe1fb7
@@ -80,7 +85,11 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

// Policy is the current audit v1beta1 Policy object
// if undefined it will default to the runtime flag provided policy if available
prefer: s/runtime flag provided/statically configured cluster/
// if undefined it will default to the runtime flag provided policy if available
"if available" - what if it isn't?
updated
Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
nit: grammar in the last sentence. I think you're missing "so"
> This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
> This will duplicate ...

Do we want to provide an option to not duplicate it? E.g. if you're logging at metadata, there's not much reason to duplicate those events with the extension server (unless you're debugging an issue with that server). I'm not sure how we'd implement that though - do api servers know whether they're extension or aggregator servers?
> do api servers know whether they're extension or aggregator servers?

not sure on that one, I could see this as being a useful add but maybe something we track as a later feature?
Should we change the resource name from |
Force-pushed 74b2b5f to a99a9ae
@tallclair went ahead and renamed to |
/lgtm

I'm looking into the approval process before letting this merge.

/milestone v1.12
@pbarker though you pinged me on Slack, I think Tim is the right person to approve this change :)
I asked a couple of questions for my own understanding.
#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate
objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated
I don't follow what "objects" means here, could you elaborate? Do you mean the Policy object?
Yeah, separate audit configuration objects. I'll update to reflect that.
servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level
logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same
configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers
so they should not expect to key off `{ Audit-ID x Stage }`.
Sorry I'm not following this last sentence, could you elaborate?
This just implies that in some scenarios the event may occur in both servers, so be cognizant of that when developing features.
Are all the events sent to all the servers, and is it up to the servers to filter the events?
thank you @caesarxuchao! @tallclair and I just wanted input from api machinery before merging, I updated the doc based on your feedback

/hold cancel

/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tallclair

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
adds policy to dynamic audit kep

Adds ability to configure policy dynamically in the `AuditConfiguration` resource.