
adds policy to dynamic audit kep #2407

Merged
merged 1 commit into kubernetes:master from audit-policy
Jul 31, 2018

Conversation

pbarker
Contributor

@pbarker pbarker commented Jul 23, 2018

Adds ability to configure policy dynamically in the AuditConfiguration resource.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 23, 2018
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Jul 23, 2018
@timothysc
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 25, 2018
@timothysc
Member

/cc @kubernetes/sig-auth-pr-reviews

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 25, 2018
@@ -156,6 +164,9 @@ As a Kubernetes extension developer, I will be able to provide drop in extension
#### Story 3
As a cluster admin, I will be able to configure multiple audit policies and webhook endpoints to provide independent auditing facilities.

#### Story 4
As a Kubernetes developer, I will be able to quickly turn up the audit level in a certain area to debug my application.

### Implementation Details/Notes/Constraints
Member


Can you add a section about how you plan on implementing this? I imagine something like: generate the full audit event for every request, and filter it down (drop the event, or drop the request/response body) based on policy prior to sending.

The most expensive portions of the audit pipeline are probably serializing the request and response bodies (this may no longer be true with recent JSON improvements) [1], and duplicating the audit event before sending it [2]. These may need to be addressed before this feature can progress.

[1] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152
[2] https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/plugin/pkg/audit/buffered/buffered.go#L269-L271
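
For illustration only, a minimal sketch of the "generate the full event, then filter per policy" approach described above; all types and names here are hypothetical stand-ins, not the real k8s.io/apiserver types:

```go
// Hypothetical sketch of generating a full audit event and pruning it
// per policy level before sending.
package auditsketch

// Level mirrors the audit policy levels.
type Level int

const (
	LevelNone Level = iota
	LevelMetadata
	LevelRequest
	LevelRequestResponse
)

// Event stands in for a fully populated audit event.
type Event struct {
	AuditID        string
	Verb           string
	RequestObject  []byte // serialized request body
	ResponseObject []byte // serialized response body
}

// filterForLevel prunes a full event down to what a policy level permits;
// returning nil drops the event entirely.
func filterForLevel(ev Event, lvl Level) *Event {
	switch lvl {
	case LevelNone:
		return nil // drop the event
	case LevelMetadata:
		ev.RequestObject, ev.ResponseObject = nil, nil // keep metadata only
	case LevelRequest:
		ev.ResponseObject = nil // keep request body, drop response body
	}
	return &ev // LevelRequestResponse keeps everything
}
```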

@tallclair
Member

/cc @x13n

@k8s-ci-robot k8s-ci-robot requested a review from x13n July 26, 2018 19:28
@pbarker pbarker force-pushed the audit-policy branch 4 times, most recently from 68b61d1 to 1172947 Compare July 30, 2018 14:00
@pbarker
Contributor Author

pbarker commented Jul 30, 2018

@tallclair updated with details on implementation and performance. PTAL

Member

@tallclair tallclair left a comment


Mostly nits. I think I'm happy with the overall direction, and just want to iron out a few details.

@@ -53,14 +53,13 @@ We want to allow the advanced auditing features to be dynamically configured. Fo

## Motivation

- The advanced auditing features are a powerful tool, yet difficult to configure. The configuration requires deep insight into the deployment mechanism of choice and often takes many iterations to configure properly requiring a restart of the apiserver each time. Moreover, the ability to install addon tools that configure and enhance audting is hindered by the overhead in configuration. Such tools frequently run on the cluster requiring future knowledge of how to reach them when the cluster is live. These tools could enhance the security and conformance of the cluster and its applications.
+ The advanced auditing features are a powerful tool, yet difficult to configure. The configuration requires deep insight into the deployment mechanism of choice and often takes many iterations to configure properly requiring a restart of the apiserver each time. Moreover, the ability to install addon tools that configure and enhance auditing is hindered by the overhead in configuration. Such tools frequently run on the cluster requiring future knowledge of how to reach them when the cluster is live. These tools could enhance the security and conformance of the cluster and its applications.
Member


nit: I like to break up long lines in proposals, because it is easier to tell what changed in GitHub, and easier to comment on a specific sentence. The exact length doesn't matter, but I usually go for ~100 chars.

@@ -80,7 +79,10 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ Policy Policy
Member


Should we make this optional, and default to the hardcoded policy? Or do you think it's better to force webhooks to be explicit about the data they want?

Contributor Author


I've been wondering this myself; I think it could be handy to have it default, as long as we feel that would be clear enough to the user.

Member


I think it makes sense. The default policy is a bit complicated - not just "log everything @request-response", so not requiring users to come up with their own version or copy that into the config here would simplify the setup quite a bit.

If you agree, make this a pointer, and update the comment to explain that.


Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Configuration Changes
Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.
Member


Any actions to the audit configuration objects will be hard coded to log at the level=RequestResponse to the previous backend and the new backend.

Does this still apply? I think the static backends should just use the static policy. I think in most cases that would include webhook configuration, but I don't think it needs to be hardcoded. I can see an argument for applying it to the dynamic webhooks, but what if (for example) I'm only trying to log things in a specific namespace?

In either case, please clarify what "previous backend and the new backend" refers to (should it be all backends?)

Contributor Author


Yeah, because of how we mitigated the privilege escalation scenario with the static policy, I think we're fine to remove this.


Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Configuration Changes
Member


Is the whole configuration mutable, or are there any parts that are immutable?

Contributor Author


I think the whole thing should be mutable and just reload on changes; I'll add a piece on that.

### Implementation Details/Notes/Constraints

#### Feature Gating
Introduction of dynamic policy requires changes to the current audit pipeline. Care must be taken that these changes are properly gated and do not affect the stability of the current features as they progress to GA. A new decorated handler, `withDynamicAudit`, will be provisioned similar to the [existing handlers](https://github.com/kubernetes/apiserver/blob/master/pkg/endpoints/filters/audit.go#L41). Another conditional clause will be added where the handlers are [provisioned](https://github.com/kubernetes/apiserver/blob/master/pkg/server/config.go#L536), allowing for proper feature gating.
Member


nit: "... and do not affect the stability or performance of the current features ..."

Any actions to the audit configuration objects will be hard coded to log at the `level=RequestResponse` to the previous backend and the new backend. If the apiserver is HA, the configuration will be rolled out in increments.

- Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. This needs further discussion with the participating sigs.
+ #### Aggregated Servers
+ Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles.
Member


note: this means that there will be duplicate events at each stage. This isn't necessarily wrong, but it should be documented. The receiving servers should not expect to key off { Audit-ID x Stage }
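To make that caveat concrete, here is a hypothetical receiver-side dedup key: `{ Audit-ID x Stage }` alone can collide when both the aggregator and the aggregated server emit an event for the same request, so this sketch adds an assumed emitter field.

```go
package auditsketch

// eventKey is a hypothetical deduplication key for a receiving backend.
type eventKey struct {
	auditID string
	stage   string // e.g. RequestReceived, ResponseComplete
	emitter string // assumed: identity of the apiserver that emitted the event
}

var seen = map[eventKey]bool{}

// isDuplicate reports whether an event with this key was already received.
func isDuplicate(k eventKey) bool {
	if seen[k] {
		return true
	}
	seen[k] = true
	return false
}
```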

@@ -173,6 +193,11 @@ This does open up the attack surface of the audit mechanisms. Having them strict

As a mitigation strategy, policy configured through a static file on the api server will not be accessible through the API. This file ensures that an escalation attack cannot tamper with a root configuration, but works independently of any dynamically configured objects.

#### Leaked Resources
A user is granted access to create audit policies, and inadvertently exposes secret resources.
Member


To make the risk clearer, prefer:

A user with permissions to create audit policies effectively has read access to the entire cluster (including all secrets data).

@@ -182,6 +207,10 @@ It may also be reasonable to provide a dynamic auth configuration from secrets,

This needs further discussion.

#### Performance

These changes will likely have an O(n) performance impact on the api server per policy. A `DeepCopy` of the event will be required for each backend. Also, the request/response object would now be serialized on every [request](https://github.com/kubernetes/kubernetes/blob/cef2d325ee1be894e883d63013f75cfac5cb1246/staging/src/k8s.io/apiserver/pkg/audit/request.go#L150-L152). Benchmark testing will be required to understand the scope of the impact and what optimizations may be required. This impact is gated by opt-in feature flags, which largely mitigates the concern.
Member


This impact is gated by opt-in feature flags, which largely mitigates the concern.

I disagree with this statement. I think it unblocks the alpha implementation & lets us start benchmarking, but it is still a concern that needs to be addressed (or dismissed with data) before going to beta.
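
A sketch of the per-request cost under discussion, using hypothetical stand-in types: each dynamic backend applies its policy and receives its own deep copy, which is where the O(n) factor and the extra serialization come from.

```go
package auditsketch

type event struct{ payload []byte }

// deepCopy duplicates the event so each sink owns its data.
func (e *event) deepCopy() *event {
	cp := *e
	cp.payload = append([]byte(nil), e.payload...)
	return &cp
}

type backend struct {
	filter func(*event) *event // policy: drop the event or strip bodies
	send   func(*event)        // asynchronous sink
}

// fanOut is linear in the number of configured dynamic backends.
func fanOut(ev *event, backends []backend) {
	for _, b := range backends {
		if filtered := b.filter(ev); filtered != nil {
			b.send(filtered.deepCopy()) // copy so concurrent sinks can't race
		}
	}
}
```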

* It could prove difficult to understand as the policies themselves are fairly complex
* The use of CRDs would be difficult to bound

The dynamic policy feature is gated by runtime flags. This still provides the cluster provisioner a means to limit audit logging to the single runtime object if needed.
Member


feature gates aren't intended as long-term configuration options. If this is a desirable knob, we should add a non-feature gate flag for it.

Contributor Author


agreed, added a piece on the `--audit-dynamic-configuration` flag

- last-updated: 2018-07-13
- status: provisional
+ last-updated: 2018-07-30
+ status: implementable
Member


Please either revert this or close #2417

Contributor Author


closed

@pbarker pbarker force-pushed the audit-policy branch 2 times, most recently from 70f53b5 to 2fe1fb7 Compare July 31, 2018 16:20
@@ -80,7 +85,11 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ // if undefined it will default to the runtime flag provided policy if available
Member


prefer: s/runtime flag provided/statically configured cluster/

@@ -80,7 +85,11 @@ type ClusterAuditConfiguration struct {

v1.ObjectMeta

- // Backends to send events
+ // Policy is the current audit v1beta1 Policy object
+ // if undefined it will default to the runtime flag provided policy if available
Member


"if available" - what if it isn't?

Contributor Author


updated

Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
Member


nit: grammar in the last sentence. I think you're missing "so"

Implementing the [attribute interface](https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go) based on the Event struct was also explored. This would allow us to keep the existing `Sink` interfaces; however, it would require parsing the request URI twice in the pipeline due to how that field is represented in the Event. This was determined to not be worth the cost.

#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers should they not expect to key off `{ Audit-ID x Stage }`.
Member


This will duplicate ...

Do we want to provide an option to not duplicate it? E.g. if you're logging at metadata, there's not much reason to duplicate those events with the extension server (unless you're debugging an issue with that server). I'm not sure how we'd implement that though - do api servers know whether they're extension or aggregator servers?

Contributor Author

@pbarker pbarker Jul 31, 2018


do api servers know whether they're extension or aggregator servers?

not sure on that one; I could see this being a useful addition, but maybe something we track as a later feature?

@tallclair
Member

Should we change the resource name from ClusterAuditConfiguration to AuditConfiguration? I think you went with Cluster with the intention of adding a namespaced config later.

@pbarker pbarker force-pushed the audit-policy branch 3 times, most recently from 74b2b5f to a99a9ae Compare July 31, 2018 18:34
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 31, 2018
@pbarker
Contributor Author

pbarker commented Jul 31, 2018

@tallclair went ahead and renamed to AuditConfiguration to simplify things

@tallclair
Member

/lgtm
/hold

I'm looking into the approval process before letting this merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 31, 2018
@k8s-ci-robot k8s-ci-robot assigned tallclair and unassigned tallclair Jul 31, 2018
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2018
@tallclair
Member

/milestone v1.12
/status approved-for-milestone

Member

@caesarxuchao caesarxuchao left a comment


@pbarker though you pinged me on Slack, I think Tim is the right person to approve this change :)

I asked a couple of questions for my own understanding.


#### Aggregated Servers
Inherently apiserver aggregates and HA apiserver setups will work off the same dynamic configuration object. If separate
objects are needed they should be configured as static objects on the node and set through the runtime flags. Aggregated
Member


I don't follow what "objects" means here, could you elaborate? Do you mean the Policy object?

Contributor Author


Yeah, separate audit configuration objects; I'll update to reflect that.

servers will implement the same audit handling mechanisms. A conformance test should be provided as assurance. Metadata level
logging will happen by default at the main api server as it proxies the traffic. The aggregated server will then watch the same
configuration objects and only log on resource types that it handles. This will duplicate the events sent to the receiving servers
so they should not expect to key off `{ Audit-ID x Stage }`.
Member


Sorry I'm not following this last sentence, could you elaborate?

Contributor Author


This just implies that in some scenarios the event may occur in both servers, so be cognizant of that when developing features.

Member

@caesarxuchao caesarxuchao Jul 31, 2018


Are all the events sent to all the servers, and is it up to the servers to filter the events?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2018
@pbarker
Contributor Author

pbarker commented Jul 31, 2018

Thank you @caesarxuchao! @tallclair and I just wanted input from API machinery before merging. I updated the doc based on your feedback.

@tallclair
Member

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 31, 2018
@tallclair
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2018
@k8s-ci-robot k8s-ci-robot merged commit fe56bd9 into kubernetes:master Jul 31, 2018
calebamiles pushed a commit to calebamiles/community that referenced this pull request Sep 5, 2018