Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced Auditing Beta Changes #48561

Closed
6 of 7 tasks
timstclair opened this issue Jul 6, 2017 · 44 comments
Closed
6 of 7 tasks

Advanced Auditing Beta Changes #48561

timstclair opened this issue Jul 6, 2017 · 44 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. milestone/removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.

Comments

@timstclair
Copy link

timstclair commented Jul 6, 2017

As we work with the new "advanced auditing" API, we're noticing places where the API could be improved. I'm opening this issue to track all the changes we'd like to make to the API when it goes to beta.

API Changes:

  • audit.Event.ObjectRef.APIVersion currently holds both the the API group and version, separated by a /. We should break these out into separate fields.
  • Policy should be able to specify subresources. This could either be a separate field, or allow matching of / delimited resources (e.g. pods/status for the pods resource and status subresource)
  • ( @ericchiang ) It would be useful to be able to specify resource names in the policy. E.g. ingress controller configmap. Resources in the [GroupResources]
    ( ) struct should be changed to a struct that includes Resource + ResourceNames (slice) + (Subresource)
  • audit.Event.Metadata.CreationTimestamp shows up as null in the json serialized events we output, which looks sloppy. We should consider cleaning this up. One possibility is to get rid of the audit.Event.Timestamp field, and use CreationTimestamp.
  • We want to omit the RequestReceived stage in GKE. The policy may be the right place to specify that.

Other Changes:

  • feature gate AdvancedAuditing moves to beta and defaults to enabled

Postponed to post 1.8.0

  • It would be nice to identify the server that sent the audit event, i.e. the aggregator vs. an end-user apiserver. Implementation TBD.

/cc @sttts @soltysh @ericchiang @ihmccreery

Feature: kubernetes/enhancements#22

@timstclair timstclair added this to the v1.8 milestone Jul 6, 2017
@k8s-github-robot
Copy link

@timstclair There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
e.g., @kubernetes/sig-api-machinery-* for API Machinery
(2) specifying the label manually: /sig <label>
e.g., /sig scalability for sig/scalability

Note: method (1) will trigger a notification to the team. You can find the team list here and label list here

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 6, 2017
@xiangpengzhao
Copy link
Contributor

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 7, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 7, 2017
@tallclair
Copy link
Member

/cc @crassirostris

@CaoShuFeng
Copy link
Contributor

I am very glad to implement the first one if no one has started it.

audit.Event.ObjectRef.APIVersion currently holds both the the API group and version, separated by a /. We should break these out into separate fields.

@tallclair
Copy link
Member

Note that the first one is a breaking change, since it would mean we populate the APIVersion differently. I think it should be done in the transition to the Beta API, rather than an augment to the alpha API.

@tallclair tallclair changed the title Advanced Auditing Beta API Changes Advanced Auditing Beta Changes Jul 13, 2017
@lavalamp lavalamp added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jul 13, 2017
@lavalamp
Copy link
Member

Please note that SIG API doesn't own the contents of apis and we don't need to own issues like this one.

@ericchiang
Copy link
Contributor

I think the GKE audit policy really demonstrated how many edge cases there are when composing an audit policy. Don't capture the request body of TokenSubjectReview, ignore kube-proxy watching endpoints, etc.

It'd be great to formalize that knowledge somehow. Either by creating a default audit policy in the API server (like default RBAC roles), or by having a real example audit policy in documentation.

@CaoShuFeng
Copy link
Contributor

Note that the first one is a breaking change, since it would mean we populate the APIVersion differently. I think it should be done in the transition to the Beta API, rather than an augment to the alpha API.

Thanks for pointing it out!

@tallclair
Copy link
Member

I think we should consider adding user agent to the audit user info. I know it can be spoofed trivially, but we've had a couple people ask for it.

@CaoShuFeng
Copy link
Contributor

Hi, can we implement a feature that support reading audits by kubectl or dashborad?
I think this is a common use case.
What's your opinion?

CaoShuFeng added a commit to CaoShuFeng/kubernetes that referenced this issue Jul 20, 2017
Updates kubernetes#48561
This provide a way to omit some stages for each audit policy rule.

For example:
  - level: Metadata
    resources:
       - group: "rbac.authorization.k8s.io"
         resources: ["roles"]
    omitStages:
      - "RequestReceived"

RequestReceived stage will not be emitted to audit backends with
previous config.
@tallclair
Copy link
Member

I think audit logs should be exposed similarly to other master/apiserver logs. I'm hesitant to build a custom audit log solution.

@sttts
Copy link
Contributor

sttts commented Jul 21, 2017

Making them available via kube-apiserver doesn't sound right. Nothing stops you though from creating a custom read-only audit-event-apiserver which preferably is backed by an independent storage (maybe not even etcd) and hook that into the cluster with apiserver aggregation. Of course, use the audit webhook to feed the events into audit-event-apiserver.

@tallclair
Copy link
Member

Copying from #46592 (comment), with regards to omitting ReqeustRecieved stage from the batched webhook output:

My priority is just to reduce the number of events that are sent by the webhook, so an alternative would be to do some deduplication in the webhook. One way of doing so would be to put the early stages in a sort of "TTL Cache" which holds the latest stage of the event. If the ResponseComplete stage is logged, then that event is removed from the cache and added to the queue as with today. If an event is evicted from the cache (either due to size or TTL expiration), then the latest stage is added to the queue. The end result should be that most requests just send a ResponseComplete event, but requests that take an unusually long time to complete are guaranteed to be sent within X seconds (60?) of the request start time. (Obviously this would only be for the batching mode).

I think we should pursue this option before adding more configuration to the policy, but open for discussion.

@ericchiang
Copy link
Contributor

I like the batching webhook mode doing de-duplication. It's a nice middle ground between sending less events and adding even more configuration to omit stages. It also provides a sane default option.

Looking at #46897, it already takes a lot of prior knowledge to write these policy files. I think that's a strong argument for not making the policy more complex.

@crassirostris
Copy link

@sttts @ericchiang @CaoShuFeng

We recently discussed the issue of omitting stages with @tallclair and here are some thoughts:

  • Sometimes having this stage can actually be useful, e.g. to detect a query of death -- a request without a response
  • This logic is useful not only with the batching webhook, it's even more useful with e.g. blocking webhook

Also I personally think that the logic with omitStages is cleaner and easier to maintain and explain to users

@CaoShuFeng
Copy link
Contributor

CaoShuFeng commented Sep 7, 2017

Turning the AdvancedAuditing feature gate on by default changes the behavior of the existing audit log flags.

Yes, you are right.

We should never turn this feature on?
Or this change will break some tests again like #51556, so we should turn the gate on and update some test at the same time?

@ericchiang
Copy link
Contributor

We should never turn this feature on?

Maybe we let users opt into this through --audit-log-format?

e.g. if you supply --audit-log-format=legacy, which we make the default again, you get the old audit log implementation. If you supply --audit-log-format=json you get the new implementation.

Don't know how cleanly that would interact with --audit-policy-file.

Or this change will break some tests again like #51556, so we should turn the gate on and update some test at the same time?

Don't have a good answer there. We'd probably just update the e2e tests to use the --audit-log-format=json flag.

@sttts @crassirostris @soltysh any thoughts here? I'd like to see a PR soon that formally bumps the feature flag to beta, but I think we have to address this first.

@CaoShuFeng
Copy link
Contributor

Don't know how cleanly that would interact with --audit-policy-file.

With current implementation, if user doesn't pass --audit-policy-file to kube-apiserver, nothing will be recorded.

@CaoShuFeng
Copy link
Contributor

CaoShuFeng commented Sep 7, 2017

Maybe we let users opt into this through --audit-log-format?

This option is not a switch between Lagacy audit/Advanced audit.
This option is used for the format of Advanced audit.
I known this --audit-log-format hasn't been released yet, but it has been merged for 2 months, can we change its behavior now?

@crassirostris
Copy link

@sttts @crassirostris @soltysh any thoughts here? I'd like to see a PR soon that formally bumps the feature flag to beta, but I think we have to address this first.

Wait, hasn't #51943 already done this?

@crassirostris
Copy link

The way I see it, you can still enable basic auth by disabling advanced audit logging, so it's not a problem. @CaoShuFeng even added this to the documentation in kubernetes/website#5300

@ericchiang
Copy link
Contributor

Wait, hasn't #51943 already done this?

Wasn't aware we already did this. I'll call it out explicitly in the release notes. https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md#action-required-before-upgrading

@crassirostris
Copy link

@ericchiang Thanks!

@tallclair
Copy link
Member

Good progress has been made on this, but the remaining issues aren't blockers. Moving to 1.9, but maybe we should consider this complete?

@crassirostris
Copy link

@tallclair Everything except for the discussion about api-aggregator (and/or federation?) is done, I think it makes sense to close this one and one a new one later

@k8s-github-robot
Copy link

[MILESTONENOTIFIER] Milestone Removed

@timstclair

Important: This issue was missing labels required for the v1.9 milestone for more than 3 days:

kind: Must specify exactly one of kind/bug, kind/cleanup or kind/feature.

Help

@k8s-github-robot k8s-github-robot removed this from the v1.9 milestone Oct 11, 2017
@crassirostris
Copy link

Thanks everyone for working on audit logging in 1.8!

Closing in favor of #54551

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. milestone/removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.
Projects
None yet
Development

No branches or pull requests