Add audit logging for Supervisor and Concierge #2009

cfryanr · 2024-07-05T18:29:13Z

PR for implementing audit logging in the Concierge and Supervisor.

Note that this implementation differs from the proposal document, which is over two years old.

In that old proposal doc, I proposed that we do a bunch of extra work to make a fancy sidecar container that would allow us to seperate the audit logs from the regular error/warning/debug/info/trace logs, to make it easier for admin users to export their audit logs seperately. However, much time has passed since then and I no longer believe that is nessessary. Fluentbit and other log exporting tools have improved since then. Fluentbit now includes tools which allow you to separate lines from a pod log and send those special lines to a different destination (example of that shown below).

Instead, this PR outputs the new audit log events into the regular pod log, and makes them easy to grep by always including "auditEvent":true on every log line which is for an audit event.

The key/values in audit event logs are:

Always have auditEvent equal to true
Always have these pre-existing keys that our pod logs were already using:
- level, which for audit line will always be equal to info
- timestamp
- caller (line of code which caused the log)
- message which for audit lines is effectively the event type. The value will always be one the messages declared as an enum in audit_events.go, which is effectively a catalog of all possible audit event types. No string interpolation should ever be done into the message value.
- Only when the global log level is configured to "trace" or "all", then they will also always include a stacktrace key whose value shows a full Go stacktrace for the caller. This is the case for all lines of our pod logs, and audit lines are no exception.
When applicable, logs lines have an auditID which is a unique ID for every HTTP request, to allow multiple lines of audit events to be correlated when they came from a single HTTP request. This audit ID is also returned to the client as an HTTP response header to allow for correlation between the request as observed by the client and the logs as observed by the administrator.
When applicable, logs lines have a sessionID which is the unique ID of a stored Pinniped Supervisor session, to allow audit events to be correlated which relate to a single session even when they are caused by different requests or controllers. E.g. same sessionID across fresh login, token exchange, refreshes, and session garbage collection.
Each audit event also has more key/values specific to the event type (the event type is shown in the message).

Other changes compared to the original proposal doc:

I think we should enable audit logging by default. Otherwise, people might not learn about this new feature. However, this means that we would be putting usernames and group names into the pod logs (which might be considered PII) upon upgrade to this new version. To address this, we will provide a new configuration option available in the static configmaps to include or not include usernames and group names in the logs. By default, we will not include usernames and group names.
Seeing every request to the /healthz endpoint audited by default is overwhelming, so we will not audit this by default and we will create a configuration option to enable it for debugging if desired by the user.
I don't think we should try to detect “admin-like” RBAC. If we log the username and group names, then that should be sufficient.
The specific keys and format of the JSON log entries is different in some details from the original proposal. Because they are part of the regular pod logs, they should follow the standards already set in our pod logs (e.g. how timestamps are logged, use pre-existing message instead of proposed event, etc.).

Note that aggregated API endpoints, such as the Concierge's TokenCredentialRequest, already support Kubernetes audit logging. Additionally, they already have some trace logging using the k8s.io/utils/trace package, but those logs only appear in our pod logs if the user has configured the log level to be info or higher (not by default). If we add our own audit logging to these endpoints, then ideally the auditID would match that shown in the Kubernetes audit logs to allow for correlation between the pod logs and the audit logs. The approach taken in this PR works that way.

Here is an example Pinniped audit log event from a Concierge pod for a TokenCredentialRequest. Note that this has been piped through jq for pretty printing purposes.

{
  "level": "info",
  "timestamp": "2024-07-10T20:03:26.164470Z",
  "caller": "go.pinniped.dev/internal/registry/credentialrequest/rest.go:135$credentialrequest.(*REST).Create",
  "message": "TokenCredentialRequest",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "auditEvent": true,
  "username": "ldap:[email protected]",
  "groups": [
    "ldap:ball-admins",
    "ldap:ball-game-players"
  ],
  "authenticated": true,
  "expires": "2024-07-10T20:08:26Z"
}

And here are the two audit log events from the Kubernetes API audit log for the same TokenCredentialRequest request, shown for when Kube API audit logging is configured to the metadata level. They show that this was an anonymous request, but they have no way of knowing how Pinniped resolved the identity of the user (which is shown by the Pinniped audit log above). Note that they have the same unique audit ID as above, even though they are not found in the same log file. Note that these have been piped through jq for pretty printing purposes.

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "stage": "RequestReceived",
  "requestURI": "/apis/login.concierge.pinniped.dev/v1alpha1/tokencredentialrequests",
  "verb": "create",
  "user": {
    "username": "system:anonymous",
    "groups": [
      "system:unauthenticated"
    ]
  },
  "sourceIPs": [
    "172.18.0.1"
  ],
  "userAgent": "pinniped/v0.0.0 (darwin/arm64) kubernetes/$Format",
  "objectRef": {
    "resource": "tokencredentialrequests",
    "apiGroup": "login.concierge.pinniped.dev",
    "apiVersion": "v1alpha1"
  },
  "requestReceivedTimestamp": "2024-07-10T20:03:26.155574Z",
  "stageTimestamp": "2024-07-10T20:03:26.155574Z"
}

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "stage": "ResponseComplete",
  "requestURI": "/apis/login.concierge.pinniped.dev/v1alpha1/tokencredentialrequests",
  "verb": "create",
  "user": {
    "username": "system:anonymous",
    "groups": [
      "system:unauthenticated"
    ]
  },
  "sourceIPs": [
    "172.18.0.1"
  ],
  "userAgent": "pinniped/v0.0.0 (darwin/arm64) kubernetes/$Format",
  "objectRef": {
    "resource": "tokencredentialrequests",
    "apiGroup": "login.concierge.pinniped.dev",
    "apiVersion": "v1alpha1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-07-10T20:03:26.155574Z",
  "stageTimestamp": "2024-07-10T20:03:26.165041Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"pinniped-concierge-pre-authn-apis\" of ClusterRole \"pinniped-concierge-pre-authn-apis\" to Group \"system:unauthenticated\""
  }
}

Note that for manual testing, Kubernetes API audit logs can be enabled in Kind clusters: https://kind.sigs.k8s.io/docs/user/auditing.

Here is an example snippet from a custom values.yaml for the Fluentbit Helm Chart showing how to route Pinniped audit logs to a different destination. This example is not for production use because the output destination is stdout. See comments below for more info.

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    # This filter is part of the default configuration from the Helm chart.
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

    [FILTER]
        # Just to make this experiment's output easier to read, exclude any logs that came from the fluent-bit pods.
        # Otherwise we see each event multiple times because we are using stdout outputs below.
        # This would not be needed in a production setup where logs are sent elsewhere, not to stdout.
        Name grep
        Match kube.*
        Exclude $kubernetes['labels']['app.kubernetes.io/name'] ^fluent-bit$

    # A default fluent-bit configuration will capture logs from all pods, so it will capture Pinniped's
    # mixed pod logs that contain audit events and also other logs.
    # This example shows that fluent-bit could be configured to move the audit logs to a separate output
    # if desired. There are many ways to customize fluent-bit's logging pipeline, and it is quite flexible.
    # The example below removes Pinniped audit log lines from the exported Pinniped pods' logs
    # and captures those audit logs instead as one new output, mixing Supervisor and Concierge logs.
    # To determine which pods are Pinniped pods, this example looks at pod labels. Alternatively, it could
    # look at pod namespaces other other properties.
    # Although not shown here, it could alternatively be written to avoid removing audit logs records
    # from the exported pod logs, and could be written to separate Concierge audit logs from Supervisor
    # audit logs.

    [FILTER]
        # For all logs from all pods labelled like a Pinniped Supervisor pod,
        # or labelled like a Pinniped Concierge pod,
        # make a copy of the record at a new tag called pinniped-audit.
        # Keep the record in its original tag also.
        # At this point, the pinniped-audit tag will have all pod logs from
        # matching pods, not just audit logs. We will filter the audit logs below.
        # Multiple Rule statements act like an "or".
        # See https://docs.fluentbit.io/manual/pipeline/filters/rewrite-tag
        Name rewrite_tag
        Match kube.*
        Rule $kubernetes['labels']['app'] ^pinniped-supervisor$ pinniped-audit true
        Rule $kubernetes['labels']['app'] ^pinniped-concierge$ pinniped-audit true

    [FILTER]
        # For all log records that were copied into this new pinniped-audit tag,
        # only keep the events in the new tag that appear to be an audit log.
        # https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match pinniped-audit
        Regex auditEvent ^true$

    [FILTER]
        # If desired, another grep filter can remove the Pinniped Supervisor audit logs from the original tag,
        # so they only appear in events of the new pinniped-audit tag, rather than appearing in both places.
        # See https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match kube.*
        Logical_Op and
        Exclude $kubernetes['labels']['app'] ^pinniped-supervisor$
        Exclude auditEvent ^true$

    [FILTER]
        # If desired, another grep filter can remove the Pinniped Concierge audit logs from the original tag,
        # so they only appear in events of the new pinniped-audit tag, rather than appearing in both places.
        # See https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match kube.*
        Logical_Op and
        Exclude $kubernetes['labels']['app'] ^pinniped-concierge$
        Exclude auditEvent ^true$

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    #[OUTPUT]
    #    Name stdout
    #    Match kube.*

    #[OUTPUT]
    #    Name stdout
    #    Match host.*

    [OUTPUT]
        # By the time a log record gets to the output, it was already filtered.
        # Any event with this tag is only a Pinniped audit log event, and we can
        # output it to anywhere we prefer.
        Name stdout
        Match pinniped-audit
        Format json_lines

Release note:

Audit logging will be a user-facing feature for admin users and needs release notes and documentation.

TBD

internal/federationdomain/requestlogger/request_logger.go

codecov · 2024-07-09T21:20:18Z

Codecov Report

Attention: Patch coverage is 73.59667% with 254 lines in your changes missing coverage. Please review.

Project coverage is 31.76%. Comparing base (e86f3cc) to head (e5d12db).

Files with missing lines	Patch %	Lines
internal/testutil/log_lines.go	0.00%	80 Missing ⚠️
...l/federationdomain/requestlogger/request_logger.go	61.00%	39 Missing ⚠️
...tiondomain/downstreamsession/downstream_session.go	0.00%	37 Missing ⚠️
internal/concierge/server/server.go	0.00%	23 Missing ⚠️
...nal/mocks/mockresponsewriter/mockresponsewriter.go	48.27%	15 Missing ⚠️
internal/plog/plog.go	91.61%	11 Missing and 2 partials ⚠️
internal/supervisor/server/server.go	0.00%	12 Missing ⚠️
.../controller/supervisorstorage/garbage_collector.go	83.60%	8 Missing and 2 partials ⚠️
internal/concierge/apiserver/apiserver.go	0.00%	7 Missing ⚠️
internal/config/supervisor/types.go	0.00%	4 Missing ⚠️
... and 10 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2009      +/-   ##
==========================================
+ Coverage   31.28%   31.76%   +0.48%     
==========================================
  Files         371      378       +7     
  Lines       61131    61896     +765     
==========================================
+ Hits        19124    19661     +537     
- Misses      41482    41705     +223     
- Partials      525      530       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

joshuatcasey · 2024-08-01T15:02:35Z

I wonder if we could add the trace_id and span_id from OpenTelemetry into the audit logs. These should be available when handling a request from the APIServer (such as for an aggregate API) or be generated internally to Pinniped when not provided via HTTP headers.

joshuatcasey · 2024-08-01T15:04:01Z

I see a lot of refactoring, is there any possibility we could strictly separate refactoring and auditing? I would love to see either a commit or a PR that just added auditLogger.Audit lines.

joshuatcasey · 2024-08-09T16:10:58Z

I wonder if we could add Audit to the plog.Logger interface, so that we don't have to create new loggers and pass them around. This might make it somewhat easier to unit-test the logs with the existing unit test mechanism.

Nevermind, I see that is what happens. It was just confusing to me to see auditLogger.Audit instead of logger.Audit everywhere. I wonder if we should generally keep those separate?

netlify · 2024-10-25T19:34:17Z

✅ Deploy Preview for pinniped-dev canceled.

Name	Link
🔨 Latest commit	`e5d12db`
🔍 Latest deploy log	https://app.netlify.com/sites/pinniped-dev/deploys/673aba867e304e0007db8ad4

internal/federationdomain/endpoints/auth/auth_handler.go

internal/federationdomain/endpoints/auth/auth_handler_test.go

site/content/docs/reference/audit-logging.md

internal/federationdomain/endpoints/auth/auth_handler.go

internal/federationdomain/endpointsmanager/manager.go

internal/federationdomain/requestlogger/request_logger.go

internal/plog/plog.go

Co-authored-by: Joshua Casey <[email protected]>

…audit event 'Using Upstream IDP' to callback_handler

Co-authored-by: Joshua Casey <[email protected]>

vmwclabot added the cla-not-required label Jul 5, 2024

github-advanced-security bot found potential problems Jul 5, 2024

View reviewed changes

internal/federationdomain/requestlogger/request_logger.go Dismissed Show dismissed Hide dismissed

cfryanr force-pushed the audit_logging branch 3 times, most recently from 467453d to 96a35c6 Compare July 9, 2024 21:12

cfryanr force-pushed the audit_logging branch from 96a35c6 to c5bb5c3 Compare July 10, 2024 17:17

cfryanr marked this pull request as draft July 10, 2024 18:05

cfryanr force-pushed the audit_logging branch from c5bb5c3 to b2873cf Compare July 10, 2024 20:30

joshuatcasey force-pushed the audit_logging branch from b2873cf to ca1ca48 Compare August 8, 2024 15:48

cfryanr force-pushed the audit_logging branch from ca1ca48 to b8c1636 Compare August 12, 2024 16:29

joshuatcasey force-pushed the audit_logging branch from b8c1636 to cfe9698 Compare October 25, 2024 19:34

joshuatcasey reviewed Oct 28, 2024

View reviewed changes

internal/federationdomain/endpoints/auth/auth_handler.go Outdated Show resolved Hide resolved

joshuatcasey force-pushed the audit_logging branch from cfe9698 to 6958841 Compare October 29, 2024 21:54

joshuatcasey reviewed Oct 29, 2024

View reviewed changes

internal/federationdomain/endpoints/auth/auth_handler.go Outdated Show resolved Hide resolved

joshuatcasey reviewed Oct 29, 2024

View reviewed changes

internal/federationdomain/endpoints/auth/auth_handler_test.go Outdated Show resolved Hide resolved

joshuatcasey force-pushed the audit_logging branch 2 times, most recently from c9b96bf to 9558213 Compare November 4, 2024 15:30