Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit request bodies for manage-security actions #62740

Conversation

albertzaharovits
Copy link
Contributor

This is the initial step for auditing changes to the security configuration (aka auditing ES Security's lifecycle).
It concerns security configuration that sits in the .security index (but not in the .security-tokens index), which is only a part of the entire security configuration.

The proposed PR takes the generic approach to audit transport request bodies for actions that change/add data to indices,
but it is initially confined by a strategy to only audit the request bodies of security actions (to serve the immediate goal of security auditing).
Yet the approach is generic so as to accommodate other request auditing strategies.

In general, when considering other transport request auditing strategies, the following aspects must be considered:

  • request bodies are specific to a particular action
  • actions can generate other child actions
  • generated child actions can reach other nodes in the cluster
  • only the parent action is authenticated
  • all actions are authorised
  • request bodies of child actions generally provide more context/details compared to the parent action's
  • request bodies of child actions generally duplicate context/details available to the parent action's

After intense rummaging I've settled on adding a new field to audit records instead of a separate record for the request body.
I think that the extra effort, for the user as well as for us the devs, to introduce a new audit stream for request bodies is not worth the benefits.
Case in point:

{"type":"audit", "timestamp":"2020-09-21T13:48:00,174+0300", "node.id":"NOyG-V5_SQG_w1Lz7PSBPw", "event.type":"transport", "event.action":"access_granted", "user.name":"elastic", "user.realm":"reserved", "user.roles":["superuser"], "origin.type":"rest", "authentication.type":"REALM", "origin.address":"[::1]:51640", "request.id":"PBc_vyWCSMmO7y29aEl3IA", "action":"cluster:admin/xpack/security/user/delete", "request.name":"DeleteUserRequest"}
{"type":"audit", "timestamp":"2020-09-21T13:48:15,019+0300", "node.id":"NOyG-V5_SQG_w1Lz7PSBPw", "origin.type":"local_node", "origin.address":"127.0.0.1:9300", "request.body":"{\"username\":\"jacknich\"}", "request.id":"PBc_vyWCSMmO7y29aEl3IA", "action":"cluster:admin/xpack/security/user/delete"}

There is redundant metadata that is required/useful to correlate the request body to the action and the rest of the metadata of the action. Contrast that with this:

{"type":"audit", "timestamp":"2020-09-22T01:57:09,745+0300", "node.id":"NiOfQpAcRlC3TptTrsVrLQ", "event.type":"transport", "event.action":"access_granted", "user.name":"elastic", "user.realm":"reserved", "user.roles":["superuser"], "origin.type":"rest", "authentication.type":"REALM", "origin.address":"[::1]:53168", "request.body":"{\"username\":\"jacknich\"}", "request.id":"fC0iCKiEQPWRo2nZK19VGw", "action":"cluster:admin/xpack/security/user/delete", "request.name":"DeleteUserRequest"}

Where the request body is simply another record field to the "access_granted" event.
When the "access_granted" field is not configured to be included, the above becomes:

{"type":"audit", "timestamp":"2020-09-22T02:02:04,704+0300", "node.id":"heJG0DQNR7Ga98WTiq8sKw", "origin.type":"local_node", "origin.address":"127.0.0.1:9300", "request.body":"{\"username\":\"jacknich\"}", "request.id":"EcWRaDyJSGuIHf7VbpbAHQ", "action":"cluster:admin/xpack/security/user/delete"}

This is a WIP PR, I still need to work out the actual serialisation of security requests, as well as tests in general.

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few code level details that we could discuss, e.g. subclassing LoggingAuditTrail vs a new AuditTrail in parallel with it. But I think the more important aspect right now is the overall approach. So I am going to comment on just that.

Overall, I am not sure whether we are trying too hard to fit the new auditing message into the existing structure. I think it might be better if we treated them separately. I guess the decision here should be, to some extend, drived by the long term vision of "security configuration auditing".

My understanding is that our current auditing offering is more of an auditing of traffic. It happens to be built upon the authentication/authorization events. But I think that was just an artifact that security happens to be the choking point of all requests. The current auditing is very generic and bears no business meaning. It really tries to audit everything with a few gaps:

  1. It does not log request body at transport layer
  2. It does not log responses

We are trying to solve item 1 here. If we are after a generic solution, we could just make every transportRequest serialisable. But it is problematic to do so and has low value because what we are really interested in is "security configuration changes". I think this limitation actually has a clear business meaning and feels like a standalone feature. I can see it has its own value even without the exisitng auditing offering. Hence I am leaning towards to have it handled separately. It could become hard to explain and/or easy to be confusing if tries to fit into current log messages:

  • We already have a emit_request_body setting and by default it is disabled. But I don't think we want this flag to control the new request body logged for the security changes. But this could be confusing.
  • If user turns off access_granted, we will not show access_granted event, but the transport request body that is part of it still shows up. So it becomes "partially off", which is also confusing.
  • There could be situation where services are directly invoked which does not go through auth flow? We also talked about having a lower level catch-all safety net to catch any requests that are not logged at higher leve. For these logs, there will be no access_xxx event to attach on.
  • There might be value for users to enable audit just for "security configuration changes" or vice versa. There is no explicit way for them to do that if everything is built into existing events.

In summary, I wonder whether we should make auditing "security configuration changes" a prominent feature on its own (i.e. long term vision). Depending on the answer, we could then decide how much of it should be retrofit into the existing structure.

Comment on lines +978 to +981
void logIfModified() {
if (this.modified) {
logger.info(AUDIT_MARKER, logEntry);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this modified check strictly necessary for this PR? Or it is a general improvement for skipping empty entries? The repetitions of this.modified = true is somewhat distracting. I wonder whether it is sufficient to just check whether number of entries have changed (increased) in the logEntry map.

@ywangd
Copy link
Member

ywangd commented Sep 23, 2020

@albertzaharovits This might be a crazy question, but if the intention is the audit security configuration "changes", do we really need to audit those "denied" events? After all, they have no impact on the security index. Is it possible to drop them for simplicity? I think this is somewhat consistent with your original idea of autiting at the DocWrite level, where only "granted" actions are audited.

@albertzaharovits
Copy link
Contributor Author

@ywangd That's not a crazy idea.

The scope of the project of "auditing security configuration changes" implies or suggests that auditing is for objects (eg roles, users, role mappings, etc, that has changed by some user/key) , whereas our existing audit logs are subject based (eg user X modified/deleted role, user, key). To a certain extent you can derive one from the other, to answer different questions, but the devil is in the details.

Referring specifically to your proposal, I think that if we were going to implement a true object based audit log, it doesn't make sense to audit change failures, at the very least not by default. But if we build on top of the subject based audit log that we've got (as a new field on existing records), then the failed actions become important (to the extent that the default configuration is now on from off).

I acknowledge the desire for an object based audit log, but I got turned off upon seeing both the subject (all metadata of the action, less the request body) and the object based entries one next to the other; the redundant metadata between the two was glaring and looked unnecessary.

For the moment, I am going to stick with auditing request bodies of access_granted and access_denied events in the existing audit log, within the request.body field. I'll try to make this fit into existing configurations of the audit log (avoid hinting to object based auditing).

I have some open questions about what it means to output index documents in a log-like file for ECS.

Hope this rambling make sense. Thank you for the valuable input!

@albertzaharovits
Copy link
Contributor Author

@ywangd Upon further thought, I think reusing the request.body field is confusing. Hence I'm going to implement it closer to your "crazy idea" to audit only successful changes, but as separate audit records.

I don't like reusing the request.body field because:

  • it is harder to explain which events now contain this field (it is not only records in relation to REST, but also "some" transport actions)
  • there is annoying redundancy if both the REST bodies and the security changes are audited simultaneously
  • it is unintuitive to toggle request body printing for some records but not for others

Conversely, if the security changes are audited in separate records, there's some unwanted overhead due to the inclusion of the common fields that all records in the stream must contain. In addition, the sometimes deeply nested nature of the configuration document (eg. roles with application privileges, and index privileges with DLS), makes it impossible to use the same flat, CSV-like, structure of the other audit records, so they'll necessary by nested (but still one per line).

@albertzaharovits
Copy link
Contributor Author

Closed in favour of #62916 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants