Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: create a new "admin audit log" #59356

Closed
wants to merge 1 commit into from

Conversation

knz
Copy link
Contributor

@knz knz commented Jan 23, 2021

Fixes #58334.

This feature was requested by customers: "we want a trace of SQL
queries run by admin users".

A naive implementation of this request would memorize that a user has
the admin role at the moment the session was opened, and
subsequently log every statement performed within that session. This
approach would be relatively simple.

However, it has two shortcomings:

  • if the user is granted the admin role after they are logged in,
    this would not work and fail to report some admin queries.
  • if the admin role is revoked after they are logged in, it would
    over-report non-admin queries as if they were admin queries.

To overcome these shortcomings, this PR instead tracks when the admin
privilege is effectively used, i.e. the fact the user was an admin
is used to bypass some other, more natural check.

It then logs the query to the "SENSITIVE_ACCESS" channel after it has
executed.

This ensures events are produced for most "interesting" cases:

  • configuration changes, that have special case for admin
  • statements that check role options and auto-grant all options to admin
  • statements that bypass other regular privilege checks because the user is
    in the admin role.
  • statements that are able to perform an operation because privilege
    was explicitly granted to the admin role and the user belong to it.

The resulting feature is very close to what the customer requests,
but just so slightly different: in case where an admin user performs
a SQL query that it legitimately can perfom based on its own
privileges, and not that of the admin role, that query would not be
reported. Here an example where the feature would not report an event:

$ cockroach sql --user myuser  # NB: 'myuser' has admin role

CREATE DATABASE mydb; -- event reported: admin bypass for option CREATEDB

DROP DATABASE mydb; -- no event: myuser is the actual owner of mydb, no admin needed

Release note (security update): It is now possible to track all SQL
statements executed by users that have the admin role, when
the admin privileges are effectively used -- i.e. when the user
would not have otherwise access to the statement if they were not
an admin member. This can be achieved via the cluter setting
sql.log.admin_audit.enabled; this emits structured events
of type admin_query to the logging channel SENSITIVE_ACCESS.

This feature was requested by customers: "we want a trace of SQL
queries run by admin users".

A naive implementation of this request would memorize that a user has
the `admin` role at the moment the session was opened, and
subsequently log every statement performed within that session. This
approach would be relatively simple.

However, it has two shortcomings:
- if the user is granted the `admin` role after they are logged in,
  this would not work and fail to report some admin queries.
- if the `admin` role is revoked after they are logged in, it would
  over-report non-admin queries as if they were admin queries.

To overcome these shortcomings, this PR instead tracks when the admin
privilege is *effectively used*, i.e. the fact the user was an admin
is used to bypass some other, more natural check.

It then logs the query to the "SENSITIVE_ACCESS" channel after it has
executed.

This ensures events are produced for most "interesting" cases:

- configuration changes, that have special case for `admin`
- statements that check role options and auto-grant all options to `admin`
- statements that bypass other regular privilege checks because the user is
  in the `admin` role.
- statements that are able to perform an operation because privilege
  was explicitly granted to the `admin` role and the user belong to it.

The resulting feature is very close to what the customer requests,
but just so slightly different: in case where an admin user performs
a SQL query that it legitimately can perfom based on its own
privileges, and not that of the `admin` role, that query would not be
reported. Here an example where the feature would *not* report an event:

```
$ cockroach sql --user myuser  # NB: 'myuser' has admin role

CREATE DATABASE mydb; -- event reported: admin bypass for option CREATEDB

DROP DATABASE mydb; -- no event: myuser is the actual owner of mydb, no admin needed
```

Release note (security update): It is now possible to track all SQL
statements executed by users that have the `admin` role, when
the `admin` privileges are *effectively* used -- i.e. when the user
would not have otherwise access to the statement if they were not
an `admin` member. This can be achieved via the cluter setting
`sql.log.admin_audit.enabled`; this emits structured events
of type `admin_query` to the logging channel SENSITIVE_ACCESS.
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Jan 23, 2021

Hey @thtruo @piyush-singh, and @solongordon @ajwerner , we have a little situation on our hands.

To start, I invite you to read the PR description at the top. The context is explained in the linked issue.

The reason why we need to discuss is that we have a design trade-off:

  1. the "simple" approach, described above and (currently) not used, is incomplete, and risks under-reporting;
  2. the "precise" approach taken in this PR, which reports exactly, but may surprise the customer who requested the feature;
  3. the "coarse" approach where we would re-check whether the user currently belongs to the admin role, anew upon every statement executed. I did not consider this approach because I believe it would incur an unacceptable performance overhead.

There are two questions that need answering:

  • from a PM perspective: what does the customer truly expect? Maybe the phrasing of the original issue was vague. Would they be satisfied with the approach proposed here, or do they want something else?

  • from the engineering perspective: is my assessment of design point (3) above accurate? That the overhead is unacceptable?

    • If so, are we comfortable maintaining the infrastructure defined in this PR, which requires every privilege check inside the SQL layer to go through the API in authorization.go? If not, is there a better approach?

@knz
Copy link
Contributor Author

knz commented Jan 25, 2021

cc @aaron-crl as you might have opinions too

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the "coarse" approach where we would re-check whether the user currently belongs to the admin role, anew upon every statement executed. I did not consider this approach because I believe it would incur an unacceptable performance overhead.

Isn't this cheap-ish? We cache the membership in memory for each user as a mapping. In practice this should be a mutex lock and a hash map lookup which I suspect we are already doing for each table involved in each query. In short, I'm surprised to hear this is seen as expensive.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Jan 27, 2021

@ajwerner Isn't this cheap-ish?

I don't know man, it doesn't seem like it. The main problem is that we can't use the "current txn" to determine the membership -- the txn may be in error / aborted at the time the logging occurs.

Even if we changed the code to pull the admin bool at the beginning of each statement, not at the end, we'd still have the same problem: if a statement is issued while the txn is aborted (e.g. ROLLBACK), and our requirement is "log every admin statement", we need a way to determine for that ROLLBACK whether the user is an admin or not.

That would suggest creating a sub-txn for every statement to pass to the HasAdminRole() function. Does that sound reasonable?

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we changed the code to pull the admin bool at the beginning of each statement, not at the end, we'd still have the same problem: if a statement is issued while the txn is aborted (e.g. ROLLBACK), and our requirement is "log every admin statement", we need a way to determine for that ROLLBACK whether the user is an admin or not.

Given the leasing protocol, it seems totally valid to query the admin bool only in the first statement of any given transaction and then maintain it for the duration of that transaction even if it enters the rollback state. If you want to be pedantic, you could imagine resetting that value in the case you revoke the admin role from yourself, or more generally, perhaps on any change to the role membership.

All in all, that doesn't seem to bad. We legitimately call HasAdminRole for every table for every statement during planning if I read the code correctly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Jan 27, 2021

it seems totally valid to query the admin bool only in the first statement of any given transaction

hm ok I like that, I'll try that next.

@aaron-crl
Copy link

For what it's worth, from a security log consumption perspective I would expect a "log of all admin" activities to contain all the actions my admins conducted during their a privileged session (including those that did not require administrative permissions).

This keeps with the spirit of auditing and avoids the trivial case where a query may be constructed in a way that masks its parameters though non-admin actions but is then executed with a single opaque admin call.

This is why in a unix environment we tend to collect the command history for users in sudo in addition to explicit sudo calls. Can they still potentially circumvent this auditing: yes, but it starts to look very irregular quickly.

@ajwerner
Copy link
Contributor

ajwerner commented Feb 8, 2021

For what it's worth, from a security log consumption perspective I would expect a "log of all admin" activities to contain all the actions my admins conducted during their a privileged session (including those that did not require administrative permissions).

This keeps with the spirit of auditing and avoids the trivial case where a query may be constructed in a way that masks its parameters though non-admin actions but is then executed with a single opaque admin call.

This is why in a unix environment we tend to collect the command history for users in sudo in addition to explicit sudo calls. Can they still potentially circumvent this auditing: yes, but it starts to look very irregular quickly.

Nice. Seems like we've converged on the idea that we want to audit all operations from all transactions where the user has access to the admin role.

@knz
Copy link
Contributor Author

knz commented Feb 9, 2021

Ok thanks for explaining.
The new issue is that I am finding myself properly unable to work on this throughout this milestone due to other/higher priorities. We need to find someone to do the work if it's made to complete for v21.1.

@thtruo can you manage this please? This may involve you talking to someone on the new SQL Queries team.

@knz
Copy link
Contributor Author

knz commented Feb 9, 2021

@solongordon @RichardJCai as you can see in the discussion above, my PR here was not taking the right approach but @ajwerner made a concrete counter-proposal.

I am going to close this PR now, but would be grateful if you could follow up. Cheers

@RichardJCai
Copy link
Contributor

All in all, that doesn't seem to bad. We legitimately call HasAdminRole for every table for every statement during planning if I read the code correctly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Hey, where do we call HasAdminRole for every table for every statement? I think I'm missing this.

@ajwerner
Copy link
Contributor

All in all, that doesn't seem to bad. We legitimately call HasAdminRole for every table for every statement during planning if I read the code correctly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Hey, where do we call HasAdminRole for every table for every statement? I think I'm missing this.

Well, every time we check privileges on a descriptor we do the role expansion:

hasPriv, err := p.checkRolePredicate(ctx, user, func(role security.SQLUsername) bool {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: extend SQL audit logging to "operations run by admin users"
5 participants