Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfcs: new RFC on finer-grained role options #51646

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

solongordon
Copy link
Contributor

Release note: None

@solongordon solongordon requested a review from a team as a code owner July 21, 2020 16:13
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@solongordon
Copy link
Contributor Author

This is still a WIP since I need to hammer out a few more details around RESTORE/IMPORT but it should be far enough along for interested folks to start commenting.

This was largely based on an earlier document by @DuskEagle. For reference, these are some of the larger changes I made:

  • Added CHANGEFEED role option
  • Specified that we will change the CREATEROLE option to disallow non-admin users from dropping admins
  • Specified that only admins can change cluster settings. (This might be controversial but wanted to put a stake down here.)
  • Specified that there will not be an option which guarantees that a non-admin "operator" has privileges on all database objects. This needs to be managed via an admin user if desired.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 31 at r1 (raw file):

These new options will also be useful for customers who want to grant users
administrative abilities without managing the security of their cluster by
limiting the number of people with true admin access.

@darinpp Perhaps we can add something about multi-tenant here too? I know you need the CREATEDB privilege but not sure what else.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing the convo over into RFC. See my comments below.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):

Note that Raphael already has a PR for this:
https://github.com/cockroachdb/cockroach/pull/50601

Could you import the last few comments by Ben into the RFC: #50601 (comment)

The question here is how to evolve an existing cluster in a way that's not too unpleasant wrt UX.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 64 at r1 (raw file):

As noted above, we already support this option. However, it currently does not
prevent a non-admin user from dropping an admin. We will add this restriction
so that an "operator" cannot drop essential admin users.

nit: change "operator" to "a user with this privilege".

(We have muddled waters around the word "operator" - no need to bring that mud here)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

### CHANGEFEED
Allows users to run `CREATE CHANGEFEED` on tables they have `SELECT` privileges

Are you sure that "changefeed" should be a role option, and not a privilege that can be set on databases or tables?


docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):

users. Probably this should be determined by whether the user has the
appropriate write access to the database or tables in question but we need to
nail down the details.

There's a nuance that's maybe worth adding to the text: RESTORE/IMPORT and the other bulk i/o operations also need a privilege to access the nodelocal directory. This should (probably) be distinct too. Ask David for more input.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):

## Automatic grants
The new privileges proposed here do not include a way to guarantee that a role
has privileges on all objects in the database. This remains a unique property

I would like to re-launch the discussion on this.

  1. Can we not extend the privilege model to enable assigning SELECT/UDPATE etc at the level of a database, so that it is inherited by all objects in the schemas underneath?

  2. we can also revive the discussion on ownership, and say that an owner on the database also has all privs on the objects underneath.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding cluster settings: I would like to propose to draw a different line, so that SQL end-users can configure e.g. sql.defaults.default_int_size -- i.e. logical / application-level settings.

Today cluster settings only have a public/reserved distinction. This distinction is about drawing a line about what we support / not-support.

I would recommend in this RFC to add another attribute on cluster settings: "app-level" vs "cluster-level".
Then make "app-level" settings also settable using a new role option. Cluster-level remain admin-only.

Here's the list of app-level settings (cursory scan):

sql.defaults.*
sql.notices.enabled
sql.stats.automatic_collection.*
sql.stats.histogram_collection.*
sql.stats.multi_column_collection.*
sql.trace.session_eventlog.enabled

Debatable, depending on further checking if a user can retrieve the outcomes:

sql.log.slow_query.latency_threshold
sql.metrics.statement_details.*
sql.metrics.transaction_details.enabled
sql.trace.log_statement_execute
sql.trace.txn.enable_threshold

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to ponder your suggestion about cluster settings, which seems totally reasonable. I'm assuming you also propose adding another role option to control access to the application-level settings?

A few things I'm considering:

  • Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.
  • Calling these settings "app-level" feels a little misleading. They will affect every connection to the cluster, regardless of app name, role, database, etc. Maybe that's bikeshedding but it doesn't feel unreasonable to me to enforce that only an admin can make such wide-reaching changes. (Sidenote: Postgres allows database owners to modify session defaults at the database level. Seems like something we should consider in the future.)
  • Different customers may have different opinions on which cluster-wide settings a non-admin user should be permitted to change.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, knz (kena) wrote…

Are you sure that "changefeed" should be a role option, and not a privilege that can be set on databases or tables?

I'm not sure. It feels more natural for it to be role-level to me for a few reasons:

  • I'm having trouble thinking of a realistic scenario where you would want a user to have the ability to create changefeeds on only certain tables they have read access to.
  • When a user creates a table, they get all the privileges on it. I'm not creating a table should automatically confer the ability to create a changefeed on it.

docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):

Previously, knz (kena) wrote…

I would like to re-launch the discussion on this.

  1. Can we not extend the privilege model to enable assigning SELECT/UDPATE etc at the level of a database, so that it is inherited by all objects in the schemas underneath?

  2. we can also revive the discussion on ownership, and say that an owner on the database also has all privs on the objects underneath.

I tried and failed to come up with a reasonable way to extend our privilege model to support this. In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.

Another complication is that we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.

We are currently investigating adding owner support but I'm not sure how it would help here. We can't necessarily enforce that this role is the owner for all the databases in the cluster.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.

Or we can make a system table contain the list. There are multiple solutions. I agree it would be useful to change this dynamically without code upgrades.

Calling these settings "app-level" feels a little misleading. They will affect every connection to the cluster, regardless of app name, role, database, etc. Maybe that's bikeshedding but it doesn't feel unreasonable to me to enforce that only an admin can make such wide-reaching changes.

I made a mistake by introducing the word "app-level". I really meant "tenant-level". This is extremely easy to delineate properly as soon as you consider the multi-tenant set up. A cluster setting can be safely open to customers when the following conditions hold:

  • it does not impact stability, security or correctness of other tenants on the cluster.
  • the performance impact, if any, is well-understood in our docs and we are equipped on our side to monitor perf impact.

(Note something interesting about our multi-tenant story: today we do not implement different settings per tenant. So everything we're discussing here is not yet relevant to multi-tenant clusters, where today we'd say "no setting changes possible whatsoever by tenants". However, I find the multi-tenant is useful to draw a line between the two types of cluster settings.)

You'll see the list I provided above is trivially derived only from these two rules.

Different customers may have different opinions on which cluster-wide settings a non-admin user should be permitted to change.

That's another dimension I hadn't considered. Maybe the right way to move forward here is to not draw a line and introduce a privilege, and instead work with an allow-list in a system table.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, solongordon (Solon) wrote…

I'm not sure. It feels more natural for it to be role-level to me for a few reasons:

  • I'm having trouble thinking of a realistic scenario where you would want a user to have the ability to create changefeeds on only certain tables they have read access to.
  • When a user creates a table, they get all the privileges on it. I'm not creating a table should automatically confer the ability to create a changefeed on it.

👍

(maybe explain that in "alternatives" at the end)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
You are right, the owner concept is orthogonal and does not help here.

Let's come back to this:

In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.

Ok. Point taken.

we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.

I think it would be good to call out the use case in the RFC as one of the problems to solve. This would help with two things:

  • it will help call out intuitive solutions as not adequate. For example:

    • I could have suggested to create another field in database descriptors, to store inheritable privileges. This would be a new thing, but would "guarantee inheritance on objects underneath". Then you'd say it is not satisfying because the user can later revoke this. (However see note 1 below)

    • I could have suggested to create a cluster-level set of privileges, inherited by all objects underneath. This would "guarantee inheritance on objects underneath" and also "prevent the user from revoking them". Then you'd say this is not satisfying because it overly grants privileges to the CC management db. (However see note 2 below)

  • it will help channel the discussion on solutions. For example:

    • note 1 on the above: we could make the separate field for inheritable privileges immutable, or only mutable with another privilege than CREATEDB or the owner. So the user cannot revoke that themselves. This line of thought would be facilitated later by also implementing pg-style templating (template0 and template1 and the corresponding SQL syntax) so that CC assigns the thing to the template and all user-defined dbs get them automatically.

    • note 2 on the above: we are going to have multi-tenancy soon. It is my expectation that CC management DB could be stored on a logical DB in a different tenant ID than that used for the customer's data. If/when the customer's tenant only contains customer data, having a cluster-level set of default SQL privs makes more sense.

What do you say?

@DuskEagle
Copy link
Member

It is my expectation that CC management DB could be stored on a logical DB in a different tenant ID than that used for the customer's data. If/when the customer's tenant only contains customer data, having a cluster-level set of default SQL privs makes more sense.

This is an interesting idea. If we make all dedicated CC clusters use a non-system tenant, than we can create internal SQL users or databases for backup and the like on the system tenant, and not worry about a customer dropping these users/databases inadvertently.

That said, I think the multi-tenant support that is going into 20.2 is too new to use on all dedicated clusters in CC. For one, non-system tenants will have limited-to-no access to the AdminUI or other debug endpoints, which would be a regression for dedicated clusters. It is an interesting idea though, and one I would be interested in exploring further.

@thtruo
Copy link
Contributor

thtruo commented Jul 27, 2020

cc @aaron-crl for awareness

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of specifying "tenant-level" settings in a system table. One complication is what to do when new tenant-level settings are added in future Cockroach versions. Do we write a migration to add them to that table? That seems cumbersome and might not be what a customer wants if they're trying to be restrictive about permissions. But if we don't do so, it's a bit of a pain to make users modify this table themselves every time they upgrade to grant access to new settings (and CC would have to do this every time they upgrade a cluster.)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @dt, @DuskEagle, @knz, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):

Previously, knz (kena) wrote…

Could you import the last few comments by Ben into the RFC: #50601 (comment)

The question here is how to evolve an existing cluster in a way that's not too unpleasant wrt UX.

Have you decided on a solution here? Is the plan to write a migration to grant CREATELOGIN to any role who currently have CREATEROLE?


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, knz (kena) wrote…

👍

(maybe explain that in "alternatives" at the end)

Done.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):

Previously, knz (kena) wrote…

There's a nuance that's maybe worth adding to the text: RESTORE/IMPORT and the other bulk i/o operations also need a privilege to access the nodelocal directory. This should (probably) be distinct too. Ask David for more input.

Thanks for pointing this out. Per @dt this is an issue not only for nodelocal but also for implicit auth. I'll address this in the next revision.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):

Previously, knz (kena) wrote…

You are right, the owner concept is orthogonal and does not help here.

Let's come back to this:

In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.

Ok. Point taken.

we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.

I think it would be good to call out the use case in the RFC as one of the problems to solve. This would help with two things:

  • it will help call out intuitive solutions as not adequate. For example:

    • I could have suggested to create another field in database descriptors, to store inheritable privileges. This would be a new thing, but would "guarantee inheritance on objects underneath". Then you'd say it is not satisfying because the user can later revoke this. (However see note 1 below)

    • I could have suggested to create a cluster-level set of privileges, inherited by all objects underneath. This would "guarantee inheritance on objects underneath" and also "prevent the user from revoking them". Then you'd say this is not satisfying because it overly grants privileges to the CC management db. (However see note 2 below)

  • it will help channel the discussion on solutions. For example:

    • note 1 on the above: we could make the separate field for inheritable privileges immutable, or only mutable with another privilege than CREATEDB or the owner. So the user cannot revoke that themselves. This line of thought would be facilitated later by also implementing pg-style templating (template0 and template1 and the corresponding SQL syntax) so that CC assigns the thing to the template and all user-defined dbs get them automatically.

    • note 2 on the above: we are going to have multi-tenancy soon. It is my expectation that CC management DB could be stored on a logical DB in a different tenant ID than that used for the customer's data. If/when the customer's tenant only contains customer data, having a cluster-level set of default SQL privs makes more sense.

What do you say?

Yes, I'll make the CockroachCloud use case more explicit in the RFC.

I like the idea of using multi-tenant in CC and adding the concept of cluster-level privileges. However from @DuskEagle's comments it sounds like this won't be viable yet in 20.2. My suggestion would be to revisit this in 21.1 since no one else seems to be clamoring about this feature. @DuskEagle, do you think the cron job workaround would be viable in the meantime? (Or maybe we can get away with not supporting this at all for the time being. I was only guessing that this would be a requirement.)

As to the immutable field for inheritable privileges, the concept makes sense to me but I'm concerned about the additional complexity it would add to our permissions model. It seems like it would take us one step further from Postgres behavior and could be confusing for our users to grapple with.

@tbg
Copy link
Member

tbg commented Jul 28, 2020

This is an interesting idea. If we make all dedicated CC clusters use a non-system tenant, than we can create internal SQL users or databases for backup and the like on the system tenant, and not worry about a customer dropping these users/databases inadvertently.

Without having caught up on all of the history of discussion, I have a few thoughts on this kind of idea. I suppose the thinking is that a dedicated customer cluster should have a single (secondary) tenant for use by the customer, and everything else is stored under the system tenant. However, in practice I don't know if we want to go down that route. I think the implication would be that the system tenant can perform arbitrary actions on behalf of the tenant, and this comes with considerable complexity.

First, the SQL teams need to implement this functionality (i.e. a sudo of sorts), so there is work to do and that work basically corresponds to adding yet another way in which a SQL subsystem can be spun up, except this time ephemerally in the context of an already running SQL subsystem. This seems undesirable as we already have a clean way to "become" a tenant using a separate process; it would likely end up messier and with some limitations if we tried to do it on the fly. As someone on the SQL team (I am not) I would be quite 👎 on this architecturally.

Second, it's also.. tricky from a security point of view. With multi-tenancy, we are doing all of this work to isolate the privileged system tenant from the secondary tenants physically (certs, pods, etc) and spinning up something that touches tenant data inside of a privileged process without any barriers (no such thing inside of a Go process) just strikes me as a bad idea (@aaron-crl would have a more informed gut feeling here).

So where I come down on all of this is that it seems useful at first glance, but I think the right way of thinking about this is that if you are going to do work as the tenant, it should be done as the tenant, through the proper channels.

That said, some management data that the tenant should never see and can't influence would be a good fit for being kept in the system tenant. But most "tenant-related" data, I suppose, needs to be accessed from the tenant itself (like password hashes), so the aforementioned problems arise.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @dt, @DuskEagle, @knz, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):

Previously, solongordon (Solon) wrote…

Thanks for pointing this out. Per @dt this is an issue not only for nodelocal but also for implicit auth. I'll address this in the next revision.

Done.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 86 at r2 (raw file):

be determined by the `CONTROLJOB` option.

## RESTORE and IMPORT INTO

@aaron-crl This is the section which I just said I would tag you on, though this whole document may be of interest.

@solongordon solongordon changed the title WIP: rfcs: new RFC on finer-grained role options rfcs: new RFC on finer-grained role options Jul 28, 2020
@solongordon
Copy link
Contributor Author

I'm removing the "WIP" from this since the RESTORE/IMPORT todo is resolved. (But of course since it's an RFC it's still a work in progress until it's merged. Please continue to share your comments.)

@tbg
Copy link
Member

tbg commented Jul 29, 2020

Without having caught up on all of the history of discussion, I have a few thoughts on this kind of idea.

I wanted to add that I'm not sure that I understand examples and use cases for this well enough to really get a good feel for what's appropriate here. So take my comment with a grain of salt. I'd like to understand better what the current pain points are and how a strawman solution using the additional tenant would help. This conversation is likely not appropriate for this PR.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding access to system tables

This is a topic that was touched on briefly on slack by the SQL features+schema teams based on a request/remark by @ajwerner : what do we do for direct access to system tables via SQL?

We have the following problem to solve:

  • on the one hand, most system tables should be "out of reach" by users, especially CC users. We want to move away from a world where CC users have the admin role, and ahead of that we want to move away from a world where they can write to system tables directly. Here's what has been discussed so far (if my understanding is correct, please correct me if wrong):

    • while introducing ownership (sql: Add ownership concept for objects #51856 by @RichardJCai ) there's a proposal to reparent system tables to the node user. Then CC can remove write access by root and admin, and the only writes to system tables that are possible are via internal functions in the SQL code.

    • we'd need some SQL built-in functions to permit manual fixes, e.g in the past we've wanted the ability to repair system.namespace / system.descriptors.

  • on the other hand, we have a couple of application-level "system" tables: system.locations and system.replication_* which as per documentation and by design should be writable by end-users (the former to update the geographical location of their nodes; the latter to purge old/unused reports).

    For the record I personally hate that these tables landed in the system namespace because it breaks the boundary we would prefer to have, that system is off-limit for end-users. This was a terrible design decision and I pray every day that someone will come along with an initiative to fix that.

So, the proposal to re-own all system tables to node and remove write access to root/admin will break system.locations and similar.

Hence question for this RFC: do we need a new role option to access (especially write to) system tables?

Maybe a new role option will address the apparent conflict outlined above; or maybe we need to move that data elsewhere. I will welcome suggestions.

Regarding cluster settings

I like the idea of specifying "tenant-level" settings in a system table. One complication is what to do when new tenant-level settings are added in future Cockroach versions. Do we write a migration to add them to that table?

Yes.

Also, I think it is time to call out cluster settings in the text of the RFC.

Regarding the missing use cases

A question raised before: "how can we make certain roles have access to everything" and at the same time "how can we make end-user roles unable to write to CC management tables"?

This discussion was hard because the RFC has not yet outlined the missing use cases. Let's do this:

  • Use case 1: as an end-user, I want a role which gives me a guarantee that I can always inspect all databases, tables and other objects, even if I was not GRANTed access explicitly by their owners.

    Today the use case is satisfied; however the only role with this property is admin. This is because pg's privilege model only allows access transitively, via GRANTs.

    This RFC needs to say something about that. (see below)

  • Use case 2: as a CC SRE, I want the ability to store CC management data on end-user clusters in a way that prevents the end-users from altering that data (and perhaps even view it).

    Today this use case is not satisfied: end-users get the admin bit and thus can do anything with the CC mgt data. We want that in the future though (however I don't think we have a deadline for it).

    This RFC may need to say something about that, unsure. (it wasn't in scope. But see below)

  • Use case 3: as a CC management console implementer, I want the ability to create/drop users/roles in an end-user cluster and administrate privileges, even if the end-user made mistakes and accidentally revoked things in their cluster.

    Today this use case is already satisfied: CC management console can use a root TLS cert to log in and always gets access to manage users and credentials even if the end-user made mistakes. We do not plan to change that.

    Therefore this RFC does not need to cover this use case. I only mention the use case here because Tobias made a side remark about that above and we need to clarify scope.

Now back to use case 1 which is most important. I made two proposals for this: whole-cluster inheritable access privilege grants (probably stored in the role option table); or a new field on db+table descriptors storing non-revokable admin privileges.

Solon said the following:

  • a new field on db+table descriptors seem like unwanted complexity and divergence from pg. We want less stuff in descriptors, not more.
  • whole-cluster inheritable privilege grants are a very good solution for use case 1, however they seem incompatible with use case 2.

So here we are: we're really looking at use case 1 and want to pay some attention to use case 2 in that context (but maybe less so, since use case 2 wasn't in scope).

What about whole-cluster inheritable priv grants? What are the consequences of that?

  • it would satisfy use case 1. It's easy to understand, easy to explain, probably relatively easy to implement.
  • it permanently makes end-users ultimately "own" all the data objects in their logical schema. Therefore, it is incompatible with a view where CC management console could carve out a "private" space which is off-limit for end-users, and thus makes a privilege-based solution to use case 2 impossible.

Now that this is clear, we discussed what the future would look like in this world.

As is clear now, the team sees a lot of hope in using the multi-tenant infrastructure to help with use case 2.

At this point I need to remind readers that use case 2 is not in scope for this RFC and thus we do not need a final solution for that to move forward with the RFC (and complete it and deliver something). We can just be comfortable to say "we'll wait for multi-tenant to be a bit more mature before we address use case 2".

But for the sake of clarification, here how it would look like:

  1. the end-user data would be stored in a non-system tenant.
  2. the end-user's user and credential info would be stored in that tenant's system tables, as is true already. The CC management console would continue to administrate users and credentials by logging in as root in the user tenant (use case 3), and we don't need to change anything to keep that true.
  3. that cluster's CC management data would be stored in a different tenant keyspace.

Tobias Solon and Joel seem to assume that we'd use the system tenant for that. Actually, my idea was to create a separate non-system tenant, reserved for CC management. I think we want a non-system tenant for security purposes. But we can discuss that choice later.

In any case, using a different tenant keyspace ensures that CC management is invisible by the end-user. It's also guaranteed to remain available to CC mgt console regardless of mistakes that the end-user makes with their privileges.
4. any action that CC management console would need to perform on the end-user's tenant keyspace would be performed by a SQL client conn to the end-user tenant. I don't think we have any reason to require "executing operations on behalf of the user's tenant from the system tenant" like Tobias mentioned. I don't know where that idea came from but I don't like it and we should actively shoot it down.

Anyway these 4 points are mentioned here "for future reference". Since use case 2 is not in scope here, we don't need a solution for it right now.

(Also all this discussion needs to make its way to the text of the RFC.)

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):

Previously, solongordon (Solon) wrote…

Have you decided on a solution here? Is the plan to write a migration to grant CREATELOGIN to any role who currently have CREATEROLE?

Yes that seems reasonable.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):

Previously, solongordon (Solon) wrote…

Yes, I'll make the CockroachCloud use case more explicit in the RFC.

I like the idea of using multi-tenant in CC and adding the concept of cluster-level privileges. However from @DuskEagle's comments it sounds like this won't be viable yet in 20.2. My suggestion would be to revisit this in 21.1 since no one else seems to be clamoring about this feature. @DuskEagle, do you think the cron job workaround would be viable in the meantime? (Or maybe we can get away with not supporting this at all for the time being. I was only guessing that this would be a requirement.)

As to the immutable field for inheritable privileges, the concept makes sense to me but I'm concerned about the additional complexity it would add to our permissions model. It seems like it would take us one step further from Postgres behavior and could be confusing for our users to grapple with.

I'll let Joel answer on your question to him.

As to the other thing (The CC management table privileges) I would like to create a new section in the RFC to explain what we do about that.

Based on the discussion in this comment thread and "at the top" (in the main PR comment thread), I suspect that we are still hovering over the problem without a clear path forward. Will answer in the main thread above - but would be nice to have the discussion in the RFC itself too.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I have a different understanding of what is in scope (and I suppose this drives home your point that I should have been more explicit about outlining use cases in the RFC.) From @DuskEagle's original gdoc, I gleaned that use case 2 was required for 20.2. It says that CC users should not have write access to the internal databases and tables created by CC.

On the other hand, I have never gotten a clear answer on whether use case 1 is in scope. This is a feature I imagined that customers (CC and otherwise) might want but I have never been given a hard requirement for it. This is partly why in the RFC I proposed a workaround to make this work for CC but did not want to bend over backwards to build this functionality into the database.

I would like to propose saving discussions of using multi-tenancy to solve these problems for a future RFC. The main question I want this RFC to answer is, "What is the minimum amount of work we can do to allow CC to stop granting end users admin access as of the 20.2 release?" It's useful to know there's hope that multi-tenancy could provide a nicer solution to these problems down the road, but I don't think it's realistic for CC to start using it in the short term.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)

@knz
Copy link
Contributor

knz commented Jul 29, 2020

The main question I want this RFC to answer is, "What is the minimum amount of work we can do to allow CC to stop granting end users admin access as of the 20.2 release?"

Sure that works for me. But then I'd like to be clear on whether the CC management tables are going to be stored on the user-visible logical schema or not. Unless we have clarity on that, we can't really move forward.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be clear on whether the CC management tables are going to be stored on the user-visible logical schema or not.

Yes, I believe that's already the case and there are no immediate plans to change that. @DuskEagle would you please confirm?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)

@knz
Copy link
Contributor

knz commented Jul 29, 2020

In that case we're back to the drawing board. The two requirements "make an end-user role which has privilege over all dbs and tables" and "make end-users unable to affect CC management tables" appear incompatible within the same SQL logical schema.

@solongordon
Copy link
Contributor Author

solongordon commented Jul 29, 2020 via email

@DuskEagle
Copy link
Member

whether the CC management tables are going to be stored on the user-visible logical schema

The CC management tables that are stored within a cluster today are used by the SQLProber to do targeted lookups on ranges in the cluster and ensure that we receive the value we expect (a form of black-box monitoring). Today, we have no choice but to store those tables on the user-visible logical schema. Part of the goal of the original RFC I wrote was to ensure end-users did not have write access to this table. Thus, I considered use case 2 in-scope in the original RFC. Having said that...

In that case we're back to the drawing board. The two requirements "make an end-user role which has privilege over all dbs and tables" and "make end-users unable to affect CC management tables" appear incompatible within the same SQL logical schema.

If I had to cut something from scope of this RFC, I would cut use case 2, while preserving use case 1.

Without a solution for use case 2, we still run the risk that a user action against the SQLProber tables will result in an SRE getting paged. But we will still reap many other benefits from this RFC. We will have prevented a user from breaking their cluster via changing (system-level) cluster settings, writing to system tables, deleting internal SQL users, or canceling the regularly-scheduled backups.

Meanwhile, I would really like to preserve use case 1. When a CC customer creates a SQL user, they expect to be able to view the databases and tables created by other SQL users. In addition, without this property, if a customer deletes a SQL user, it's unclear how they would regain access to the tables and databases exclusively owned by that SQL user.

I can't see the cronjob workaround being viable for this. A user may attempt to inspect a newly-created database or table before the cronjob has run, and not see the database or table. I also have concerns about the fragility of using an external cronjob to regularly grant permissions to users.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.

I wouldn't let this be the determining factor - we need to speed up the release cycle to CC anyway, so ultimately we should be able to ship new database binaries nearly as quickly as we could update the CC dashboard.

(Sidenote: Postgres allows database owners to modify session defaults at the database level. Seems like something we should consider in the future.)

Postgres also has user-level session defaults, which would be a better fit for our model (our sessions are bound to a single user, but we don't restrict sessions to a single database) #21151

In that case we're back to the drawing board. The two requirements "make an end-user role which has privilege over all dbs and tables" and "make end-users unable to affect CC management tables" appear incompatible within the same SQL logical schema.

If I had to cut something from scope of this RFC, I would cut use case 2, while preserving use case 1.

I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):

jobs created by admins.

### CANCELQUERY

We also need permissions for SHOW QUERIES and SHOW SESSIONS.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).

Yes, I'd be interested to hear @DuskEagle's thoughts about this. One idea I added to the RFC last week was that the console could have an interface for changing the owner of each database. That would allow granting all permissions in a way that couldn't be revoked by other users.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, solongordon (Solon) wrote…

Done.

I agree that it makes sense to define the conditions necessary to create a changefeed to be the role option CHANGEFEED combined with SELECT privileges on the specific object being watched. Do we have a corresponding BACKUP role option? I assume it would work the same way.

Also, should this be called CREATECHANGEFEED? That seems to be the pattern throughout the rest of these options.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I agree that it makes sense to define the conditions necessary to create a changefeed to be the role option CHANGEFEED combined with SELECT privileges on the specific object being watched. Do we have a corresponding BACKUP role option? I assume it would work the same way.

Also, should this be called CREATECHANGEFEED? That seems to be the pattern throughout the rest of these options.

We could add one for BACKUP too though like with RESTORE, the backup destination would be some what restricted. (No nodelocal, implicit creds, etc.) I left it off because I don't know of an existing requirement for this. Do you? If not, this RFC probably needs a Future Work section anyway and I can add it there.

I didn't call it CREATECHANGEFEED because I expect that in the future it will also permit pausing or canceling a changefeed. But maybe I should call it something else like CONTROLCHANGEFEED to be more like CONTROLJOB.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

I left it off because I don't know of an existing requirement for this. Do you?

Not directly, though we will be locking down both CDC and BACKUP for free-tier (#47913 and #47916). In our case, that is primarily due to limited engineering resources and because of strict outbound firewall policies, but it may also be an indication that such flexibility in the permission system is desirable.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We also need permissions for SHOW QUERIES and SHOW SESSIONS.

Good point. So you're saying CANCELQUERY should imply SHOW QUERIES permissions and CANCELSESSION should imply SHOW SESSIONS, right?

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Good point. So you're saying CANCELQUERY should imply SHOW QUERIES permissions and CANCELSESSION should imply SHOW SESSIONS, right?

No, I don't think I'd make CANCEL permissions imply SHOW (or maybe SHOW needs to be split into multiple pieces).

SHOW QUERIES is potentially very sensitive since it may display PII embedded in in-flight queries. An operator may want to be able to run a job (analogous to pt-kill) that kills queries based on their running time (for example) with weaker permissions than that needed to show the actual data.

So splitting up SHOW QUERIES into a full version and a metadata-only version would make it possible to run a query killer with CANCELQUERY and SHOWQUERYMETADATA permissions without SHOWQUERIES. I don't think it's necessary to go to that length right now (especially since ideally we'd build in enough query deadlines that a metadata-only query killer wouldn't be useful), but that's why I wouldn't make the CANCEL privilege imply the SHOW.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

No, I don't think I'd make CANCEL permissions imply SHOW (or maybe SHOW needs to be split into multiple pieces).

SHOW QUERIES is potentially very sensitive since it may display PII embedded in in-flight queries. An operator may want to be able to run a job (analogous to pt-kill) that kills queries based on their running time (for example) with weaker permissions than that needed to show the actual data.

So splitting up SHOW QUERIES into a full version and a metadata-only version would make it possible to run a query killer with CANCELQUERY and SHOWQUERYMETADATA permissions without SHOWQUERIES. I don't think it's necessary to go to that length right now (especially since ideally we'd build in enough query deadlines that a metadata-only query killer wouldn't be useful), but that's why I wouldn't make the CANCEL privilege imply the SHOW.

Thanks, that makes sense. I'll update the RFC to reflect this.

Copy link
Contributor Author

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go with the above plan for now under the presumption that missing privileges can be granted via the management console one way or another. I'll follow up with the Cockroach Cloud team to make sure this is the case.

I wouldn't let this be the determining factor - we need to speed up the release cycle to CC anyway, so ultimately we should be able to ship new database binaries nearly as quickly as we could update the CC dashboard.

Okay, in that case I feel better about @knz's original proposal to designate cluster settings as cluster-level or tenant-level. Maybe some customers will end up wanting more flexibility to define this themselves,

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):

Previously, knz (kena) wrote…

Yes that seems reasonable.

Done.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I left it off because I don't know of an existing requirement for this. Do you?

Not directly, though we will be locking down both CDC and BACKUP for free-tier (#47913 and #47916). In our case, that is primarily due to limited engineering resources and because of strict outbound firewall policies, but it may also be an indication that such flexibility in the permission system is desirable.

Makes sense. I'm added this as future work just so we can limit 20.2 scope.


docs/RFCS/20200720_finer_grained_role_privileges.md, line 108 at r3 (raw file):

Previously, knz (kena) wrote…

Good question! I think we forgot about it.

There are 2 types of information currently hidden under the admin bit in the HTTP endpoints used by the admin UI:

  • information that reveals potentially sensitive data in a way that bypasses SQL - for example range details etc. I don't think we should change access this.
  • the statements page: this is because it includes details about patterns of behavior and application names which could be sensitive.
    • however, the text of the SQL queries is anonymized, so it's not "as much" sensitive
    • my recommendation would be to introduce a new role option for that

The other UI pages already offer separate behavior depending on the credentials of the logged-in user. I believe the status quo is OK.

Piyush is there any UI page besides the statements page that needs special treatment?

I introduced a new VIEWQUERYSTATS option for this. @piyush-singh did you think of any other admin-only UI pages we might want to provide access to?


docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):

Previously, solongordon (Solon) wrote…

Thanks, that makes sense. I'll update the RFC to reflect this.

Done. I added a SHOWQUERIES option to control access to both SHOW QUERIES and SHOW SESSIONS. Let me know if you think we need more granularity than that right now.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for being so late to the PR.

I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).

Yes, I'd be interested to hear @DuskEagle's thoughts about this. One idea I added to the RFC last week was that the console could have an interface for changing the owner of each database. That would allow granting all permissions in a way that couldn't be revoked by other users.

I'm strongly in favor of relying on Console to manage these permissions with the explicit call out that anything an SRE can do regarding granting and removing permissions should land within the realm of self service within reason.

It also abstracts the concept of DBA for DBaaS to a function of the Console/orchestration layer which is more auditable, cleaner, and more readily tuned.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)


docs/RFCS/20200720_finer_grained_role_privileges.md, line 86 at r2 (raw file):

Okay, in that case I feel better about @knz's original proposal to designate cluster settings as cluster-level or tenant-level. Maybe some customers will end up wanting more flexibility to define this themselves,

It looks like we're going with a cluster/tenant split now for these if I understand correctly?

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
@rafiss
Copy link
Collaborator

rafiss commented Sep 23, 2021

I'll go ahead and merge this. This has been implemented since v20.2.

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 23, 2021

Build succeeded:

@craig craig bot merged commit 984ad1a into cockroachdb:master Sep 23, 2021
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this pull request Feb 1, 2023
This is a preliminary change before adding the `VIEWJOB` role
option in the future. The purpose of the `VIEWJOB` role option
is to allow TSEs to view jobs in admin clusters without
modifying them. This requires access to view all jobs, including ones
owned by admins.

Adding `VIEWJOB` conflicts with the present `CONTROLJOB` implementation.
It is stronger than `CONTROLJOB` because it lets one view admin jobs when
`CONTROLJOB` doesn't, yet it is weaker because it only allows viewing
jobs and not pause/cancel/resume-ing them.

For this reason, this change is introduced to make `CONTROLJOB` at least as strong
as `VIEWJOB`: it now allows for all jobs to be viewed. In other words,
`VIEWJOB` lets you do a subset of things `CONTROLJOB` lets you do.

Release note (general change):
Previously, users with the `CONTROLJOB` role option could not view
For this reason, this change is introduced to make controljob stronger.

Furthermore, there was no discoverable reason (see [thread](https://cockroachlabs.slack.com/archives/C02DSDS9TM1/p1675194084663479)
and cockroachdb#51646) that `CONTROLJOB`
prevents controlling jobs owned by users that have the `ADMIN` role option,
so that restriction is removed. Note that they still cannot control jobs
owned by the `root` or `node` users, since those are critical internal jobs.

Release note (general change):
Previously, users with the `CONTROLJOB` role option could not view
or control jobs owned by admins (admins are internal users such as "node"
or "root" and also regular users that have with the `ADMIN` role).
Users such as "node" and "root" are used to created internal jobs such as
schema changes, migrations, stats collections etc. This change updates
`CONTROLJOB` to grant access to view all jobs (ex. via `SHOW JOBS`).
`CONTROLJOB` also now allows access to control all jobs except for ones
owned by the "root" or "node" users.

Epic: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.