-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfcs: new RFC on finer-grained role options #51646
Conversation
This is still a WIP since I need to hammer out a few more details around RESTORE/IMPORT but it should be far enough along for interested folks to start commenting. This was largely based on an earlier document by @DuskEagle. For reference, these are some of the larger changes I made:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 31 at r1 (raw file):
These new options will also be useful for customers who want to grant users administrative abilities without managing the security of their cluster by limiting the number of people with true admin access.
@darinpp Perhaps we can add something about multi-tenant here too? I know you need the CREATEDB privilege but not sure what else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing the convo over into RFC. See my comments below.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):
Note that Raphael already has a PR for this: https://github.com/cockroachdb/cockroach/pull/50601
Could you import the last few comments by Ben into the RFC: #50601 (comment)
The question here is how to evolve an existing cluster in a way that's not too unpleasant wrt UX.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 64 at r1 (raw file):
As noted above, we already support this option. However, it currently does not prevent a non-admin user from dropping an admin. We will add this restriction so that an "operator" cannot drop essential admin users.
nit: change "operator"
to "a user with this privilege".
(We have muddled waters around the word "operator" - no need to bring that mud here)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
### CHANGEFEED Allows users to run `CREATE CHANGEFEED` on tables they have `SELECT` privileges
Are you sure that "changefeed" should be a role option, and not a privilege that can be set on databases or tables?
docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):
users. Probably this should be determined by whether the user has the appropriate write access to the database or tables in question but we need to nail down the details.
There's a nuance that's maybe worth adding to the text: RESTORE/IMPORT and the other bulk i/o operations also need a privilege to access the nodelocal directory. This should (probably) be distinct too. Ask David for more input.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
## Automatic grants The new privileges proposed here do not include a way to guarantee that a role has privileges on all objects in the database. This remains a unique property
I would like to re-launch the discussion on this.
-
Can we not extend the privilege model to enable assigning SELECT/UDPATE etc at the level of a database, so that it is inherited by all objects in the schemas underneath?
-
we can also revive the discussion on ownership, and say that an owner on the database also has all privs on the objects underneath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding cluster settings: I would like to propose to draw a different line, so that SQL end-users can configure e.g. sql.defaults.default_int_size
-- i.e. logical / application-level settings.
Today cluster settings only have a public/reserved distinction. This distinction is about drawing a line about what we support / not-support.
I would recommend in this RFC to add another attribute on cluster settings: "app-level" vs "cluster-level".
Then make "app-level" settings also settable using a new role option. Cluster-level remain admin-only.
Here's the list of app-level settings (cursory scan):
sql.defaults.*
sql.notices.enabled
sql.stats.automatic_collection.*
sql.stats.histogram_collection.*
sql.stats.multi_column_collection.*
sql.trace.session_eventlog.enabled
Debatable, depending on further checking if a user can retrieve the outcomes:
sql.log.slow_query.latency_threshold
sql.metrics.statement_details.*
sql.metrics.transaction_details.enabled
sql.trace.log_statement_execute
sql.trace.txn.enable_threshold
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)
087c254
to
4c56373
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to ponder your suggestion about cluster settings, which seems totally reasonable. I'm assuming you also propose adding another role option to control access to the application-level settings?
A few things I'm considering:
- Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.
- Calling these settings "app-level" feels a little misleading. They will affect every connection to the cluster, regardless of app name, role, database, etc. Maybe that's bikeshedding but it doesn't feel unreasonable to me to enforce that only an admin can make such wide-reaching changes. (Sidenote: Postgres allows database owners to modify session defaults at the database level. Seems like something we should consider in the future.)
- Different customers may have different opinions on which cluster-wide settings a non-admin user should be permitted to change.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, knz (kena) wrote…
Are you sure that "changefeed" should be a role option, and not a privilege that can be set on databases or tables?
I'm not sure. It feels more natural for it to be role-level to me for a few reasons:
- I'm having trouble thinking of a realistic scenario where you would want a user to have the ability to create changefeeds on only certain tables they have read access to.
- When a user creates a table, they get all the privileges on it. I'm not creating a table should automatically confer the ability to create a changefeed on it.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
Previously, knz (kena) wrote…
I would like to re-launch the discussion on this.
Can we not extend the privilege model to enable assigning SELECT/UDPATE etc at the level of a database, so that it is inherited by all objects in the schemas underneath?
we can also revive the discussion on ownership, and say that an owner on the database also has all privs on the objects underneath.
I tried and failed to come up with a reasonable way to extend our privilege model to support this. In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.
Another complication is that we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.
We are currently investigating adding owner support but I'm not sure how it would help here. We can't necessarily enforce that this role is the owner for all the databases in the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.
Or we can make a system table contain the list. There are multiple solutions. I agree it would be useful to change this dynamically without code upgrades.
Calling these settings "app-level" feels a little misleading. They will affect every connection to the cluster, regardless of app name, role, database, etc. Maybe that's bikeshedding but it doesn't feel unreasonable to me to enforce that only an admin can make such wide-reaching changes.
I made a mistake by introducing the word "app-level". I really meant "tenant-level". This is extremely easy to delineate properly as soon as you consider the multi-tenant set up. A cluster setting can be safely open to customers when the following conditions hold:
- it does not impact stability, security or correctness of other tenants on the cluster.
- the performance impact, if any, is well-understood in our docs and we are equipped on our side to monitor perf impact.
(Note something interesting about our multi-tenant story: today we do not implement different settings per tenant. So everything we're discussing here is not yet relevant to multi-tenant clusters, where today we'd say "no setting changes possible whatsoever by tenants". However, I find the multi-tenant is useful to draw a line between the two types of cluster settings.)
You'll see the list I provided above is trivially derived only from these two rules.
Different customers may have different opinions on which cluster-wide settings a non-admin user should be permitted to change.
That's another dimension I hadn't considered. Maybe the right way to move forward here is to not draw a line and introduce a privilege, and instead work with an allow-list in a system table.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, solongordon (Solon) wrote…
I'm not sure. It feels more natural for it to be role-level to me for a few reasons:
- I'm having trouble thinking of a realistic scenario where you would want a user to have the ability to create changefeeds on only certain tables they have read access to.
- When a user creates a table, they get all the privileges on it. I'm not creating a table should automatically confer the ability to create a changefeed on it.
👍
(maybe explain that in "alternatives" at the end)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
You are right, the owner concept is orthogonal and does not help here.
Let's come back to this:
In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.
Ok. Point taken.
we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.
I think it would be good to call out the use case in the RFC as one of the problems to solve. This would help with two things:
-
it will help call out intuitive solutions as not adequate. For example:
-
I could have suggested to create another field in database descriptors, to store inheritable privileges. This would be a new thing, but would "guarantee inheritance on objects underneath". Then you'd say it is not satisfying because the user can later revoke this. (However see note 1 below)
-
I could have suggested to create a cluster-level set of privileges, inherited by all objects underneath. This would "guarantee inheritance on objects underneath" and also "prevent the user from revoking them". Then you'd say this is not satisfying because it overly grants privileges to the CC management db. (However see note 2 below)
-
-
it will help channel the discussion on solutions. For example:
-
note 1 on the above: we could make the separate field for inheritable privileges immutable, or only mutable with another privilege than CREATEDB or the owner. So the user cannot revoke that themselves. This line of thought would be facilitated later by also implementing pg-style templating (
template0
andtemplate1
and the corresponding SQL syntax) so that CC assigns the thing to the template and all user-defined dbs get them automatically. -
note 2 on the above: we are going to have multi-tenancy soon. It is my expectation that CC management DB could be stored on a logical DB in a different tenant ID than that used for the customer's data. If/when the customer's tenant only contains customer data, having a cluster-level set of default SQL privs makes more sense.
-
What do you say?
This is an interesting idea. If we make all dedicated CC clusters use a non-system tenant, than we can create internal SQL users or databases for backup and the like on the system tenant, and not worry about a customer dropping these users/databases inadvertently. That said, I think the multi-tenant support that is going into 20.2 is too new to use on all dedicated clusters in CC. For one, non-system tenants will have limited-to-no access to the AdminUI or other debug endpoints, which would be a regression for dedicated clusters. It is an interesting idea though, and one I would be interested in exploring further. |
cc @aaron-crl for awareness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of specifying "tenant-level" settings in a system table. One complication is what to do when new tenant-level settings are added in future Cockroach versions. Do we write a migration to add them to that table? That seems cumbersome and might not be what a customer wants if they're trying to be restrictive about permissions. But if we don't do so, it's a bit of a pain to make users modify this table themselves every time they upgrade to grant access to new settings (and CC would have to do this every time they upgrade a cluster.)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @dt, @DuskEagle, @knz, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):
Previously, knz (kena) wrote…
Could you import the last few comments by Ben into the RFC: #50601 (comment)
The question here is how to evolve an existing cluster in a way that's not too unpleasant wrt UX.
Have you decided on a solution here? Is the plan to write a migration to grant CREATELOGIN to any role who currently have CREATEROLE?
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, knz (kena) wrote…
👍
(maybe explain that in "alternatives" at the end)
Done.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):
Previously, knz (kena) wrote…
There's a nuance that's maybe worth adding to the text: RESTORE/IMPORT and the other bulk i/o operations also need a privilege to access the nodelocal directory. This should (probably) be distinct too. Ask David for more input.
Thanks for pointing this out. Per @dt this is an issue not only for nodelocal but also for implicit auth. I'll address this in the next revision.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
Previously, knz (kena) wrote…
You are right, the owner concept is orthogonal and does not help here.
Let's come back to this:
In our current model, granting a privilege on a database does not guarantee inheritance on all objects underneath. It only acts as a default for newly created objects, so there is nothing to prevent later revoking those privileges.
Ok. Point taken.
we don't actually want to grant privileges on all objects. There may be databases this role should not have access to, like the internal tables used by CockroachCloud.
I think it would be good to call out the use case in the RFC as one of the problems to solve. This would help with two things:
it will help call out intuitive solutions as not adequate. For example:
I could have suggested to create another field in database descriptors, to store inheritable privileges. This would be a new thing, but would "guarantee inheritance on objects underneath". Then you'd say it is not satisfying because the user can later revoke this. (However see note 1 below)
I could have suggested to create a cluster-level set of privileges, inherited by all objects underneath. This would "guarantee inheritance on objects underneath" and also "prevent the user from revoking them". Then you'd say this is not satisfying because it overly grants privileges to the CC management db. (However see note 2 below)
it will help channel the discussion on solutions. For example:
note 1 on the above: we could make the separate field for inheritable privileges immutable, or only mutable with another privilege than CREATEDB or the owner. So the user cannot revoke that themselves. This line of thought would be facilitated later by also implementing pg-style templating (
template0
andtemplate1
and the corresponding SQL syntax) so that CC assigns the thing to the template and all user-defined dbs get them automatically.note 2 on the above: we are going to have multi-tenancy soon. It is my expectation that CC management DB could be stored on a logical DB in a different tenant ID than that used for the customer's data. If/when the customer's tenant only contains customer data, having a cluster-level set of default SQL privs makes more sense.
What do you say?
Yes, I'll make the CockroachCloud use case more explicit in the RFC.
I like the idea of using multi-tenant in CC and adding the concept of cluster-level privileges. However from @DuskEagle's comments it sounds like this won't be viable yet in 20.2. My suggestion would be to revisit this in 21.1 since no one else seems to be clamoring about this feature. @DuskEagle, do you think the cron job workaround would be viable in the meantime? (Or maybe we can get away with not supporting this at all for the time being. I was only guessing that this would be a requirement.)
As to the immutable field for inheritable privileges, the concept makes sense to me but I'm concerned about the additional complexity it would add to our permissions model. It seems like it would take us one step further from Postgres behavior and could be confusing for our users to grapple with.
Without having caught up on all of the history of discussion, I have a few thoughts on this kind of idea. I suppose the thinking is that a dedicated customer cluster should have a single (secondary) tenant for use by the customer, and everything else is stored under the system tenant. However, in practice I don't know if we want to go down that route. I think the implication would be that the system tenant can perform arbitrary actions on behalf of the tenant, and this comes with considerable complexity. First, the SQL teams need to implement this functionality (i.e. a sudo of sorts), so there is work to do and that work basically corresponds to adding yet another way in which a SQL subsystem can be spun up, except this time ephemerally in the context of an already running SQL subsystem. This seems undesirable as we already have a clean way to "become" a tenant using a separate process; it would likely end up messier and with some limitations if we tried to do it on the fly. As someone on the SQL team (I am not) I would be quite 👎 on this architecturally. Second, it's also.. tricky from a security point of view. With multi-tenancy, we are doing all of this work to isolate the privileged system tenant from the secondary tenants physically (certs, pods, etc) and spinning up something that touches tenant data inside of a privileged process without any barriers (no such thing inside of a Go process) just strikes me as a bad idea (@aaron-crl would have a more informed gut feeling here). So where I come down on all of this is that it seems useful at first glance, but I think the right way of thinking about this is that if you are going to do work as the tenant, it should be done as the tenant, through the proper channels. That said, some management data that the tenant should never see and can't influence would be a good fit for being kept in the system tenant. But most "tenant-related" data, I suppose, needs to be accessed from the tenant itself (like password hashes), so the aforementioned problems arise. |
4c56373
to
62c9ad5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @dt, @DuskEagle, @knz, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 89 at r1 (raw file):
Previously, solongordon (Solon) wrote…
Thanks for pointing this out. Per @dt this is an issue not only for nodelocal but also for implicit auth. I'll address this in the next revision.
Done.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 86 at r2 (raw file):
be determined by the `CONTROLJOB` option. ## RESTORE and IMPORT INTO
@aaron-crl This is the section which I just said I would tag you on, though this whole document may be of interest.
62c9ad5
to
80a6f7c
Compare
I'm removing the "WIP" from this since the RESTORE/IMPORT todo is resolved. (But of course since it's an RFC it's still a work in progress until it's merged. Please continue to share your comments.) |
I wanted to add that I'm not sure that I understand examples and use cases for this well enough to really get a good feel for what's appropriate here. So take my comment with a grain of salt. I'd like to understand better what the current pain points are and how a strawman solution using the additional tenant would help. This conversation is likely not appropriate for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding access to system tables
This is a topic that was touched on briefly on slack by the SQL features+schema teams based on a request/remark by @ajwerner : what do we do for direct access to system tables via SQL?
We have the following problem to solve:
-
on the one hand, most system tables should be "out of reach" by users, especially CC users. We want to move away from a world where CC users have the
admin
role, and ahead of that we want to move away from a world where they can write tosystem
tables directly. Here's what has been discussed so far (if my understanding is correct, please correct me if wrong):-
while introducing ownership (sql: Add ownership concept for objects #51856 by @RichardJCai ) there's a proposal to reparent
system
tables to thenode
user. Then CC can remove write access byroot
andadmin
, and the only writes tosystem
tables that are possible are via internal functions in the SQL code. -
we'd need some SQL built-in functions to permit manual fixes, e.g in the past we've wanted the ability to repair system.namespace / system.descriptors.
-
-
on the other hand, we have a couple of application-level "system" tables:
system.locations
andsystem.replication_*
which as per documentation and by design should be writable by end-users (the former to update the geographical location of their nodes; the latter to purge old/unused reports).For the record I personally hate that these tables landed in the
system
namespace because it breaks the boundary we would prefer to have, thatsystem
is off-limit for end-users. This was a terrible design decision and I pray every day that someone will come along with an initiative to fix that.
So, the proposal to re-own all system
tables to node
and remove write access to root/admin will break system.locations and similar.
Hence question for this RFC: do we need a new role option to access (especially write to) system tables?
Maybe a new role option will address the apparent conflict outlined above; or maybe we need to move that data elsewhere. I will welcome suggestions.
Regarding cluster settings
I like the idea of specifying "tenant-level" settings in a system table. One complication is what to do when new tenant-level settings are added in future Cockroach versions. Do we write a migration to add them to that table?
Yes.
Also, I think it is time to call out cluster settings in the text of the RFC.
Regarding the missing use cases
A question raised before: "how can we make certain roles have access to everything" and at the same time "how can we make end-user roles unable to write to CC management tables"?
This discussion was hard because the RFC has not yet outlined the missing use cases. Let's do this:
-
Use case 1: as an end-user, I want a role which gives me a guarantee that I can always inspect all databases, tables and other objects, even if I was not GRANTed access explicitly by their owners.
Today the use case is satisfied; however the only role with this property is
admin
. This is because pg's privilege model only allows access transitively, via GRANTs.This RFC needs to say something about that. (see below)
-
Use case 2: as a CC SRE, I want the ability to store CC management data on end-user clusters in a way that prevents the end-users from altering that data (and perhaps even view it).
Today this use case is not satisfied: end-users get the
admin
bit and thus can do anything with the CC mgt data. We want that in the future though (however I don't think we have a deadline for it).This RFC may need to say something about that, unsure. (it wasn't in scope. But see below)
-
Use case 3: as a CC management console implementer, I want the ability to create/drop users/roles in an end-user cluster and administrate privileges, even if the end-user made mistakes and accidentally revoked things in their cluster.
Today this use case is already satisfied: CC management console can use a
root
TLS cert to log in and always gets access to manage users and credentials even if the end-user made mistakes. We do not plan to change that.Therefore this RFC does not need to cover this use case. I only mention the use case here because Tobias made a side remark about that above and we need to clarify scope.
Now back to use case 1 which is most important. I made two proposals for this: whole-cluster inheritable access privilege grants (probably stored in the role option table); or a new field on db+table descriptors storing non-revokable admin privileges.
Solon said the following:
- a new field on db+table descriptors seem like unwanted complexity and divergence from pg. We want less stuff in descriptors, not more.
- whole-cluster inheritable privilege grants are a very good solution for use case 1, however they seem incompatible with use case 2.
So here we are: we're really looking at use case 1 and want to pay some attention to use case 2 in that context (but maybe less so, since use case 2 wasn't in scope).
What about whole-cluster inheritable priv grants? What are the consequences of that?
- it would satisfy use case 1. It's easy to understand, easy to explain, probably relatively easy to implement.
- it permanently makes end-users ultimately "own" all the data objects in their logical schema. Therefore, it is incompatible with a view where CC management console could carve out a "private" space which is off-limit for end-users, and thus makes a privilege-based solution to use case 2 impossible.
Now that this is clear, we discussed what the future would look like in this world.
As is clear now, the team sees a lot of hope in using the multi-tenant infrastructure to help with use case 2.
At this point I need to remind readers that use case 2 is not in scope for this RFC and thus we do not need a final solution for that to move forward with the RFC (and complete it and deliver something). We can just be comfortable to say "we'll wait for multi-tenant to be a bit more mature before we address use case 2".
But for the sake of clarification, here how it would look like:
- the end-user data would be stored in a non-system tenant.
- the end-user's user and credential info would be stored in that tenant's
system
tables, as is true already. The CC management console would continue to administrate users and credentials by logging in asroot
in the user tenant (use case 3), and we don't need to change anything to keep that true. - that cluster's CC management data would be stored in a different tenant keyspace.
Tobias Solon and Joel seem to assume that we'd use the system tenant for that. Actually, my idea was to create a separate non-system tenant, reserved for CC management. I think we want a non-system tenant for security purposes. But we can discuss that choice later.
In any case, using a different tenant keyspace ensures that CC management is invisible by the end-user. It's also guaranteed to remain available to CC mgt console regardless of mistakes that the end-user makes with their privileges.
4. any action that CC management console would need to perform on the end-user's tenant keyspace would be performed by a SQL client conn to the end-user tenant. I don't think we have any reason to require "executing operations on behalf of the user's tenant from the system tenant" like Tobias mentioned. I don't know where that idea came from but I don't like it and we should actively shoot it down.
Anyway these 4 points are mentioned here "for future reference". Since use case 2 is not in scope here, we don't need a solution for it right now.
(Also all this discussion needs to make its way to the text of the RFC.)
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):
Previously, solongordon (Solon) wrote…
Have you decided on a solution here? Is the plan to write a migration to grant CREATELOGIN to any role who currently have CREATEROLE?
Yes that seems reasonable.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 102 at r1 (raw file):
Previously, solongordon (Solon) wrote…
Yes, I'll make the CockroachCloud use case more explicit in the RFC.
I like the idea of using multi-tenant in CC and adding the concept of cluster-level privileges. However from @DuskEagle's comments it sounds like this won't be viable yet in 20.2. My suggestion would be to revisit this in 21.1 since no one else seems to be clamoring about this feature. @DuskEagle, do you think the cron job workaround would be viable in the meantime? (Or maybe we can get away with not supporting this at all for the time being. I was only guessing that this would be a requirement.)
As to the immutable field for inheritable privileges, the concept makes sense to me but I'm concerned about the additional complexity it would add to our permissions model. It seems like it would take us one step further from Postgres behavior and could be confusing for our users to grapple with.
I'll let Joel answer on your question to him.
As to the other thing (The CC management table privileges) I would like to create a new section in the RFC to explain what we do about that.
Based on the discussion in this comment thread and "at the top" (in the main PR comment thread), I suspect that we are still hovering over the problem without a clear path forward. Will answer in the main thread above - but would be nice to have the discussion in the RFC itself too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I have a different understanding of what is in scope (and I suppose this drives home your point that I should have been more explicit about outlining use cases in the RFC.) From @DuskEagle's original gdoc, I gleaned that use case 2 was required for 20.2. It says that CC users should not have write access to the internal databases and tables created by CC.
On the other hand, I have never gotten a clear answer on whether use case 1 is in scope. This is a feature I imagined that customers (CC and otherwise) might want but I have never been given a hard requirement for it. This is partly why in the RFC I proposed a workaround to make this work for CC but did not want to bend over backwards to build this functionality into the database.
I would like to propose saving discussions of using multi-tenancy to solve these problems for a future RFC. The main question I want this RFC to answer is, "What is the minimum amount of work we can do to allow CC to stop granting end users admin access as of the 20.2 release?" It's useful to know there's hope that multi-tenancy could provide a nicer solution to these problems down the road, but I don't think it's realistic for CC to start using it in the short term.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)
Sure that works for me. But then I'd like to be clear on whether the CC management tables are going to be stored on the user-visible logical schema or not. Unless we have clarity on that, we can't really move forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be clear on whether the CC management tables are going to be stored on the user-visible logical schema or not.
Yes, I believe that's already the case and there are no immediate plans to change that. @DuskEagle would you please confirm?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @solongordon, and @thtruo)
In that case we're back to the drawing board. The two requirements "make an end-user role which has privilege over all dbs and tables" and "make end-users unable to affect CC management tables" appear incompatible within the same SQL logical schema. |
I think the best thing now is for someone from the CC side to weigh in on
whether the first one is a real requirement and if so whether it's
adequately solved by the cronjob workaround I proposed in the RFC. I've
asked a few folks to do so. Then we can proceed from there.
… |
The CC management tables that are stored within a cluster today are used by the SQLProber to do targeted lookups on ranges in the cluster and ensure that we receive the value we expect (a form of black-box monitoring). Today, we have no choice but to store those tables on the user-visible logical schema. Part of the goal of the original RFC I wrote was to ensure end-users did not have write access to this table. Thus, I considered use case 2 in-scope in the original RFC. Having said that...
If I had to cut something from scope of this RFC, I would cut use case 2, while preserving use case 1. Without a solution for use case 2, we still run the risk that a user action against the SQLProber tables will result in an SRE getting paged. But we will still reap many other benefits from this RFC. We will have prevented a user from breaking their cluster via changing (system-level) cluster settings, writing to system tables, deleting internal SQL users, or canceling the regularly-scheduled backups. Meanwhile, I would really like to preserve use case 1. When a CC customer creates a SQL user, they expect to be able to view the databases and tables created by other SQL users. In addition, without this property, if a customer deletes a SQL user, it's unclear how they would regain access to the tables and databases exclusively owned by that SQL user. I can't see the cronjob workaround being viable for this. A user may attempt to inspect a newly-created database or table before the cronjob has run, and not see the database or table. I also have concerns about the fragility of using an external cronjob to regularly grant permissions to users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Baking in the definition of what an app-level privilege is means we need to release a new binary to add or remove a privilege from this list. If we instead expose these settings via the CockroachCloud dashboard, we only need to release a new version of the dashboard to make such changes.
I wouldn't let this be the determining factor - we need to speed up the release cycle to CC anyway, so ultimately we should be able to ship new database binaries nearly as quickly as we could update the CC dashboard.
(Sidenote: Postgres allows database owners to modify session defaults at the database level. Seems like something we should consider in the future.)
Postgres also has user-level session defaults, which would be a better fit for our model (our sessions are bound to a single user, but we don't restrict sessions to a single database) #21151
In that case we're back to the drawing board. The two requirements "make an end-user role which has privilege over all dbs and tables" and "make end-users unable to affect CC management tables" appear incompatible within the same SQL logical schema.
If I had to cut something from scope of this RFC, I would cut use case 2, while preserving use case 1.
I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):
jobs created by admins. ### CANCELQUERY
We also need permissions for SHOW QUERIES and SHOW SESSIONS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).
Yes, I'd be interested to hear @DuskEagle's thoughts about this. One idea I added to the RFC last week was that the console could have an interface for changing the owner of each database. That would allow granting all permissions in a way that couldn't be revoked by other users.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, solongordon (Solon) wrote…
Done.
I agree that it makes sense to define the conditions necessary to create a changefeed to be the role option CHANGEFEED
combined with SELECT
privileges on the specific object being watched. Do we have a corresponding BACKUP
role option? I assume it would work the same way.
Also, should this be called CREATECHANGEFEED
? That seems to be the pattern throughout the rest of these options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I agree that it makes sense to define the conditions necessary to create a changefeed to be the role option
CHANGEFEED
combined withSELECT
privileges on the specific object being watched. Do we have a correspondingBACKUP
role option? I assume it would work the same way.Also, should this be called
CREATECHANGEFEED
? That seems to be the pattern throughout the rest of these options.
We could add one for BACKUP too though like with RESTORE, the backup destination would be some what restricted. (No nodelocal, implicit creds, etc.) I left it off because I don't know of an existing requirement for this. Do you? If not, this RFC probably needs a Future Work section anyway and I can add it there.
I didn't call it CREATECHANGEFEED
because I expect that in the future it will also permit pausing or canceling a changefeed. But maybe I should call it something else like CONTROLCHANGEFEED
to be more like CONTROLJOB
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
I left it off because I don't know of an existing requirement for this. Do you?
Not directly, though we will be locking down both CDC and BACKUP for free-tier (#47913 and #47916). In our case, that is primarily due to limited engineering resources and because of strict outbound firewall policies, but it may also be an indication that such flexibility in the permission system is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
We also need permissions for SHOW QUERIES and SHOW SESSIONS.
Good point. So you're saying CANCELQUERY
should imply SHOW QUERIES
permissions and CANCELSESSION
should imply SHOW SESSIONS
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):
Previously, solongordon (Solon) wrote…
Good point. So you're saying
CANCELQUERY
should implySHOW QUERIES
permissions andCANCELSESSION
should implySHOW SESSIONS
, right?
No, I don't think I'd make CANCEL permissions imply SHOW (or maybe SHOW needs to be split into multiple pieces).
SHOW QUERIES is potentially very sensitive since it may display PII embedded in in-flight queries. An operator may want to be able to run a job (analogous to pt-kill
) that kills queries based on their running time (for example) with weaker permissions than that needed to show the actual data.
So splitting up SHOW QUERIES into a full version and a metadata-only version would make it possible to run a query killer with CANCELQUERY and SHOWQUERYMETADATA permissions without SHOWQUERIES. I don't think it's necessary to go to that length right now (especially since ideally we'd build in enough query deadlines that a metadata-only query killer wouldn't be useful), but that's why I wouldn't make the CANCEL privilege imply the SHOW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
No, I don't think I'd make CANCEL permissions imply SHOW (or maybe SHOW needs to be split into multiple pieces).
SHOW QUERIES is potentially very sensitive since it may display PII embedded in in-flight queries. An operator may want to be able to run a job (analogous to
pt-kill
) that kills queries based on their running time (for example) with weaker permissions than that needed to show the actual data.So splitting up SHOW QUERIES into a full version and a metadata-only version would make it possible to run a query killer with CANCELQUERY and SHOWQUERYMETADATA permissions without SHOWQUERIES. I don't think it's necessary to go to that length right now (especially since ideally we'd build in enough query deadlines that a metadata-only query killer wouldn't be useful), but that's why I wouldn't make the CANCEL privilege imply the SHOW.
Thanks, that makes sense. I'll update the RFC to reflect this.
b0b00a6
to
83fa891
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to go with the above plan for now under the presumption that missing privileges can be granted via the management console one way or another. I'll follow up with the Cockroach Cloud team to make sure this is the case.
I wouldn't let this be the determining factor - we need to speed up the release cycle to CC anyway, so ultimately we should be able to ship new database binaries nearly as quickly as we could update the CC dashboard.
Okay, in that case I feel better about @knz's original proposal to designate cluster settings as cluster-level or tenant-level. Maybe some customers will end up wanting more flexibility to define this themselves,
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 59 at r1 (raw file):
Previously, knz (kena) wrote…
Yes that seems reasonable.
Done.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 81 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I left it off because I don't know of an existing requirement for this. Do you?
Not directly, though we will be locking down both CDC and BACKUP for free-tier (#47913 and #47916). In our case, that is primarily due to limited engineering resources and because of strict outbound firewall policies, but it may also be an indication that such flexibility in the permission system is desirable.
Makes sense. I'm added this as future work just so we can limit 20.2 scope.
docs/RFCS/20200720_finer_grained_role_privileges.md, line 108 at r3 (raw file):
Previously, knz (kena) wrote…
Good question! I think we forgot about it.
There are 2 types of information currently hidden under the
admin
bit in the HTTP endpoints used by the admin UI:
- information that reveals potentially sensitive data in a way that bypasses SQL - for example range details etc. I don't think we should change access this.
- the statements page: this is because it includes details about patterns of behavior and application names which could be sensitive.
- however, the text of the SQL queries is anonymized, so it's not "as much" sensitive
- my recommendation would be to introduce a new role option for that
The other UI pages already offer separate behavior depending on the credentials of the logged-in user. I believe the status quo is OK.
Piyush is there any UI page besides the statements page that needs special treatment?
I introduced a new VIEWQUERYSTATS option for this. @piyush-singh did you think of any other admin-only UI pages we might want to provide access to?
docs/RFCS/20200720_finer_grained_role_privileges.md, line 70 at r4 (raw file):
Previously, solongordon (Solon) wrote…
Thanks, that makes sense. I'll update the RFC to reflect this.
Done. I added a SHOWQUERIES option to control access to both SHOW QUERIES and SHOW SESSIONS. Let me know if you think we need more granularity than that right now.
83fa891
to
2cf1d27
Compare
Release note: None
2cf1d27
to
732fcb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for being so late to the PR.
I'd go the other direction: CC users don't get to create SQL users that automatically have access to all DBs. However, CC admins can use the CC console to see all databases and grant their SQL users access to them (this is basically a manual version of the cron job hack - more work for the user, but much less magical and less likely to go badly wrong).
Yes, I'd be interested to hear @DuskEagle's thoughts about this. One idea I added to the RFC last week was that the console could have an interface for changing the owner of each database. That would allow granting all permissions in a way that couldn't be revoked by other users.
I'm strongly in favor of relying on Console to manage these permissions with the explicit call out that anything an SRE can do regarding granting and removing permissions should land within the realm of self service within reason.
It also abstracts the concept of DBA for DBaaS to a function of the Console/orchestration layer which is more auditable, cleaner, and more readily tuned.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @BramGruneir, @darinpp, @DuskEagle, @knz, @piyush-singh, @solongordon, and @thtruo)
docs/RFCS/20200720_finer_grained_role_privileges.md, line 86 at r2 (raw file):
Okay, in that case I feel better about @knz's original proposal to designate cluster settings as cluster-level or tenant-level. Maybe some customers will end up wanting more flexibility to define this themselves,
It looks like we're going with a cluster/tenant split now for these if I understand correctly?
I'll go ahead and merge this. This has been implemented since v20.2. bors r+ |
Build succeeded: |
This is a preliminary change before adding the `VIEWJOB` role option in the future. The purpose of the `VIEWJOB` role option is to allow TSEs to view jobs in admin clusters without modifying them. This requires access to view all jobs, including ones owned by admins. Adding `VIEWJOB` conflicts with the present `CONTROLJOB` implementation. It is stronger than `CONTROLJOB` because it lets one view admin jobs when `CONTROLJOB` doesn't, yet it is weaker because it only allows viewing jobs and not pause/cancel/resume-ing them. For this reason, this change is introduced to make `CONTROLJOB` at least as strong as `VIEWJOB`: it now allows for all jobs to be viewed. In other words, `VIEWJOB` lets you do a subset of things `CONTROLJOB` lets you do. Release note (general change): Previously, users with the `CONTROLJOB` role option could not view For this reason, this change is introduced to make controljob stronger. Furthermore, there was no discoverable reason (see [thread](https://cockroachlabs.slack.com/archives/C02DSDS9TM1/p1675194084663479) and cockroachdb#51646) that `CONTROLJOB` prevents controlling jobs owned by users that have the `ADMIN` role option, so that restriction is removed. Note that they still cannot control jobs owned by the `root` or `node` users, since those are critical internal jobs. Release note (general change): Previously, users with the `CONTROLJOB` role option could not view or control jobs owned by admins (admins are internal users such as "node" or "root" and also regular users that have with the `ADMIN` role). Users such as "node" and "root" are used to created internal jobs such as schema changes, migrations, stats collections etc. This change updates `CONTROLJOB` to grant access to view all jobs (ex. via `SHOW JOBS`). `CONTROLJOB` also now allows access to control all jobs except for ones owned by the "root" or "node" users. Epic: none
Release note: None