Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add system privilege(s) for debug/repair endpoints #109814

Closed
rafiss opened this issue Aug 31, 2023 · 0 comments · Fixed by #110609
Closed

sql: add system privilege(s) for debug/repair endpoints #109814

rafiss opened this issue Aug 31, 2023 · 0 comments · Fixed by #110609
Assignees
Labels
A-sql-privileges SQL privilege handling and permission checks. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@rafiss
Copy link
Collaborator

rafiss commented Aug 31, 2023

There are currently many endpoints that are used in on-call/support situations that require the admin role.

This is not ideal, since doing support work should not require unlimited access to the cluster. (Note that the admin role allows the user to view any data in the cluster.)

For this issue, audit all the RequireAdminRole checks in the code, and change them to use a system privilege instead. It likely would be best to add two:

  • a view-only system privilege, possibly reusing the VIEWCLUSTERMETADATA privilege.
  • a "repair" system privilege, for operations that modify cluster state.

Epic: CC-25593

Jira issue: CRDB-31109

@rafiss rafiss added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-privileges SQL privilege handling and permission checks. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Aug 31, 2023
craig bot pushed a commit that referenced this issue Sep 20, 2023
110931: ui: add plan gist as option on bundle collection r=maryliag a=maryliag

Note to reviewers: There is another option to get any plan _except_ from the selected gist, but that is not part of this PR. Once a design is created, this option can be added.

---
Part Of #103018

This commit adds an option to collect statement bundle based on a specific plan gist.

<img width="578" alt="Screenshot 2023-09-19 at 3 18 01 PM" src="https://github.com/cockroachdb/cockroach/assets/1017486/5ab807b7-08f4-49e7-b540-2dadface766d">


https://www.loom.com/share/59335438f0884b75a7d163d96effe5a8

Release note (ui change): Add option to filter out by specific plan gist when collecting a statement bundle.

110976: syntheticprivilege: admin always has ALL global privileges r=rafiss a=rafiss

### syntheticprivilege: admin always has ALL global privileges

As we move away from requiring the admin role to perform cluster
debug/repair operations, we want to use a privilege instead. To
facilitate that, the admin role now implicitly has ALL global
privileges. The privilege for admins is not revokeable.

---

### sql: use better error message for missing system privilege

Since we document privileges on the GlobalPrivilegeObject using the
phrase "system privilege", we should make the error message say that
too.

informs #109814
Release note: None

110979: roachtest: use correct format directive for job ID r=yuzefovich a=yuzefovich

`catpb.JobID` doesn't implement `fmt.Stringer`.

Touches: #110782.

Epic: None

Release note: None

Co-authored-by: maryliag <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
craig bot pushed a commit that referenced this issue Sep 21, 2023
110084: privilege: add REPAIRCLUSTERMETADATA system prvilege and remove admin checks r=rafiss a=rafiss

As we move away from requiring the admin role to perform cluster debug/repair operations, we want to use a privilege instead. To facilitate that, the admin role now implicitly has ALL global privileges. The privilege for admins is not revokeable.

Some exceptions were made, since the focus of this change is to use this
privilege for debug/repair endpoints. The following are still gated by
the admin role:
- ALTER ROLE on an admin
- ALTER DEFAULT PRIVILEGES FOR ALL ROLES
- changing a backup schedule metric

Release note (sql change): SQL debug/repair commands that were previously only usable
by users with the admin role can now be used by users with the
VIEWCLUSTERMETADATA or REPAIRCLUSTERMETADATA system privilege, depending
on whether the operation is read-only or modifies state.

informs #109814
Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in 235babd Sep 22, 2023
craig bot pushed a commit that referenced this issue Nov 27, 2023
114861: server: remove admin checks from server-v2 r=rafiss a=rafiss

These are now replaced by checks for the VIEWCLUSTERMETADATA privilege. The equivalent change was already made on the v1 server.

The following endpoints are affected (these endpoints are not used in the frontend, so this change does not have any user-facing impact):
- /sessions/
- /nodes/
- /nodes/{node_id}/ranges/
- /ranges/{range_id:[0-9]+}/
- /events/

informs #114384
informs #79571
informs #109814
Release note: None

115000: streamingccl: only return lag replanning error if lagging node has not advanced r=stevendanna a=msbutler

Previously, the frontier processor would return a lag replanning error if it detected a lagging node and after the hwm had advanced during the flow. This implies the frontier processor could replan as soon as a lagging node finished its catchup scan and bumped the hwm, but was still far behind the other nodes, as we observed in #114706. Ideally, the frontier processor should not throw this replanning error because the lagging node is making progress and because replanning can cause repeated work.

This patch prevents this scenario by teaching the frontier processor to only throw a replanning error if:
- the hwm has advanced in the flow
- two consecutive lagging node checks detected a lagging node and the hwm has not advanced during those two checks.

Informs #114706

Release note: none

115046: catalog/lease: fix flake for TestTableCreationPushesTxnsInRecentPast r=fqazi a=fqazi

Previously, TestTableCreationPushesTxnsInRecentPast could flake because we attempted to increase odds of hitting the uncertainty interval error by adding a delay on KV RPC calls. This wasn't effective and could still intermittent failures, and instead we are going to directly modify the uncertainty interval by setting a large MaxOffset on the clock. This will cause the desired behaviour in a more deterministic way.

Fixes: #114366

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-privileges SQL privilege handling and permission checks. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant