diff --git a/docs/RFCS/20220426_rbac_extensions_system_privileges.md b/docs/RFCS/20220426_rbac_extensions_system_privileges.md new file mode 100644 index 000000000000..98367177ce50 --- /dev/null +++ b/docs/RFCS/20220426_rbac_extensions_system_privileges.md @@ -0,0 +1,195 @@ +- Feature Name: RBAC Extensions and System Level Privileges +- Status: draft +- Start Date: 2022-04-26 +- Authors: Richard Cai +- RFC PR: [#80580](https://github.com/cockroachdb/cockroach/pull/80580) + +# Summary +CockroachDB’s privilege system today is split between grantable privileges and role options requiring users to maintain two mental models when it comes to RBAC. Furthermore, there are use cases that role options based privilege model do not cover. Notably, role options are scoped per role and cannot be inherited. +The proposal here is to extend our current privilege system to support more objects that are not necessarily backed by a descriptor such that privilege inheritance can be well defined. + +# Motivation +In CockroachDB today, the privilege system is somewhat bifurcated. We have database object level grants such as SELECT for tables or CREATE for databases. These grants exist only on database objects (tables, schemas, databases, types). We have also leveraged the role option system to define some coarse system level privileges. For example, we have role options such as CONTROLJOB which lets a user pause/cancel jobs. Jobs are not a database level object meaning they do not fall anywhere in the database/schema/table hierarchy and they are also not backed by a descriptor. As such we cannot define privileges on them currently, we only allow defining privileges on database level objects that are backed by a descriptor. In the meanwhile, CONTROLJOB gives us some control over who can interact with jobs. + +The problem with role options are that they are not inheritable, this can be considered an anti-pattern with regards to RBAC. Generally in the RBAC model, when a role is granted a privilege, granting that role to another role will also give the role that privilege. With role options we currently do not support inheritance due to ambiguous inheritance semantics. + +This has led to customer issues such as [#79571](https://github.com/cockroachdb/cockroach/issues/79571). + +Another relevant use case is that for Cockroach Cloud we want to be able to define an “operator” user which is a step below admin, (granting admin generally violates the least privilege principle). As of right now, the operator role is given `CREATEROLE CREATELOGIN CREATEDB CONTROLJOB CANCELQUERY VIEWACTIVITY CONTROLCHANGEFEED` role options, the problem is, granting the operator role to a user does not actually allow the user to perform operator abilities due to role options not being inherited. The user must login as the operator role itself. + +This will also allow us to support grants on virtual tables which has been a pain point in the past. + +# Technical Design +The goal here is to extend our current privilege system to allow more granular grants on objects / entities that are not necessarily backed by a descriptor such as jobs and changefeed objects. +Certain role options should be replaced with grants. At a high level, we want to introduce “system” or “global” level grants that follow general inheritance rules. This is a concept that exists in many popular non-Postgres databases. (MySQL, Oracle, SQL Server, Db2). We also want to support more granular grants on jobs and bulkio operations. + +## Replacing Relevant Role Options +Note - to preface this, role options will likely not be deprecated, we can keep them side by side with their equivalent form of regular grants due to the overhead not being high enough to warrant a potentially tricky migration. We can emphasize in docs to use system privileges instead. + +In short, role options should be reserved for user administration permissions, anything that reads/writes from the database should be a grant. + +Role Option | Inheritable +---- | ---- +CREATEROLE / NOCREATEROLE | Unchanged +PASSWORD | Unchanged +LOGIN / NOLOGIN | Unchanged +VALIDUNTIL | Unchanged +CONTROLJOB / NOCONTROLJOB | Replace with per job grants Example: GRANT PAUSE ON JOB TYPE BACKUP +CONTROLCHANGEFEED / NOCONTROLCHANGEFEED | Changefeeds are created per table and write to a sink. We can support CHANGEFEED privilege on tables or on the database and also introduce grants on sinks +CREATEDB / NOCREATEDB | Unchanged (Is a Postgres roleoption) +CREATELOGIN / NOCREATELOGIN | Unchanged (?) +VIEWACTIVITY / NOVIEWACTIVITY | Replace with system level grant +CANCELQUERY / NOCANCELQUERY | Replace with system level grant +MODIFYCLUSTERSETTING / NOMODIFYCLUSTERSETTING | Replace with system level grant +VIEWACTIVITYREDACTED / NOVIEWACTIVITYREDACTED | Replace with system level grant +SQLLOGIN / NOSQLLOGIN | Unchanged +VIEWCLUSTERSETTING / NOVIEWCLUSTERSETTING | Replace with system level grant + +## New System Table + +The system.privileges table will store privileges for non-descriptor backed objects. +We will require either a path-style string or the combination of object_type + id to uniquely identify them. The privileges will be represented as a bit mask. We can serialize and deserialize synthetic privilege descriptors using rows in this table to leverage our existing code that manipulates PrivilegeDescriptors for grants. We can also consider making the privileges and role_options columns have type `[]STRING`, the downside here is that it takes slightly more space (probably not a concern) and that we have to add some logic to serialize the string array into a bit array which is how we currently manipulate privileges. + +### system.privileges schema option 1 + +``` +CREATE TABLE system.privileges ( +username STRING NOT NULL, +object string NOT NULL, +privileges VARBIT NOT NULL, +with_grant VARBIT NOT NULL, +CONSTRAINT "primary" PRIMARY KEY (username, object) +); +``` + +The `object` is a string style path that represents an object. +The general format of the string is +`/namespace/id.../` + +For example we can identify jobs per user using the string +`/jobs/user/$USERID/backup/$JOBID` + +To identify a virtual system table for example the `system.descriptor` table with id 3, we can do: +`/virtual-table/0/29/3` +`virtual-table` identifies the namespace which is virtual tables. +The 0 is the system database id, the 29 is the schema id and the 3 is the table id. + +We could identify system level or system cluster setting privileges with: +`/system/` or `/system/cluster-settings/` + +### system.privileges schema option 2 + +Have a object_type (string) and (object_id) INT column. + +We will create a new “system.privileges” table. +``` +CREATE TABLE system.privileges ( +username STRING NOT NULL, +object_id INT NOT NULL, +object_type string NOT NULL, +privileges VARBIT NOT NULL, +with_grant VARBIT NOT NULL, +CONSTRAINT "primary" PRIMARY KEY (username, object, object_type) +); +``` + + +We have also considered making the `object_type` an enum, however this would lead to headaches in mixed version cluster settings. We would have to worry about if the enum value exists or not and this makes adding object types fairly annoying as we'd have to update the system enum. + +`system.object_type` will be an enum that defines the various object types that can support privileges in CockroachDB. This tentatively looks like +``` +CREATE TYPE system.object_type AS ENUM('system', 'jobs', 'changefeed_sink') +``` + +Note that this will be the first example of a system enum. + +### Option 1 vs Option 2 Pros and Cons +The path-style string allows for more flexibility, we can for example define jobs per user whereas in option 2 where we identify objects based on the type and id, this is less straightforward to do. +The path-style string also opens up the possibility of easily support wild card grants. The code complexity of option 1 seems marginally higher than option 2. +Option 1 overall seems more appealing due to the flexibility and allowing for expressive grants in the future. + +### Alternate considerations +The system level grants could be stored on a PrivilegeDescriptor on the system database, however this would require us to support mutations on the system database descriptor which we currently do not do and this allows us to not perform a round trip whenever we look for a system table. + +We could create descriptors to represent every object we want to grant on. This is not an attractive option at first glance as these descriptors will likely only be used to store privileges and nothing more. + +## Syntax for System Level Privileges +The current proposal is to support `GRANT SYSTEM [privileges...] TO [users...]`. +We cannot do `GRANT [privileges…] ON SYSTEM TO [users…]` due to the fact that tables can be called system, this would be ambiguous. Using two words, such as SYSTEM ADMIN (can’t think of a good phrase here) would work. + +Proposed new system level privileges: +VIEWACTIVITY (this can be broken down into more privileges) +VIEWACTIVITYREDACTED +CANCELQUERY +MODIFYCLUSTERSETTING +VIEWCLSUTERSETTING +VIEWSYSTEMTABLES (can be broken down further if necessary) + + +The syntax is up in the air however, the grant would create or modify the entry in the system.privileges table. + +Example: +`GRANT SYSTEM MODIFYCLUSTERSETTING TO foo` + +Results in the row +(foo, 1, “system”, 16384, 0) + +The bitmask value for MODIFYCLUSTERSETTING is assumed to be 1 << 14 = 16384 in the above example. + +Followed by + +`GRANT SYSTEM VIEWCLUSTERSETTING TO foo` + +Results in the row +(foo, 1, “system”, 49152, 0) + +The bitmask value for VIEWCLUSTERSETTING is assumed to be 1 << 15 = 32768 in the above example. 1<<14 | 1<<15 = 49152. + +## Support for grants on virtual tables +One thing worth explicitly calling out is that users have asked for the ability to grant on virtual tables (crdb_internal, pg_catalog). This proposal allows us to support grants on virtual tables and unify the grant experience for tables. Virtual tables have OIDs starting at `MaxUint32 = 1<<32 - 1` counting backwards, an INT (64 bit) column for object id supports this case. + +## Adding new privileges +This proposal allows for CockroachDB engineering teams to add privileges on arbitrarily defined objects as long as those objects can be identified by an object type + object id. +Once the SQL Experience team completes the work for this framework, it is the responsibility of individual engineering teams to add privileges. + +## New Bulkio related privileges +Right now we have an issue that we do not have granular controls to enable backup / restore and export / import. For example, right now, admin is required to do a full cluster backup. We could add a system level privilege (ie FULLCLUSTERBACKUP) to enable full cluster backups without admin privileges. + +For backing up individual database objects right now, we check for CONNECT on databases, SELECT on tables and USAGE on schemas and types. We can potentially augment these in the future? + +## New Jobs related privileges +The ability to interact with jobs hinges on the coarse grained CONTROLJOB role option. It makes sense to define privileges such as CONTROL on individual job types. + +Another ask has been to have VIEW only privilege on jobs. + +For example: +`GRANT CONTROL ON JOB TYPE BACKUP TO foo`. + +## New Changefeed related privileges +Changefeed privileges can be defined on the table level, database level. Furthermore, we want to consider defining privileges on sinks. + +## Backwards Compatibility Considerations +We’ll aim to not deprecate role options and continue to support those albeit we may want to place emphasis on docs on using regular grants. + +New grants will not affect backwards compatibility. + +## Observability +Do we include privileges from system.privileges in pg_catalog or only a crdb_internal table? +Likely confine them to a crdb_internal table as this is crdb specific. + +## Multi-region Latency Considerations +In a multi-region cluster, the latency to query a system table may be high. To mitigate this, we can use a cache to reduce the number of hits to the system table. We have caches in similar places such as for authentication and role membership. +A straightforward cache invalidation policy is to clear the cache if any change to the system.privileges table occurs, however we may want to be a little smarter and invalidate entries only if the object identifier is changed in the system table. +We could potentially also use a GLOBAL table. + +## Backup / Restore Considerations +Currently we only restore privileges during a full cluster restore. There is no additional work we'd have to do in this case as system tables are handled in backup / restore. +However, we do have plans to support restoring privileges. [#79144](https://github.com/cockroachdb/cockroach/pull/79144). We'll have to update this PR to also special case restoring rows to the system table. +For restore, we have to consider that ID's can change and track ID changes before writing the privilege row to the `system.privileges` table. +The work for restoring privileges in the non full cluster restore case will not be handled in the initial implementation. + +## Data Sovereignty Considerations +If an object is located in a specific location, does the privilege metadata also have to belong to that region? +The metadata stored in the `system.privileges` data is not sensitive but it is an open question about whether we also need to locate the metadata in the same region as an object given data domiciling requirements. + + diff --git a/pkg/BUILD.bazel b/pkg/BUILD.bazel index 4e612aa83d21..f9e3949656c1 100644 --- a/pkg/BUILD.bazel +++ b/pkg/BUILD.bazel @@ -321,6 +321,7 @@ ALL_TESTS = [ "//pkg/sql/opt/bench:bench_test", "//pkg/sql/opt/cat:cat_test", "//pkg/sql/opt/constraint:constraint_test", + "//pkg/sql/opt/cycle:cycle_test", "//pkg/sql/opt/distribution:distribution_test", "//pkg/sql/opt/exec/execbuilder:execbuilder_test", "//pkg/sql/opt/exec/explain:explain_test", diff --git a/pkg/kv/kvserver/helpers_test.go b/pkg/kv/kvserver/helpers_test.go index 7e812a5593fd..c91337d05c14 100644 --- a/pkg/kv/kvserver/helpers_test.go +++ b/pkg/kv/kvserver/helpers_test.go @@ -195,7 +195,8 @@ func (s *Store) ManualReplicaGC(repl *Replica) error { // ManualRaftSnapshot will manually send a raft snapshot to the target replica. func (s *Store) ManualRaftSnapshot(repl *Replica, target roachpb.ReplicaID) error { - return s.raftSnapshotQueue.processRaftSnapshot(context.Background(), repl, target) + _, err := s.raftSnapshotQueue.processRaftSnapshot(context.Background(), repl, target) + return err } func (s *Store) ReservationCount() int { diff --git a/pkg/kv/kvserver/raft_snapshot_queue.go b/pkg/kv/kvserver/raft_snapshot_queue.go index f3243cfc89ed..d1ab607a7bac 100644 --- a/pkg/kv/kvserver/raft_snapshot_queue.go +++ b/pkg/kv/kvserver/raft_snapshot_queue.go @@ -80,7 +80,7 @@ func (rq *raftSnapshotQueue) shouldQueue( func (rq *raftSnapshotQueue) process( ctx context.Context, repl *Replica, _ spanconfig.StoreReader, -) (processed bool, err error) { +) (anyProcessed bool, _ error) { // If a follower requires a Raft snapshot, perform it. if status := repl.RaftStatus(); status != nil { // raft.Status.Progress is only populated on the Raft group leader. @@ -89,29 +89,30 @@ func (rq *raftSnapshotQueue) process( if log.V(1) { log.Infof(ctx, "sending raft snapshot") } - if err := rq.processRaftSnapshot(ctx, repl, roachpb.ReplicaID(id)); err != nil { + if processed, err := rq.processRaftSnapshot(ctx, repl, roachpb.ReplicaID(id)); err != nil { return false, err + } else if processed { + anyProcessed = true } - processed = true } } } - return processed, nil + return anyProcessed, nil } func (rq *raftSnapshotQueue) processRaftSnapshot( ctx context.Context, repl *Replica, id roachpb.ReplicaID, -) error { +) (processed bool, _ error) { desc := repl.Desc() repDesc, ok := desc.GetReplicaDescriptorByID(id) if !ok { - return errors.Errorf("%s: replica %d not present in %v", repl, id, desc.Replicas()) + return false, errors.Errorf("%s: replica %d not present in %v", repl, id, desc.Replicas()) } snapType := kvserverpb.SnapshotRequest_VIA_SNAPSHOT_QUEUE if typ := repDesc.GetType(); typ == roachpb.LEARNER || typ == roachpb.NON_VOTER { if fn := repl.store.cfg.TestingKnobs.RaftSnapshotQueueSkipReplica; fn != nil && fn() { - return nil + return false, nil } if repl.hasOutstandingSnapshotInFlightToStore(repDesc.StoreID) { // There is a snapshot being transferred. It's probably an INITIAL snap, @@ -135,7 +136,7 @@ func (rq *raftSnapshotQueue) processRaftSnapshot( // some point the snapshot lock above will be released and we'll fall // through to the logic below. repl.reportSnapshotStatus(ctx, repDesc.ReplicaID, err) - return nil + return false, nil } } @@ -160,7 +161,7 @@ func (rq *raftSnapshotQueue) processRaftSnapshot( // We're currently not handling this and instead rely on the quota pool to // make sure that log truncations won't require snapshots for healthy // followers. - return err + return err == nil /* processed */, err } func (*raftSnapshotQueue) timer(_ time.Duration) time.Duration { diff --git a/pkg/sql/opt/cycle/BUILD.bazel b/pkg/sql/opt/cycle/BUILD.bazel new file mode 100644 index 000000000000..b27fd3315666 --- /dev/null +++ b/pkg/sql/opt/cycle/BUILD.bazel @@ -0,0 +1,14 @@ +load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test") + +go_library( + name = "cycle", + srcs = ["detector.go"], + importpath = "github.com/cockroachdb/cockroach/pkg/sql/opt/cycle", + visibility = ["//visibility:public"], +) + +go_test( + name = "cycle_test", + srcs = ["detector_test.go"], + embed = [":cycle"], +) diff --git a/pkg/sql/opt/cycle/detector.go b/pkg/sql/opt/cycle/detector.go new file mode 100644 index 000000000000..dd2d5687c34a --- /dev/null +++ b/pkg/sql/opt/cycle/detector.go @@ -0,0 +1,118 @@ +// Copyright 2022 The Cockroach Authors. +// +// Use of this software is governed by the Business Source License +// included in the file licenses/BSL.txt. +// +// As of the Change Date specified in that file, in accordance with +// the Business Source License, use of this software will be governed +// by the Apache License, Version 2.0, included in the file +// licenses/APL.txt. + +package cycle + +// Detector finds cycles in directed graphs. Vertices are identified as unique +// integers by the caller. +// +// Example usage: +// +// var d Detector +// d.AddEdge(Vertex(0), Vertex(1)) +// d.AddEdge(Vertex(1), Vertex(2)) +// d.AddEdge(Vertex(2), Vertex(0)) +// d.FindCycleStartingAtVertex(Vertex(0)) +// => [0, 1, 2, 0], true +// +type Detector struct { + // edges are the directed edges in the graph. + edges map[Vertex][]Vertex +} + +// Vertex is a vertex in the graph. +type Vertex int + +// status represents one of three possible states of a vertex during cycle +// detection. +type status int + +const ( + // statusNotVisited indicates that a vertex has not been previously visited. + statusNotVisited status = iota + // statusVisited indicates that a vertex has been visited but a verdict on + // its involvement in a cycle has not yet been reached. + statusVisited + // statusDone indicates that a vertex has been visited and has been proven + // innocent of being involved in a cycle. + statusDone +) + +// AddEdge adds a directed edge to the graph. +func (cd *Detector) AddEdge(from, to Vertex) { + if cd.edges == nil { + cd.edges = make(map[Vertex][]Vertex) + } + cd.edges[from] = append(cd.edges[from], to) +} + +// FindCycleStartingAtVertex searches for a cycle starting from v. If one is +// found, a path containing that cycle is returned. After finding the first +// cycle, searching ceases. If no cycle is found, ok=false is returned. +func (cd *Detector) FindCycleStartingAtVertex(v Vertex) (cycle []Vertex, ok bool) { + vertices := make(map[Vertex]status) + var s stack + if ok := cd.hasCycle(v, vertices, &s); ok { + // If a cycle was found, s will contain a path that includes the cycle. + return s, true + } + return nil, false +} + +// hasCycle performs a depth-first search of the graph starting at v in search +// of a cycle. It returns true when the first cycle is found. If a cycle was +// found, s contains a path that includes the cycle. Cycles are detected by +// finding edges that backtrack to a previously visited vertex. +func (cd *Detector) hasCycle(v Vertex, vertices map[Vertex]status, s *stack) bool { + // Mark the given Vertex as visited and push it onto the stack. + vertices[v] = statusVisited + s.push(v) + + // Search all children of v for cycles. + for _, child := range cd.edges[v] { + switch vertices[child] { + case statusVisited: + // A cycle has been found. The current stack is a path to the cycle. + // Push the child onto the stack so that the cycle is more obvious + // in the path. + s.push(child) + return true + case statusNotVisited: + // Recurse if this child of v has not yet been visited. + if ok := cd.hasCycle(child, vertices, s); ok { + // If a cycle was found deeper down, propagate the result and + // halt searching other children of v. + return true + } + case statusDone: + // We have already proven that this child does not backtrack to + // anywhere further up the stack, so we do not need to recurse into + // it. + } + } + + // If we didn't return early from the loop, we did not find a cycle below + // this vertex, so we can mark the vertex as done and pop it off the stack. + vertices[v] = statusDone + s.pop() + return false +} + +// stack is a simple implementation of a stack of vertices. It allows more +// ergonomic mutation by callees than a slice. +type stack []Vertex + +func (s *stack) push(i Vertex) { + *s = append(*s, i) +} + +func (s *stack) pop() { + *s = (*s)[:len(*s)-1] +} diff --git a/pkg/sql/opt/cycle/detector_test.go b/pkg/sql/opt/cycle/detector_test.go new file mode 100644 index 000000000000..5dccee16a5e3 --- /dev/null +++ b/pkg/sql/opt/cycle/detector_test.go @@ -0,0 +1,188 @@ +// Copyright 2022 The Cockroach Authors. +// +// Use of this software is governed by the Business Source License +// included in the file licenses/BSL.txt. +// +// As of the Change Date specified in that file, in accordance with +// the Business Source License, use of this software will be governed +// by the Apache License, Version 2.0, included in the file +// licenses/APL.txt. + +package cycle + +import ( + "reflect" + "testing" +) + +type edge struct { + from Vertex + to Vertex +} + +func TestDetector(t *testing.T) { + testCases := []struct { + edges []edge + start Vertex + expected []Vertex + }{ + // No cycles. + { + edges: []edge{}, + start: 0, + expected: nil, + }, + { + edges: []edge{ + {0, 1}, + {1, 2}, + {2, 3}, + }, + start: 0, + expected: nil, + }, + { + edges: []edge{ + {0, 1}, + {0, 2}, + {0, 3}, + {1, 2}, + {2, 3}, + }, + start: 0, + expected: nil, + }, + { + edges: []edge{ + {0, 1}, + {0, 2}, + {0, 3}, + {1, 2}, + {3, 1}, + }, + start: 0, + expected: nil, + }, + { + edges: []edge{ + {0, 1}, + {0, 4}, + {1, 2}, + {1, 3}, + {4, 3}, + {4, 5}, + {5, 3}, + }, + start: 0, + expected: nil, + }, + { + edges: []edge{ + {1, 2}, + {1, 3}, + {1, 4}, + {1, 7}, + {4, 5}, + {5, 6}, + {6, 7}, + {7, 4}, + }, + start: 0, + expected: nil, + }, + + // Cycles. + { + edges: []edge{ + {0, 0}, + }, + start: 0, + expected: []Vertex{0, 0}, + }, + { + edges: []edge{ + {0, 1}, + {1, 0}, + }, + start: 0, + expected: []Vertex{0, 1, 0}, + }, + { + edges: []edge{ + {0, 1}, + {1, 0}, + }, + start: 1, + expected: []Vertex{1, 0, 1}, + }, + { + edges: []edge{ + {0, 1}, + {1, 2}, + {2, 3}, + {3, 4}, + {4, 1}, + }, + start: 0, + expected: []Vertex{0, 1, 2, 3, 4, 1}, + }, + { + edges: []edge{ + {0, 1}, + {1, 3}, + {3, 2}, + {2, 1}, + }, + start: 0, + expected: []Vertex{0, 1, 3, 2, 1}, + }, + { + edges: []edge{ + {0, 1}, + {0, 4}, + {1, 2}, + {1, 3}, + {3, 5}, + {4, 3}, + {4, 5}, + {5, 1}, + }, + start: 0, + expected: []Vertex{0, 1, 3, 5, 1}, + }, + { + edges: []edge{ + {1, 2}, + {1, 3}, + {1, 4}, + {1, 7}, + {4, 5}, + {5, 6}, + {6, 7}, + {7, 4}, + }, + start: 1, + expected: []Vertex{1, 4, 5, 6, 7, 4}, + }, + { + edges: []edge{ + {0, 1}, + {1, 2}, + {2, 3}, + {3, 3}, + }, + start: 1, + expected: []Vertex{1, 2, 3, 3}, + }, + } + for _, tc := range testCases { + var d Detector + for _, edge := range tc.edges { + d.AddEdge(edge.from, edge.to) + } + res, _ := d.FindCycleStartingAtVertex(tc.start) + if !reflect.DeepEqual(res, tc.expected) { + t.Errorf("expected %v, got %v", tc.expected, res) + } + } +} diff --git a/pkg/sql/opt/memo/logical_props_builder.go b/pkg/sql/opt/memo/logical_props_builder.go index c5186d61efab..499bc705c7eb 100644 --- a/pkg/sql/opt/memo/logical_props_builder.go +++ b/pkg/sql/opt/memo/logical_props_builder.go @@ -2578,6 +2578,11 @@ func (b *logicalPropsBuilder) buildMemoCycleTestRelProps( // from transforming mc into an empty Values expression. inputProps := mc.Input.Relational() rel.Cardinality = inputProps.Cardinality + // Make the output cols the same as the input cols to avoid assertion + // failures. + rel.OutputCols = inputProps.OutputCols + // Make the row count non-zero to avoid assertion failures. + rel.Stats.RowCount = 1 } // WithUses returns the WithUsesMap for the given expression. diff --git a/pkg/sql/opt/testutils/opttester/opt_tester.go b/pkg/sql/opt/testutils/opttester/opt_tester.go index e387c0279b83..24e83a894fe4 100644 --- a/pkg/sql/opt/testutils/opttester/opt_tester.go +++ b/pkg/sql/opt/testutils/opttester/opt_tester.go @@ -739,6 +739,9 @@ func (ot *OptTester) RunCommand(tb testing.TB, d *datadriven.TestData) string { case "expropt": e, err := ot.ExprOpt() if err != nil { + if len(errors.GetAllDetails(err)) > 0 { + return fmt.Sprintf("error: %s\ndetails:\n%s", err, errors.FlattenDetails(err)) + } return fmt.Sprintf("error: %s\n", err) } ot.postProcess(tb, d, e) diff --git a/pkg/sql/opt/xform/BUILD.bazel b/pkg/sql/opt/xform/BUILD.bazel index f51af7ab55bc..92b8420484b7 100644 --- a/pkg/sql/opt/xform/BUILD.bazel +++ b/pkg/sql/opt/xform/BUILD.bazel @@ -4,6 +4,7 @@ go_library( name = "xform", srcs = [ "coster.go", + "cycle_funcs.go", "explorer.go", "general_funcs.go", "groupby_funcs.go", @@ -30,6 +31,7 @@ go_library( "//pkg/sql/opt", "//pkg/sql/opt/cat", "//pkg/sql/opt/constraint", + "//pkg/sql/opt/cycle", "//pkg/sql/opt/distribution", "//pkg/sql/opt/idxconstraint", "//pkg/sql/opt/invertedexpr", diff --git a/pkg/sql/opt/xform/cycle_funcs.go b/pkg/sql/opt/xform/cycle_funcs.go new file mode 100644 index 000000000000..2b8e6138ece6 --- /dev/null +++ b/pkg/sql/opt/xform/cycle_funcs.go @@ -0,0 +1,18 @@ +// Copyright 2022 The Cockroach Authors. +// +// Use of this software is governed by the Business Source License +// included in the file licenses/BSL.txt. +// +// As of the Change Date specified in that file, in accordance with +// the Business Source License, use of this software will be governed +// by the Apache License, Version 2.0, included in the file +// licenses/APL.txt. + +package xform + +import "github.com/cockroachdb/cockroach/pkg/sql/opt/memo" + +// EmptySubqueryPrivate returns an empty SubqueryPrivate. +func (c *CustomFuncs) EmptySubqueryPrivate() *memo.SubqueryPrivate { + return &memo.SubqueryPrivate{} +} diff --git a/pkg/sql/opt/xform/memo_format.go b/pkg/sql/opt/xform/memo_format.go index 7a4988326738..8de29ab6e930 100644 --- a/pkg/sql/opt/xform/memo_format.go +++ b/pkg/sql/opt/xform/memo_format.go @@ -16,6 +16,7 @@ import ( "sort" "github.com/cockroachdb/cockroach/pkg/sql/opt" + "github.com/cockroachdb/cockroach/pkg/sql/opt/cycle" "github.com/cockroachdb/cockroach/pkg/sql/opt/memo" "github.com/cockroachdb/cockroach/pkg/sql/opt/props/physical" "github.com/cockroachdb/cockroach/pkg/util/treeprinter" @@ -29,6 +30,9 @@ const ( // FmtPretty performs a breadth-first topological sort on the memo groups, // and shows the root group at the top of the memo. FmtPretty FmtFlags = iota + // FmtCycle formats a memo like FmtPretty, but attempts to detect a cycle in + // the memo and include it in the formatted output. + FmtCycle ) type group struct { @@ -71,8 +75,8 @@ func (mf *memoFormatter) format() string { desc = "optimized" } tpRoot := tp.Childf( - "memo (%s, ~%dKB, required=%s)", - desc, mf.o.mem.MemoryEstimate()/1024, m.RootProps(), + "memo (%s, ~%dKB, required=%s%s)", + desc, mf.o.mem.MemoryEstimate()/1024, m.RootProps(), mf.formatCycles(), ) for i, e := range mf.groups { @@ -317,3 +321,57 @@ func firstExpr(expr opt.Expr) opt.Expr { } return expr } + +func (mf *memoFormatter) formatCycles() string { + m := mf.o.mem + var cd cycle.Detector + + if mf.flags != FmtCycle { + return "" + } + + addExpr := func(from cycle.Vertex, e opt.Expr) { + for i := 0; i < e.ChildCount(); i++ { + child := e.Child(i) + if opt.IsListItemOp(child) { + child = child.Child(0) + } + to := cycle.Vertex(mf.group(child)) + cd.AddEdge(from, to) + } + } + + addGroup := func(from cycle.Vertex, first memo.RelExpr) { + for member := first; member != nil; member = member.NextExpr() { + addExpr(from, member) + } + } + + // Add an edge to the cycle detector from a group to each of the groups it + // references. + for i, e := range mf.groups { + from := cycle.Vertex(i) + rel, ok := e.first.(memo.RelExpr) + if !ok { + addExpr(from, e.first) + continue + } + addGroup(from, rel) + } + + // Search for a cycle from the root expression group. + root := cycle.Vertex(mf.group(m.RootExpr())) + if cyclePath, ok := cd.FindCycleStartingAtVertex(root); ok { + mf.buf.Reset() + mf.buf.WriteString(", cycle=[") + for i, group := range cyclePath { + if i > 0 { + mf.buf.WriteString("->") + } + fmt.Fprintf(mf.buf, "G%d", group+1) + } + mf.buf.WriteString("]") + return mf.buf.String() + } + return "" +} diff --git a/pkg/sql/opt/xform/optimizer.go b/pkg/sql/opt/xform/optimizer.go index c45c4457a752..98b41b07ced6 100644 --- a/pkg/sql/opt/xform/optimizer.go +++ b/pkg/sql/opt/xform/optimizer.go @@ -468,7 +468,7 @@ func (o *Optimizer) optimizeGroup(grp memo.RelExpr, required *physical.Required) // times, there is likely a cycle in the memo. To avoid a stack // overflow, throw an internal error. The formatted memo is included as // an error detail to aid in debugging the cycle. - mf := makeMemoFormatter(o, FmtPretty) + mf := makeMemoFormatter(o, FmtCycle) panic(errors.WithDetail( errors.AssertionFailedf( "memo group optimization passes surpassed limit of %v; "+ diff --git a/pkg/sql/opt/xform/rules/cycle.opt b/pkg/sql/opt/xform/rules/cycle.opt index 29f2559b00ef..2196d90ae8c9 100644 --- a/pkg/sql/opt/xform/rules/cycle.opt +++ b/pkg/sql/opt/xform/rules/cycle.opt @@ -10,3 +10,20 @@ (MemoCycleTestRel $input:* $filters:*) => (MemoCycleTestRel (MemoCycleTestRel $input $filters) $filters) + +# MemoCycleTestRelRuleFilter is similar to MemoCycleTestRelRule, but creates +# a cycle in the memo through a filter expression. +[MemoCycleTestRelRuleFilter, Explore] +(MemoCycleTestRel $input:* $filters:*) +=> +(Select + $input + [ + (FiltersItem + (Exists + (MemoCycleTestRel $input $filters) + (EmptySubqueryPrivate) + ) + ) + ] +) diff --git a/pkg/sql/opt/xform/testdata/rules/cycle b/pkg/sql/opt/xform/testdata/rules/cycle index add3d9366e59..7d7aef7b0a43 100644 --- a/pkg/sql/opt/xform/testdata/rules/cycle +++ b/pkg/sql/opt/xform/testdata/rules/cycle @@ -6,13 +6,103 @@ CREATE TABLE ab ( ) ---- +exec-ddl +CREATE TABLE uv ( + u INT PRIMARY KEY, + v INT, + INDEX (v) +) +---- + +exec-ddl +CREATE TABLE xy ( + x INT PRIMARY KEY, + y INT, + INDEX (y) +) +---- + # This test ensures that a memo cycle does not cause a stack overflow. Instead, # the cycle is detected and the optimizer throws an internal error. The cycle is # created by the test-only exploration rule MemoCycleTestRelRule. -expropt +expropt disable=MemoCycleTestRelRuleFilter (MemoCycleTestRel (Scan [ (Table "ab") (Cols "a,b") ]) [ (Eq (Var "b") (Const 1 "int")) ] ) ---- error: memo group optimization passes surpassed limit of 100000; there may be a cycle in the memo +details: +memo (not optimized, ~3KB, required=[], cycle=[G1->G1]) + ├── G1: (memo-cycle-test-rel G2 G3) (memo-cycle-test-rel G1 G3) + ├── G2: (scan ab,cols=(1,2)) (scan ab@ab_b_idx,cols=(1,2)) + │ └── [] + │ ├── best: (scan ab,cols=(1,2)) + │ └── cost: 1064.42 + ├── G3: (filters G4) + ├── G4: (eq G5 G6) + ├── G5: (variable b) + └── G6: (const 1) + +expropt disable=MemoCycleTestRelRuleFilter +(LeftJoin + (Scan [ (Table "ab") (Cols "a,b") ]) + (LeftJoin + (MemoCycleTestRel + (Scan [ (Table "uv") (Cols "u,v") ]) + [ (Eq (Var "v") (Const 1 "int")) ] + ) + (Scan [ (Table "xy") (Cols "x,y") ]) + [ ] + [ ] + ) + [ ] + [ ] +) +---- +error: memo group optimization passes surpassed limit of 100000; there may be a cycle in the memo +details: +memo (not optimized, ~15KB, required=[], cycle=[G1->G3->G5->G5]) + ├── G1: (left-join G2 G3 G4) + ├── G2: (scan ab,cols=()) (scan ab@ab_b_idx,cols=()) + │ └── [] + │ ├── best: (scan ab,cols=()) + │ └── cost: 1044.22 + ├── G3: (left-join G5 G6 G4) + ├── G4: (filters) + ├── G5: (memo-cycle-test-rel G7 G8) (memo-cycle-test-rel G5 G8) + ├── G6: (scan xy,cols=()) + ├── G7: (scan uv,cols=(5,6)) (scan uv@uv_v_idx,cols=(5,6)) + │ └── [] + │ ├── best: (scan uv,cols=(5,6)) + │ └── cost: 1064.42 + ├── G8: (filters G9) + ├── G9: (eq G10 G11) + ├── G10: (variable v) + └── G11: (const 1) + +# Ensure that a cycle through a filter can be detected. +expropt disable=MemoCycleTestRelRule +(MemoCycleTestRel + (Scan [ (Table "uv") (Cols "u,v") ]) + [ (Eq (Var "v") (Const 1 "int")) ] +) +---- +error: memo group optimization passes surpassed limit of 100000; there may be a cycle in the memo +details: +memo (not optimized, ~4KB, required=[], cycle=[G1->G4->G6->G9->G1]) + ├── G1: (memo-cycle-test-rel G2 G3) (select G2 G4) + ├── G2: (scan uv,cols=(1,2)) (scan uv@uv_v_idx,cols=(1,2)) + │ ├── [limit hint: 1000.00] + │ │ ├── best: (scan uv,cols=(1,2)) + │ │ └── cost: 1054.02 + │ └── [] + │ ├── best: (scan uv,cols=(1,2)) + │ └── cost: 1064.42 + ├── G3: (filters G5) + ├── G4: (filters G6) + ├── G5: (eq G7 G8) + ├── G6: (exists G9 &{ 0 unknown true}) + ├── G7: (variable v) + ├── G8: (const 1) + └── G9: (limit G1 G8) diff --git a/pkg/sql/sem/cast/cast_map.go b/pkg/sql/sem/cast/cast_map.go index 3e7b1c90087d..5a5d58a5680a 100644 --- a/pkg/sql/sem/cast/cast_map.go +++ b/pkg/sql/sem/cast/cast_map.go @@ -493,7 +493,7 @@ var castMap = map[oid.Oid]map[oid.Oid]Cast{ }, oid.T_name: { oid.T_bpchar: {MaxContext: ContextAssignment, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, - oid.T_text: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, + oid.T_text: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.LeakProof}, oid.T_varchar: {MaxContext: ContextAssignment, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, // Automatic I/O conversions to string types. oid.T_char: {MaxContext: ContextAssignment, origin: ContextOriginAutomaticIOConversion, Volatility: volatility.Immutable}, @@ -720,7 +720,7 @@ var castMap = map[oid.Oid]map[oid.Oid]Cast{ oid.T_bpchar: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, oid.T_char: {MaxContext: ContextAssignment, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, oidext.T_geometry: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, - oid.T_name: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.Immutable}, + oid.T_name: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.LeakProof}, oid.T_regclass: {MaxContext: ContextImplicit, origin: ContextOriginPgCast, Volatility: volatility.Stable}, // We include a TEXT->TEXT entry to mimic the VARCHAR->VARCHAR entry // that is included in the pg_cast table. Postgres doesn't include a diff --git a/pkg/sql/sem/cast/testdata/pg_cast_dump.csv b/pkg/sql/sem/cast/testdata/pg_cast_dump.csv index db4a657872ad..3f982c9e5949 100644 --- a/pkg/sql/sem/cast/testdata/pg_cast_dump.csv +++ b/pkg/sql/sem/cast/testdata/pg_cast_dump.csv @@ -7,7 +7,7 @@ castsource|casttarget|provolatile|proleakproof|castcontext|pg_version 18|25|i|f|i|13.5 18|1042|i|f|a|13.5 18|1043|i|f|a|13.5 -19|25|i|f|i|13.5 +19|25|i|t|i|14.2 19|1042|i|f|a|13.5 19|1043|i|f|a|13.5 20|21|i|f|a|13.5 @@ -57,7 +57,7 @@ castsource|casttarget|provolatile|proleakproof|castcontext|pg_version 23|1700|i|f|i|13.5 24|20|i|f|a|13.5 25|18|i|f|a|13.5 -25|19|i|f|i|13.5 +25|19|i|t|i|14.2 25|142|s|f|e|13.5 25|2205|s|f|i|13.5 26|20|i|f|a|13.5