Merge #70648 #71011 #71105 #71206 #71212

70648: sql: move a single remote flow to the gateway in some cases r=yuzefovich a=yuzefovich

**sql: show distribution info based on actual physical plan in EXPLAIN**

Previously, the `distribution` info in `EXPLAIN` output was printed based on the recommendation about the distribution of the plan. For example, if the plan was determined as "should be distributed", yet it only contained a single flow on the gateway, we would say that the plan has "full" distribution. This commit updates the code to print the distribution based on the actual physical plan (in the example above it would say "local"), regardless of the reason, whether it is the recommendation to plan locally or the data happening to be only on the gateway. I think it makes more sense this way, since a DistSQL diagram consisting of a single flow on the gateway now more appropriately corresponds to "local" distribution.

Additionally, this change is motivated by the follow-up commit, which will introduce changes to the physical plan during plan finalization, and we want to show the correct distribution in the `EXPLAIN` output for that too.

Release note: None

**sql: move a single remote flow to the gateway in some cases**

This commit updates the physical planner to move a single remote flow onto the gateway in some cases, namely when

- the flow contains a processor that might increase the cardinality of the data flowing through it or that performs the KV work
- we estimate that the whole flow doesn't reduce the cardinality when compared against the number of rows read by the table readers.

To be conservative, when there is no estimate, we don't apply this change to the physical plan.

The justification behind this change is the fact that we're pinning the whole physical planning based on the placement of table readers. If the plan consists only of a single flow, and the flow is quite expensive, then with a high enough frequency of such flows, the node holding the lease for the ranges of the table readers becomes a hot spot (we saw this in practice a few months ago). In such a scenario we might now choose to run the flow locally to distribute the load on the cluster better (assuming that the queries are issued against all nodes with equal frequency). The `EXPLAIN` output will correctly say "distribution: local" if the flow is moved to the gateway.

Informs: #59014.

Release note (bug fix): Some query patterns that previously could cause a single node to become a hot spot have been fixed so that the load is evenly distributed across the whole cluster.

71011: cli: add --max-sql-memory flag to `cockroach mt start-sql` r=knz a=jaylim-crl

Previously, the `--max-sql-memory` flag wasn't available to the multi-tenancy start-sql command, even though the feature was already there for other `start`-related commands.

Release note (cli change): `cockroach mt start-sql` will now support the `--max-sql-memory` flag to configure maximum SQL memory capacity to store temporary data.

Release justification: The upcoming Serverless MVP release plans to use a different value for `--max-sql-memory` instead of the default value of 25% of container memory. This commit is only a flag change that will only be used in multi-tenant scenarios, and should have no impact on dedicated customers.

71105: sql: do not collect statistics on virtual columns r=mgartner a=mgartner

PR #68312 intended to update the behavior of `CREATE STATISTICS` to prevent statistics collection on virtual computed columns. However, it failed to account for multi-column statistics and for `CREATE STATISTICS` statements that explicitly reference virtual columns. This commit accounts for these two cases.

This prevents internal errors from occurring when the system tries to collect statistics on `NOT NULL` virtual columns. Virtual column values are not included in the primary index. So when the statistics job reads the primary index to sample the virtual column, it assumes the value is null, which violates the column's `NOT NULL` constraint. This violation causes an error.

Fixes #71080

Release note (bug fix): A bug has been fixed which caused internal errors when collecting statistics on tables with virtual computed columns.

71206: cmd/roachtest: add testLifecycle to hibernateIgnoreList r=ZhouXing19 a=ZhouXing19

Resolves #70482

Add `org.hibernate.userguide.pc.WhereTest.testLifecycle` to `hibernateIgnoreList21_1`, `hibernateIgnoreList21_2`, and `hibernateIgnoreList22_1`.

Release note: None

Release justification: None

71212: opt: use fragment for optstepsweb with long URLs r=mgartner a=mgartner

The `optstepsweb` test command can produce very long URLs. If the URL is longer than ~8201 characters, the GitHub Pages server hosting `optsteps.html` responds with a 414 status code.

To make these long URLs work, this commit uses a fragment rather than a query parameter in the URL if the compressed data that represents the optimizer steps is over 8100 characters (the 100-character buffer is meant to account for the protocol, domain, and path). A fragment is not sent to the server by the browser, so GitHub Pages responds successfully. A downside is that when anchor links are clicked to navigate the page, the original fragment is overridden and the URL becomes invalid. For this reason, we still use a query parameter when the compressed data is small enough.

Related to #68697.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Jane Xing <[email protected]>
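The fragment-versus-query rule described for #71212 can be sketched roughly as follows. This is only an illustration of the stated threshold, not the code from the commit: `buildOptStepsURL`, `maxQueryLen`, and the base URL are hypothetical names, and the payload is assumed to be already URL-safe.

```go
package main

import (
	"fmt"
	"strings"
)

// maxQueryLen mirrors the 8100-character threshold from the commit message.
// The constant name is an assumption made for this sketch.
const maxQueryLen = 8100

// buildOptStepsURL is a hypothetical helper. base is the page URL and data is
// the compressed payload, assumed to contain only URL-safe characters.
func buildOptStepsURL(base, data string) string {
	if len(data) > maxQueryLen {
		// Browsers do not send the fragment to the server, so the server
		// never sees the long payload.
		return base + "#" + data
	}
	// Short payloads stay in the query string so that in-page anchor
	// links do not overwrite them.
	return base + "?" + data
}

func main() {
	base := "https://example.invalid/optsteps.html" // placeholder host
	short := buildOptStepsURL(base, strings.Repeat("a", 100))
	long := buildOptStepsURL(base, strings.Repeat("a", 9000))
	fmt.Println(strings.Contains(short, "?"), strings.Contains(long, "#"))
}
```

The cutoff is applied to the payload length rather than the full URL length, which matches the description of leaving a 100-character buffer for the protocol, domain, and path.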
cockroachdb · Oct 6, 2021 · 76ce001
6 parents d36bb1a + 7607dad + e93430e + d9eed2b + ed42f47 + 7187a27
commit 76ce001
Showing 50 changed files with 674 additions and 203 deletions.
diff --git a/docs/generated/settings/settings-for-tenants.txt b/docs/generated/settings/settings-for-tenants.txt
@@ -81,7 +81,7 @@ sql.defaults.datestyle	enumeration	iso, mdy	default value for DateStyle session
 sql.defaults.datestyle.enabled	boolean	false	default value for datestyle_enabled session setting
 sql.defaults.default_int_size	integer	8	the size, in bytes, of an INT type
 sql.defaults.disallow_full_table_scans.enabled	boolean	false	setting to true rejects queries that have planned a full table scan
-sql.defaults.distsql	enumeration	auto	default distributed SQL execution mode [off = 0, auto = 1, on = 2]
+sql.defaults.distsql	enumeration	auto	default distributed SQL execution mode [off = 0, auto = 1, on = 2, always = 3]
 sql.defaults.experimental_alter_column_type.enabled	boolean	false	default value for experimental_alter_column_type session setting; enables the use of ALTER COLUMN TYPE for general conversions
 sql.defaults.experimental_auto_rehoming.enabled	boolean	false	default value for experimental_enable_auto_rehoming; allows for rows in REGIONAL BY ROW tables to be auto-rehomed on UPDATE
 sql.defaults.experimental_distsql_planning	enumeration	off	default experimental_distsql_planning mode; enables experimental opt-driven DistSQL planning [off = 0, on = 1]

diff --git a/docs/generated/settings/settings.html b/docs/generated/settings/settings.html
@@ -86,7 +86,7 @@
 <tr><td><code>sql.defaults.datestyle.enabled</code></td><td>boolean</td><td><code>false</code></td><td>default value for datestyle_enabled session setting</td></tr>
 <tr><td><code>sql.defaults.default_int_size</code></td><td>integer</td><td><code>8</code></td><td>the size, in bytes, of an INT type</td></tr>
 <tr><td><code>sql.defaults.disallow_full_table_scans.enabled</code></td><td>boolean</td><td><code>false</code></td><td>setting to true rejects queries that have planned a full table scan</td></tr>
-<tr><td><code>sql.defaults.distsql</code></td><td>enumeration</td><td><code>auto</code></td><td>default distributed SQL execution mode [off = 0, auto = 1, on = 2]</td></tr>
+<tr><td><code>sql.defaults.distsql</code></td><td>enumeration</td><td><code>auto</code></td><td>default distributed SQL execution mode [off = 0, auto = 1, on = 2, always = 3]</td></tr>
 <tr><td><code>sql.defaults.experimental_alter_column_type.enabled</code></td><td>boolean</td><td><code>false</code></td><td>default value for experimental_alter_column_type session setting; enables the use of ALTER COLUMN TYPE for general conversions</td></tr>
 <tr><td><code>sql.defaults.experimental_auto_rehoming.enabled</code></td><td>boolean</td><td><code>false</code></td><td>default value for experimental_enable_auto_rehoming; allows for rows in REGIONAL BY ROW tables to be auto-rehomed on UPDATE</td></tr>
 <tr><td><code>sql.defaults.experimental_distsql_planning</code></td><td>enumeration</td><td><code>off</code></td><td>default experimental_distsql_planning mode; enables experimental opt-driven DistSQL planning [off = 0, on = 1]</td></tr>

diff --git a/pkg/ccl/logictestccl/testdata/logic_test/zone b/pkg/ccl/logictestccl/testdata/logic_test/zone
@@ -27,7 +27,7 @@ ALTER INDEX t@secondary CONFIGURE ZONE USING constraints='[+region=test,+dc=dc1]
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -74,7 +74,7 @@ ALTER INDEX t@tertiary CONFIGURE ZONE USING constraints='[+region=test,+dc=dc1]'
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -121,7 +121,7 @@ ALTER INDEX t@tertiary CONFIGURE ZONE USING constraints='[+region=test,+dc=dc3]'
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -143,7 +143,7 @@ ALTER INDEX t@secondary CONFIGURE ZONE USING constraints='[+region=test,+dc=dc2]
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -187,7 +187,7 @@ PREPARE p AS SELECT * FROM [EXPLAIN SELECT k, v FROM t WHERE k=10]
 query T retry
 EXECUTE p
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -204,7 +204,7 @@ ALTER INDEX t@secondary CONFIGURE ZONE USING constraints='[+region=test,+dc=dc1]
 query T retry
 EXECUTE p
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -231,7 +231,7 @@ USING constraints='[+region=test]', lease_preferences='[[+region=test,+dc=dc1]]'
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -255,7 +255,7 @@ USING constraints='[+region=test]', lease_preferences='[[+region=test,+dc=dc1]]'
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -312,7 +312,7 @@ USING constraints='[+region=test]', lease_preferences='[[+region=test,+dc=dc1]]'
 query T retry
 EXPLAIN SELECT * FROM t WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -359,7 +359,7 @@ PREPARE p AS SELECT * FROM [EXPLAIN SELECT k, v FROM t WHERE k=10]
 query T retry
 EXECUTE p
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -374,7 +374,7 @@ USING constraints='[+region=test]', lease_preferences='[[+region=test,+dc=dc2]]'
 query T retry
 EXECUTE p
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -406,7 +406,7 @@ ALTER INDEX t36642@secondary CONFIGURE ZONE USING constraints='[+region=test]',
 query T retry
 EXPLAIN SELECT * FROM t36642 WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -423,7 +423,7 @@ ALTER INDEX t36642@secondary CONFIGURE ZONE USING constraints='[+region=test]',
 query T retry
 EXPLAIN SELECT * FROM t36642 WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -481,7 +481,7 @@ CONFIGURE ZONE USING constraints='[+region=test]', lease_preferences='[[+dc=dc1]
 query T retry
 EXPLAIN SELECT * FROM t36644 WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan
@@ -499,7 +499,7 @@ CONFIGURE ZONE USING constraints='[+region=test]', lease_preferences='[[+dc=dc1]
 query T retry
 EXPLAIN SELECT * FROM t36644 WHERE k=10
 ----
-distribution: full
+distribution: local
 vectorized: true
 ·
 • scan

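The `distribution: full` to `distribution: local` changes in these expected outputs follow from #70648 reporting the distribution of the actual physical plan instead of the planner's recommendation. A rough sketch of that rule, assuming a hypothetical list of the nodes each flow runs on (none of these names come from the CockroachDB source):

```go
package main

import "fmt"

// nodeID is a hypothetical node identifier used only in this sketch.
type nodeID int

// explainDistribution returns the label to show in EXPLAIN output.
// flowNodes lists the node of every flow in the finalized physical plan and
// recommended is the label derived from the planner's recommendation.
func explainDistribution(gateway nodeID, flowNodes []nodeID, recommended string) string {
	if len(flowNodes) == 0 {
		return recommended
	}
	for _, n := range flowNodes {
		if n != gateway {
			// At least one flow runs remotely, so the plan is not local.
			return recommended
		}
	}
	// Every flow ended up on the gateway, whatever the recommendation was.
	return "local"
}

func main() {
	fmt.Println(explainDistribution(1, []nodeID{1}, "full")) // prints local
}
```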
diff --git a/pkg/cli/clisqlshell/sql_test.go b/pkg/cli/clisqlshell/sql_test.go
@@ -216,7 +216,7 @@ func Example_misc_table() {
 	// sql --format=table -e explain select s, 'foo' from t.t
 	//            info
 	// --------------------------
-	//   distribution: full
+	//   distribution: local
 	//   vectorized: true
 	//
 	//   • render

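Related to the `distribution` changes above, the commit message for #70648 lists the conditions under which a single remote flow is moved to the gateway. They can be sketched as below; the struct, its fields, and the function are hypothetical and only mirror the prose, since the planner code itself is not shown in this section.

```go
package main

import "fmt"

// flowSummary is a hypothetical summary of a single flow in a physical plan.
type flowSummary struct {
	remote                       bool   // the flow runs on a node other than the gateway
	mayIncreaseCardinalityOrDoKV bool   // has a processor that may add rows or performs KV work
	hasEstimate                  bool   // row-count estimates are available
	rowsRead                     uint64 // rows read by the table readers
	rowsOut                      uint64 // estimated rows produced by the whole flow
}

// shouldMoveToGateway applies the conditions from the commit message.
func shouldMoveToGateway(numFlows int, f flowSummary) bool {
	if numFlows != 1 || !f.remote {
		return false // only a single remote flow is considered
	}
	if !f.mayIncreaseCardinalityOrDoKV {
		return false
	}
	if !f.hasEstimate {
		return false // be conservative when there is no estimate
	}
	// Move only if the flow does not reduce cardinality relative to the
	// number of rows read by the table readers.
	return f.rowsOut >= f.rowsRead
}

func main() {
	f := flowSummary{remote: true, mayIncreaseCardinalityOrDoKV: true,
		hasEstimate: true, rowsRead: 100, rowsOut: 150}
	fmt.Println(shouldMoveToGateway(1, f)) // prints true
}
```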
diff --git a/pkg/cli/flags.go b/pkg/cli/flags.go
@@ -933,7 +933,7 @@ func init() {
 		}
 	}
 
-	// Multi-tenancy commands.
+	// Multi-tenancy start-sql command flags.
 	{
 		f := mtStartSQLCmd.Flags()
 		varFlag(f, &tenantIDWrapper{&serverCfg.SQLConfig.TenantID}, cliflags.TenantID)
@@ -954,10 +954,13 @@ func init() {
 
 		stringSliceFlag(f, &serverCfg.SQLConfig.TenantKVAddrs, cliflags.KVAddrs)
 
+		// Enable/disable various external storage endpoints.
 		boolFlag(f, &serverCfg.ExternalIODirConfig.DisableHTTP, cliflags.ExternalIODisableHTTP)
 		boolFlag(f, &serverCfg.ExternalIODirConfig.DisableOutbound, cliflags.ExternalIODisabled)
 		boolFlag(f, &serverCfg.ExternalIODirConfig.DisableImplicitCredentials, cliflags.ExternalIODisableImplicitCredentials)
 
+		// Engine flags.
+		varFlag(f, sqlSizeValue, cliflags.SQLMem)
 		// N.B. diskTempStorageSizeValue.ResolvePercentage() will be called after
 		// the stores flag has been parsed and the storage device that a percentage
 		// refers to becomes known.

diff --git a/pkg/cmd/roachtest/tests/hibernate_blocklist.go b/pkg/cmd/roachtest/tests/hibernate_blocklist.go
@@ -219,4 +219,6 @@ var hibernateIgnoreList22_1 = hibernateIgnoreList21_2
 
 var hibernateIgnoreList21_2 = hibernateIgnoreList21_1
 
-var hibernateIgnoreList21_1 = blocklist{}
+var hibernateIgnoreList21_1 = blocklist{
+	"org.hibernate.userguide.pc.WhereTest.testLifecycle": "unknown",
+}
diff --git a/pkg/sql/create_stats.go b/pkg/sql/create_stats.go
@@ -247,6 +247,13 @@ func (n *createStatsNode) makeJobRecord(ctx context.Context) (*jobs.Record, erro
 
 		columnIDs := make([]descpb.ColumnID, len(columns))
 		for i := range columns {
+			if columns[i].IsVirtual() {
+				return nil, pgerror.Newf(
+					pgcode.InvalidColumnReference,
+					"cannot create statistics on virtual column %q",
+					columns[i].ColName(),
+				)
+			}
 			columnIDs[i] = columns[i].GetID()
 		}
 		col, err := tableDesc.FindColumnWithID(columnIDs[0])
@@ -441,9 +448,16 @@ func createStatsDefaultColumns(
 				continue
 			}
 
-			colIDs := make([]descpb.ColumnID, j+1)
+			colIDs := make([]descpb.ColumnID, 0, j+1)
 			for k := 0; k <= j; k++ {
-				colIDs[k] = idx.GetKeyColumnID(k)
+				col, err := desc.FindColumnWithID(idx.GetKeyColumnID(k))
+				if err != nil {
+					return nil, err
+				}
+				if col.IsVirtual() {
+					continue
+				}
+				colIDs = append(colIDs, col.GetID())
 			}
 
 			// Check for existing stats and remember the requested stats.
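The loop in this hunk switches from indexed assignment to `append` so that virtual key columns can be skipped without leaving zero-valued gaps in `colIDs`. A standalone sketch of that pattern, using a hypothetical `column` type in place of the real descriptor interfaces:

```go
package main

import "fmt"

// column is a hypothetical stand-in for a table column descriptor.
type column struct {
	id      int
	virtual bool
}

// nonVirtualIDs collects the IDs of all non-virtual columns, preserving order.
func nonVirtualIDs(cols []column) []int {
	// Length 0 with capacity len(cols): skipped columns leave no holes.
	ids := make([]int, 0, len(cols))
	for _, c := range cols {
		if c.virtual {
			continue
		}
		ids = append(ids, c.id)
	}
	return ids
}

func main() {
	cols := []column{{id: 1}, {id: 2, virtual: true}, {id: 3}}
	fmt.Println(nonVirtualIDs(cols)) // prints [1 3]
}
```

Had the slice been allocated with length `len(cols)` and filled by index, the skipped column would have left a zero ID in the result.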