83213: kvserver: make MVCC GC less disruptive to foreground traffic r=aayushshah15 a=aayushshah15

This commit changes GC requests to no longer declare exclusive latches
at their BatchRequest's timestamp. That behavior was already incorrect, as
explained in #55293.

>The first use is broken because we acquire write latches at the batch
header's timestamp, which is set to time.Now(), so we're only
serializing with reads in the future and all other writes [1]. So we're
disruptive to everyone except who we want to serialize with – reads in
the past!

This commit makes GC requests declare only a non-MVCC exclusive latch
over the `RangeGCThresholdKey`. This is correct because:
```

// 1. We define "correctness" to be the property that a reader reading at /
// around the GC threshold will either see the correct results or receive an
// error.
// 2. Readers perform their command evaluation over a stable snapshot of the
// storage engine. This means that the reader will not see the effects of a
// subsequent GC run as long as it created a Pebble iterator before the GC
// request.
// 3. A reader checks the in-memory GC threshold of a Replica after it has
// created this snapshot (i.e. after a Pebble iterator has been created).
// 4. If the in-memory GC threshold is above the timestamp of the read, the
// reader receives an error. Otherwise, the reader is guaranteed to see a
// state of the storage engine that hasn't been affected by the GC request [5].
// 5. GC requests bump the in-memory GC threshold of a Replica as a pre-apply
// side effect. This means that if a reader checks the in-memory GC threshold
// after it has created a Pebble iterator, it is impossible for the iterator
// to point to a storage engine state that has been affected by the GC
// request.

```
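
As an illustration of that ordering, here is a minimal sketch (with hypothetical names and simplified timestamps; this is not the actual Replica code) of a reader that creates its storage snapshot before checking the in-memory GC threshold:

```
package kvserversketch

import "fmt"

// iterator and replicaSketch are hypothetical stand-ins for the real Pebble
// iterator and Replica types.
type iterator interface{ Close() }

type replicaSketch struct {
	newSnapshotIter func() iterator // creates a stable engine snapshot (step 2)
	gcThreshold     func() int64    // in-memory GC threshold, simplified to a wall time
}

// evalRead follows the ordering above: snapshot first, threshold check second.
// Because GC bumps the in-memory threshold as a pre-apply side effect (step 5),
// any GC run missed by the check must have started after the iterator was
// created, so the iterator still observes pre-GC state.
func evalRead(r *replicaSketch, readTS int64) error {
	iter := r.newSnapshotIter()
	defer iter.Close()

	if threshold := r.gcThreshold(); readTS <= threshold { // steps 3-4
		return fmt.Errorf("read at %d is below the GC threshold %d", readTS, threshold)
	}
	// Safe to evaluate the read over iter; a concurrent GC cannot affect it.
	return nil
}
```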

As a result, GC requests should now be much less disruptive to
foreground traffic since they're no longer redundantly declaring
exclusive latches over global keys.
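
For reference, a rough sketch of what the narrower declaration could look like; the function and package names are illustrative, and `spanset.AddNonMVCC` / `keys.RangeGCThresholdKey` are used on the assumption that they match the real helpers:

```
package gclatchsketch

import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// declareGCThresholdLatch sketches the new behavior: GC declares a single
// non-MVCC exclusive latch over the range-local GC threshold key instead of
// write latches over the global keyspace at the batch timestamp, so it only
// serializes with requests that read or bump the threshold.
func declareGCThresholdLatch(rangeID roachpb.RangeID, latchSpans *spanset.SpanSet) {
	latchSpans.AddNonMVCC(spanset.SpanReadWrite, roachpb.Span{
		Key: keys.RangeGCThresholdKey(rangeID),
	})
}
```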

Resolves #55293

Release note (performance improvement): MVCC garbage collection should
now be much less disruptive to foreground traffic than before.


85231: backupccl: add RESTORE with schema_only r=dt a=msbutler

Fixes #83470

Release note (sql change): This PR adds the schema_only flag to RESTORE,
allowing a user to run a normal RESTORE without restoring any user table data.
This can be used to quickly validate that a given backup is restorable. A
schema_only restore's runtime is O(# of descriptors), which is a fraction of a
regular restore's runtime of O(# of table rows).

Note that during a cluster-level schema_only restore, the system tables are
read from S3 and written to disk, as this provides important validation
coverage without much runtime cost (system tables should not be large).

After running a successful schema_only RESTORE, the user can revert the cluster
to its pre-restore state by simply dropping the descriptors the schema_only
restore added (e.g. if the user restored a database, they can drop the database
after the restore completes). Note that in the cluster-level case, the restored
system data cannot be reverted; this shouldn't matter, as the cluster was empty
beforehand.

For the backup validation use case, RESTORE with schema_only provides near-total
validation coverage. In other words, if a user's schema_only RESTORE works, they
can be quite confident that a real RESTORE will work. There's one notable place
where schema_only RESTORE lacks coverage:

It doesn't read (or write) any of the SSTs that store backed-up user table
data. To ensure a backup's SSTs are where the RESTORE command would expect them
to be, a user should run SHOW BACKUP ... with check_files. Further, an upcoming
patch will introduce another flag for RESTORE validation -- the
verify_backup_table_data flag -- which extends schema_only functionality to read
the table data from S3 and conduct checksums on it. As with the schema_only
flag, no table data will be ingested into the cluster.
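
As a rough sketch of the validation flow described above (the connection string, backup URI, and database name are illustrative assumptions, not values from this PR):

```
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Illustrative connection string; adjust for your cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stmts := []string{
		// Check that the backup's SST files are where RESTORE expects them.
		`SHOW BACKUP 'nodelocal://0/backup' WITH check_files`,
		// Validate restorability without ingesting any user table data.
		`RESTORE DATABASE bank FROM 'nodelocal://0/backup' WITH schema_only`,
		// Revert to the pre-restore state by dropping what the restore added.
		`DROP DATABASE bank`,
	}
	for _, stmt := range stmts {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatalf("%s: %v", stmt, err)
		}
	}
}
```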

85695: colexec: add support for ILIKE and NOT ILIKE r=yuzefovich a=yuzefovich

**colexec: clean up NOT LIKE operator generation**

This commit cleans up the way we generate operators for NOT LIKE.
Previously, NOT LIKE got its own copy of the operator that was identical to
the LIKE one except for a single line; now the same underlying operator
handles both LIKE and NOT LIKE, and the result of the comparison just needs
to be negated. The performance hit of this extra boolean comparison is
negligible, yet it lets us remove some of the duplicated generated code.

```
name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      17.8µs ± 1%    16.9µs ± 0%   -4.93%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.5µs ± 0%    18.7µs ± 0%   +1.37%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    27.8µs ± 0%    28.0µs ± 0%   +1.02%  (p=0.000 n=9+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 1%     484µs ± 0%   +1.10%  (p=0.000 n=10+10)
LikeOps/selSkeletonBytesBytesConstOp-24    39.9µs ± 0%    40.3µs ± 0%   +0.85%  (p=0.000 n=10+10)
LikeOps/selRegexpSkeleton-24                871µs ± 2%     871µs ± 0%     ~     (p=1.000 n=10+10)
```
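
Conceptually, the shared operator looks something like this sketch (illustrative types only, not the generated colexec code):

```
package colexecsketch

import "bytes"

// selPrefixOp stands in for a generated selection operator: a single
// implementation serves both LIKE 'foo%' and NOT LIKE 'foo%' by negating
// the comparison result when requested.
type selPrefixOp struct {
	prefix []byte
	negate bool // true for NOT LIKE
}

func (o *selPrefixOp) matches(arg []byte) bool {
	res := bytes.HasPrefix(arg, o.prefix)
	if o.negate {
		return !res
	}
	return res
}
```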

Release note: None

**colexec: add support for ILIKE and NOT ILIKE**

This commit adds native vectorized support for ILIKE and NOT ILIKE
comparisons. The idea is simple: convert both the argument and the pattern
to upper case. This required minor changes to the templates to add a
"prelude" step for that conversion, as well as converting the pattern to
upper case during planning.

Initially, I generated separate operators for the case-insensitive cases,
but the benchmarks showed that the performance impact of a single
conditional inside the `for` loop is barely noticeable given that the
branch prediction will always be right, so I refactored the existing
operators to support case insensitivity.

```
name                                     old time/op    new time/op    delta
LikeOps/selPrefixBytesBytesConstOp-24      16.8µs ± 0%    17.7µs ± 0%  +5.30%  (p=0.000 n=10+10)
LikeOps/selSuffixBytesBytesConstOp-24      18.7µs ± 0%    19.2µs ± 0%  +2.99%  (p=0.000 n=10+10)
LikeOps/selContainsBytesBytesConstOp-24    28.0µs ± 0%    27.8µs ± 0%  -0.73%  (p=0.000 n=10+10)
LikeOps/selRegexpBytesBytesConstOp-24       479µs ± 0%     480µs ± 0%  +0.33%  (p=0.008 n=9+10)
LikeOps/selSkeletonBytesBytesConstOp-24    40.2µs ± 0%    41.4µs ± 0%  +3.20%  (p=0.000 n=9+10)
LikeOps/selRegexpSkeleton-24                860µs ± 0%     857µs ± 0%  -0.36%  (p=0.023 n=10+10)
```
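
A rough sketch of that approach (again illustrative, not the real template output): the pattern prefix is upper-cased once during planning, and a per-row prelude upper-cases the argument before the existing comparison runs.

```
package colexecsketch

import "bytes"

// selCaseInsensitivePrefixOp sketches ILIKE handling on top of the existing
// prefix operator: the pattern prefix is already upper-cased at planning
// time, and each argument is upper-cased as a prelude step when the operator
// runs in case-insensitive mode.
type selCaseInsensitivePrefixOp struct {
	upperPrefix     []byte // upper-cased during planning
	caseInsensitive bool   // true for ILIKE / NOT ILIKE
}

func (o *selCaseInsensitivePrefixOp) matches(arg []byte) bool {
	if o.caseInsensitive {
		arg = bytes.ToUpper(arg) // prelude conversion, done per row
	}
	return bytes.HasPrefix(arg, o.upperPrefix)
}
```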

Addresses: #49781.

Release note (performance improvement): ILIKE and NOT ILIKE filters can
now be evaluated more efficiently in some cases.

85731: rowexec: allow ordered joinReader to stream matches to the first row r=DrewKimball a=DrewKimball

Currently the `joinReaderOrderingStrategy` implementation buffers all looked-up
rows before matching them with input rows and emitting them. This is necessary
because the looked-up rows may not be received in input order (which must be
maintained). However, rows that match the first input row can be emitted
immediately. When many rows match the first input row, this can decrease the
overhead of the buffer. Additionally, this change can allow a limit to be
satisfied earlier, which can significantly decrease latency. This is especially
advantageous when there is only one input row, since all lookups can then be
rendered and returned in a streaming fashion.
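
A simplified sketch of the idea (hypothetical types; the real `joinReaderOrderingStrategy` tracks considerably more state):

```
package rowexecsketch

// lookedUpRow is a hypothetical stand-in for a row returned by the lookup.
type lookedUpRow struct {
	inputIdx int      // index of the input row this lookup matched
	data     []string // rendered output columns
}

// orderingEmitter sketches the optimization: matches for the first
// not-yet-emitted input row are streamed downstream immediately (nothing can
// precede them in the required ordering), while matches for later input rows
// are still buffered until all earlier input rows have been fully emitted.
type orderingEmitter struct {
	firstUnemitted int                   // first input row still being emitted
	buffered       map[int][]lookedUpRow // out-of-order matches, keyed by input row
	out            chan<- lookedUpRow
}

func newOrderingEmitter(out chan<- lookedUpRow) *orderingEmitter {
	return &orderingEmitter{buffered: make(map[int][]lookedUpRow), out: out}
}

func (e *orderingEmitter) onLookedUpRow(r lookedUpRow) {
	if r.inputIdx == e.firstUnemitted {
		// Streaming path: can satisfy a LIMIT early, especially when there is
		// a single input row, where every lookup takes this branch.
		e.out <- r
		return
	}
	e.buffered[r.inputIdx] = append(e.buffered[r.inputIdx], r)
}
```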

Release note (performance improvement): The execution engine can now
short-circuit execution of lookup joins in more cases, which can decrease
latency for queries with limits.

85809: ui: fix time window selection with mouse on Metrics charts r=koorosh a=koorosh

This patch fixes an issue that prevented proper time range selection
with the mouse on Metrics charts. The root cause was that the updated
time scale object didn't include the correct value of `windowSize`,
which is used to calculate the `start` position of the time range.

Release note (ui change): Fixed an issue with the incorrect start time
position of the selected time range on the Metrics page.

Resolves: #84001

Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: DrewKimball <[email protected]>
Co-authored-by: Andrii Vorobiov <[email protected]>
6 people committed Aug 9, 2022
6 parents 6fe360e + 3e0270a + b811739 + 02b32a6 + d413fe7 + d46ba25 commit 7c18668
Showing 40 changed files with 1,646 additions and 1,506 deletions.
2 changes: 2 additions & 0 deletions docs/generated/sql/bnf/stmt_block.bnf
@@ -1266,6 +1266,7 @@ unreserved_keyword ::=
| 'RUNNING'
| 'SCHEDULE'
| 'SCHEDULES'
| 'SCHEMA_ONLY'
| 'SCROLL'
| 'SETTING'
| 'SETTINGS'
@@ -2492,6 +2493,7 @@ restore_options ::=
| 'NEW_DB_NAME' '=' string_or_placeholder
| 'INCREMENTAL_LOCATION' '=' string_or_placeholder_opt_list
| 'TENANT' '=' string_or_placeholder
| 'SCHEMA_ONLY'

scrub_option_list ::=
( scrub_option ) ( ( ',' scrub_option ) )*
4 changes: 4 additions & 0 deletions pkg/ccl/backupccl/backup_telemetry.go
@@ -65,6 +65,7 @@ const (
telemetryOptionSkipMissingSequenceOwners = "skip_missing_sequence_owners"
telemetryOptionSkipMissingViews = "skip_missing_views"
telemetryOptionSkipLocalitiesCheck = "skip_localities_check"
telemetryOptionSchemaOnly = "schema_only"
)

// logBackupTelemetry publishes an eventpb.RecoveryEvent about a manually
@@ -397,6 +398,9 @@ func logRestoreTelemetry(
if opts.Detached {
options = append(options, telemetryOptionDetached)
}
if opts.SchemaOnly {
options = append(options, telemetryOptionSchemaOnly)
}
sort.Strings(options)

event := &eventpb.RecoveryEvent{
168 changes: 89 additions & 79 deletions pkg/ccl/backupccl/backup_test.go
@@ -2674,99 +2674,109 @@ func TestBackupRestoreDuringUserDefinedTypeChange(t *testing.T) {
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Protects numTypeChangesStarted and numTypeChangesFinished.
var mu syncutil.Mutex
numTypeChangesStarted := 0
numTypeChangesFinished := 0
typeChangesStarted := make(chan struct{})
waitForBackup := make(chan struct{})
typeChangesFinished := make(chan struct{})
_, sqlDB, _, cleanupFn := backupRestoreTestSetupWithParams(t, singleNode, 0, InitManualReplication, base.TestClusterArgs{
ServerArgs: base.TestServerArgs{
Knobs: base.TestingKnobs{
SQLTypeSchemaChanger: &sql.TypeSchemaChangerTestingKnobs{
RunBeforeEnumMemberPromotion: func(context.Context) error {
mu.Lock()
if numTypeChangesStarted < len(tc.queries) {
numTypeChangesStarted++
if numTypeChangesStarted == len(tc.queries) {
close(typeChangesStarted)
for _, isSchemaOnly := range []bool{true, false} {
suffix := ""
if isSchemaOnly {
suffix = "-schema-only"
}
t.Run(tc.name+suffix, func(t *testing.T) {
// Protects numTypeChangesStarted and numTypeChangesFinished.
var mu syncutil.Mutex
numTypeChangesStarted := 0
numTypeChangesFinished := 0
typeChangesStarted := make(chan struct{})
waitForBackup := make(chan struct{})
typeChangesFinished := make(chan struct{})
_, sqlDB, _, cleanupFn := backupRestoreTestSetupWithParams(t, singleNode, 0, InitManualReplication, base.TestClusterArgs{
ServerArgs: base.TestServerArgs{
Knobs: base.TestingKnobs{
SQLTypeSchemaChanger: &sql.TypeSchemaChangerTestingKnobs{
RunBeforeEnumMemberPromotion: func(context.Context) error {
mu.Lock()
if numTypeChangesStarted < len(tc.queries) {
numTypeChangesStarted++
if numTypeChangesStarted == len(tc.queries) {
close(typeChangesStarted)
}
mu.Unlock()
<-waitForBackup
} else {
mu.Unlock()
}
mu.Unlock()
<-waitForBackup
} else {
mu.Unlock()
}
return nil
return nil
},
},
},
},
},
})
defer cleanupFn()
})
defer cleanupFn()

// Create a database with a type.
sqlDB.Exec(t, `
// Create a database with a type.
sqlDB.Exec(t, `
CREATE DATABASE d;
CREATE TYPE d.greeting AS ENUM ('hello', 'howdy', 'hi');
`)

// Start ALTER TYPE statement(s) that will block.
for _, query := range tc.queries {
go func(query string, totalQueries int) {
// Note we don't use sqlDB.Exec here because we can't Fatal from within a goroutine.
if _, err := sqlDB.DB.ExecContext(context.Background(), query); err != nil {
t.Error(err)
}
mu.Lock()
numTypeChangesFinished++
if numTypeChangesFinished == totalQueries {
close(typeChangesFinished)
}
mu.Unlock()
}(query, len(tc.queries))
}

// Wait on the type changes to start.
<-typeChangesStarted
// Start ALTER TYPE statement(s) that will block.
for _, query := range tc.queries {
go func(query string, totalQueries int) {
// Note we don't use sqlDB.Exec here because we can't Fatal from within a goroutine.
if _, err := sqlDB.DB.ExecContext(context.Background(), query); err != nil {
t.Error(err)
}
mu.Lock()
numTypeChangesFinished++
if numTypeChangesFinished == totalQueries {
close(typeChangesFinished)
}
mu.Unlock()
}(query, len(tc.queries))
}

// Now create a backup while the type change job is blocked so that
// greeting is backed up with some enum members in READ_ONLY state.
sqlDB.Exec(t, `BACKUP DATABASE d TO 'nodelocal://0/test/'`)
// Wait on the type changes to start.
<-typeChangesStarted

// Let the type change finish.
close(waitForBackup)
<-typeChangesFinished
// Now create a backup while the type change job is blocked so that
// greeting is backed up with some enum members in READ_ONLY state.
sqlDB.Exec(t, `BACKUP DATABASE d TO 'nodelocal://0/test/'`)

// Now drop the database and restore.
sqlDB.Exec(t, `DROP DATABASE d`)
sqlDB.Exec(t, `RESTORE DATABASE d FROM 'nodelocal://0/test/'`)
// Let the type change finish.
close(waitForBackup)
<-typeChangesFinished

// The type change job should be scheduled and finish. Note that we can't use
// sqlDB.CheckQueryResultsRetry as it Fatal's upon an error. The case below
// will error until the job completes.
for i, query := range tc.succeedAfter {
testutils.SucceedsSoon(t, func() error {
_, err := sqlDB.DB.ExecContext(context.Background(), query)
return err
})
sqlDB.CheckQueryResults(t, query, [][]string{{tc.expectedSuccess[i]}})
}
// Now drop the database and restore.
sqlDB.Exec(t, `DROP DATABASE d`)
restoreQuery := `RESTORE DATABASE d FROM 'nodelocal://0/test/'`
if isSchemaOnly {
restoreQuery = restoreQuery + " with schema_only"
}
sqlDB.Exec(t, restoreQuery)

// The type change job should be scheduled and finish. Note that we can't use
// sqlDB.CheckQueryResultsRetry as it Fatal's upon an error. The case below
// will error until the job completes.
for i, query := range tc.succeedAfter {
testutils.SucceedsSoon(t, func() error {
_, err := sqlDB.DB.ExecContext(context.Background(), query)
return err
})
sqlDB.CheckQueryResults(t, query, [][]string{{tc.expectedSuccess[i]}})
}

for i, query := range tc.errorAfter {
testutils.SucceedsSoon(t, func() error {
_, err := sqlDB.DB.ExecContext(context.Background(), query)
if err == nil {
return errors.New("expected error, found none")
}
if !testutils.IsError(err, tc.expectedError[i]) {
return errors.Newf("expected error %q, found %v", tc.expectedError[i], pgerror.FullError(err))
}
return nil
})
}
})
for i, query := range tc.errorAfter {
testutils.SucceedsSoon(t, func() error {
_, err := sqlDB.DB.ExecContext(context.Background(), query)
if err == nil {
return errors.New("expected error, found none")
}
if !testutils.IsError(err, tc.expectedError[i]) {
return errors.Newf("expected error %q, found %v", tc.expectedError[i], pgerror.FullError(err))
}
return nil
})
}
})
}
}
}

36 changes: 25 additions & 11 deletions pkg/ccl/backupccl/backuprand/backup_rand_test.go
@@ -32,7 +32,8 @@ import (
// TestBackupRestoreRandomDataRoundtrips conducts backup/restore roundtrips on
// randomly generated tables and verifies their data and schema are preserved.
// It tests that full database backup as well as all subsets of per-table backup
// roundtrip properly.
// roundtrip properly. 50% of the time, the test runs the restore with the
// schema_only parameter, which does not restore any rows from user tables.
func TestBackupRestoreRandomDataRoundtrips(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)
@@ -66,6 +67,11 @@ func TestBackupRestoreRandomDataRoundtrips(t *testing.T) {
}
numInserts := 20

runSchemaOnlyExtension := ""
if rng.Intn(10)%2 == 0 {
runSchemaOnlyExtension = ", schema_only"
}

tables := sqlDB.Query(t, `SELECT name FROM crdb_internal.tables WHERE
database_name = 'rand' AND schema_name = 'public'`)
var tableNames []string
@@ -87,7 +93,9 @@ database_name = 'rand' AND schema_name = 'public'`)
expectedData := make(map[string][][]string)
for _, tableName := range tableNames {
expectedCreateTableStmt[tableName] = sqlDB.QueryStr(t, fmt.Sprintf(`SELECT create_statement FROM [SHOW CREATE TABLE %s]`, tableName))[0][0]
expectedData[tableName] = sqlDB.QueryStr(t, fmt.Sprintf(`SELECT * FROM %s`, tableName))
if runSchemaOnlyExtension == "" {
expectedData[tableName] = sqlDB.QueryStr(t, fmt.Sprintf(`SELECT * FROM %s`, tableName))
}
}

// Now that we've created our random tables, backup and restore the whole DB
@@ -97,12 +105,12 @@ database_name = 'rand' AND schema_name = 'public'`)
tablesBackup := localFoo + "alltables"
dbBackups := []string{dbBackup, tablesBackup}
if err := backuputils.VerifyBackupRestoreStatementResult(
t, sqlDB, "BACKUP DATABASE rand TO $1", dbBackup,
t, sqlDB, "BACKUP DATABASE rand INTO $1", dbBackup,
); err != nil {
t.Fatal(err)
}
if err := backuputils.VerifyBackupRestoreStatementResult(
t, sqlDB, "BACKUP TABLE rand.* TO $1", tablesBackup,
t, sqlDB, "BACKUP TABLE rand.* INTO $1", tablesBackup,
); err != nil {
t.Fatal(err)
}
@@ -118,7 +126,12 @@ database_name = 'rand' AND schema_name = 'public'`)
fmt.Sprintf(`SELECT create_statement FROM [SHOW CREATE TABLE %s]`, restoreTable))[0][0]
assert.Equal(t, expectedCreateTableStmt[tableName], createStmt,
"SHOW CREATE %s not equal after RESTORE", tableName)
sqlDB.CheckQueryResults(t, fmt.Sprintf(`SELECT * FROM %s`, tableName), expectedData[tableName])
if runSchemaOnlyExtension == "" {
sqlDB.CheckQueryResults(t, fmt.Sprintf(`SELECT * FROM %s`, restoreTable), expectedData[tableName])
} else {
sqlDB.CheckQueryResults(t, fmt.Sprintf(`SELECT count(*) FROM %s`, restoreTable),
[][]string{{"0"}})
}
}
}

@@ -128,17 +141,17 @@ database_name = 'rand' AND schema_name = 'public'`)
for _, backup := range dbBackups {
sqlDB.Exec(t, "DROP DATABASE IF EXISTS restoredb")
sqlDB.Exec(t, "CREATE DATABASE restoredb")
tableQuery := fmt.Sprintf("RESTORE rand.* FROM LATEST IN $1 WITH OPTIONS (into_db='restoredb'%s)", runSchemaOnlyExtension)
if err := backuputils.VerifyBackupRestoreStatementResult(
t, sqlDB, "RESTORE rand.* FROM $1 WITH OPTIONS (into_db='restoredb')", backup,
t, sqlDB, tableQuery, backup,
); err != nil {
t.Fatal(err)
}
verifyTables(t, tableNames)
sqlDB.Exec(t, "DROP DATABASE IF EXISTS restoredb")

if err := backuputils.VerifyBackupRestoreStatementResult(
t, sqlDB, "RESTORE DATABASE rand FROM $1 WITH OPTIONS (new_db_name='restoredb')", backup,
); err != nil {
dbQuery := fmt.Sprintf("RESTORE DATABASE rand FROM LATEST IN $1 WITH OPTIONS (new_db_name='restoredb'%s)", runSchemaOnlyExtension)
if err := backuputils.VerifyBackupRestoreStatementResult(t, sqlDB, dbQuery, backup); err != nil {
t.Fatal(err)
}
verifyTables(t, tableNames)
@@ -155,8 +168,9 @@ database_name = 'rand' AND schema_name = 'public'`)
}
tables := strings.Join(combo, ", ")
t.Logf("Testing subset backup/restore %s", tables)
sqlDB.Exec(t, fmt.Sprintf(`BACKUP TABLE %s TO $1`, tables), backupTarget)
_, err := tc.Conns[0].Exec(fmt.Sprintf("RESTORE TABLE %s FROM $1 WITH OPTIONS (into_db='restoredb')", tables),
sqlDB.Exec(t, fmt.Sprintf(`BACKUP TABLE %s INTO $1`, tables), backupTarget)
_, err := tc.Conns[0].Exec(
fmt.Sprintf("RESTORE TABLE %s FROM LATEST IN $1 WITH OPTIONS (into_db='restoredb' %s)", tables, runSchemaOnlyExtension),
backupTarget)
if err != nil {
if strings.Contains(err.Error(), "skip_missing_foreign_keys") {
5 changes: 0 additions & 5 deletions pkg/ccl/backupccl/restore_data_processor.go
@@ -17,7 +17,6 @@ import (
"github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backuppb"
"github.com/cockroachdb/cockroach/pkg/ccl/storageccl"
"github.com/cockroachdb/cockroach/pkg/cloud"
"github.com/cockroachdb/cockroach/pkg/jobs/jobspb"
"github.com/cockroachdb/cockroach/pkg/keys"
"github.com/cockroachdb/cockroach/pkg/kv/bulk"
"github.com/cockroachdb/cockroach/pkg/roachpb"
@@ -138,10 +137,6 @@ func newRestoreDataProcessor(
) (execinfra.Processor, error) {
sv := &flowCtx.Cfg.Settings.SV

if spec.Validation != jobspb.RestoreValidation_DefaultRestore {
return nil, errors.New("Restore Data Processor does not support validation yet")
}

rd := &restoreDataProcessor{
flowCtx: flowCtx,
input: input,