[wip] migration: onboard the first long-running migration
This PR onboards the first real long-running migration using the
infrastructure we've been building up within pkg/migration. It adds in
the final missing pieces described in our original RFC (#48843). These
components were originally prototyped in #56107.

The migration in question (which happens to be a below-Raft one, first
attempted in #42822) now lets us do the following:

  i.  Use the RangeAppliedState on all ranges
  ii. Use the unreplicated TruncatedState on all ranges

The missing pieces we introduce alongside this migration are:

  a. The `Migrate` KV request. This forces all ranges overlapping with
  the request spans to execute the (below-raft) migrations corresponding
  to the specific version, moving out of any legacy modes they may
  currently be in. KV waits for this command to durably apply on all the
  replicas before returning, guaranteeing to the caller that all
  pre-migration state has been completely purged from the system.

  b. `IterateRangeDescriptors`. This provides a handle on every range
  descriptor in the system, which callers can then use to send out
  arbitrary KV requests in order to run arbitrary KV-level migrations.
  These requests will typically just be the `Migrate` request, with
  added code next to the `Migrate` command implementation to do the
  specific KV-level things intended for the specified version.

  c. The `system.migrations` table. We use it to store metadata about
  ongoing migrations for external visibility/introspection. The schema
  (listed below) is designed with an eye towards scriptability. We
  want operators to be able to programmatically use this table to
  control their upgrade processes, if needed.

      CREATE TABLE public.migrations (
          version STRING NOT NULL,
          status STRING NOT NULL,
          description STRING NOT NULL,
          start TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP,
          completed TIMESTAMP NULL,
          progress STRING NULL,
          CONSTRAINT "primary" PRIMARY KEY (version ASC),
          FAMILY "primary" (version, status, description, start, completed, progress)
      )

Release note (general change): Cluster version upgrades, as initiated by
`SET CLUSTER SETTING version = <major>-<minor>`, now perform internal
maintenance duties that will increase how long the command takes to
complete. The delay is proportional to the amount of data currently
stored in the cluster. The cluster will also experience a small amount
of additional load during this period while the upgrade is being
finalized.

Release note (general change): We introduce a new `system.migrations`
table for introspection into crdb internal data migrations. These
migrations are the "maintenance duties" mentioned above. The table
surfaces the currently ongoing migrations, the previously completed
migrations, and in the case of failure, the errors from the last failed
attempt.
irfansharif committed Dec 6, 2020
1 parent 677f6f8 commit c93d65a
Showing 54 changed files with 3,275 additions and 1,367 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -98,6 +98,6 @@
<tr><td><code>trace.debug.enable</code></td><td>boolean</td><td><code>false</code></td><td>if set, traces for recent requests can be seen in the /debug page</td></tr>
<tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set</td></tr>
-<tr><td><code>version</code></td><td>version</td><td><code>20.2-4</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
+<tr><td><code>version</code></td><td>version</td><td><code>20.2-8</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
3 changes: 3 additions & 0 deletions pkg/ccl/backupccl/system_schema.go
@@ -91,6 +91,9 @@ var systemTableBackupConfiguration = map[string]systemBackupConfiguration{
systemschema.DeprecatedNamespaceTable.Name: {
includeInClusterBackup: optOutOfClusterBackup,
},
systemschema.MigrationsTable.Name: {
includeInClusterBackup: optOutOfClusterBackup,
},
systemschema.ProtectedTimestampsMetaTable.Name: {
includeInClusterBackup: optOutOfClusterBackup,
},
1 change: 0 additions & 1 deletion pkg/ccl/logictestccl/testdata/logic_test/partitioning_enum
@@ -55,4 +55,3 @@ ALTER TABLE partitioned_table_3 PARTITION BY RANGE (place)

statement ok
SELECT * FROM crdb_internal.tables

2 changes: 1 addition & 1 deletion pkg/cli/testdata/doctor/testcluster
@@ -1,7 +1,7 @@
doctor cluster
----
debug doctor cluster
-Examining 34 descriptors and 35 namespace entries...
+Examining 35 descriptors and 36 namespace entries...
Table 53: ParentID 50, ParentSchemaID 29, Name 'foo': not being dropped but no namespace entry found
Examining 1 running jobs...
ERROR: validation failed
9 changes: 6 additions & 3 deletions pkg/cli/testdata/zip/partial1
@@ -58,7 +58,7 @@ requesting goroutine files for node 1... writing: debug/nodes/1/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/1/ranges/1.json
writing: debug/nodes/1/ranges/2.json
writing: debug/nodes/1/ranges/3.json
@@ -94,6 +94,7 @@ writing: debug/nodes/1/ranges/32.json
writing: debug/nodes/1/ranges/33.json
writing: debug/nodes/1/ranges/34.json
writing: debug/nodes/1/ranges/35.json
writing: debug/nodes/1/ranges/36.json
writing: debug/nodes/2/status.json
using SQL connection URL for node 2: postgresql://...
retrieving SQL data for crdb_internal.feature_usage... writing: debug/nodes/2/crdb_internal.feature_usage.txt
@@ -190,7 +191,7 @@ requesting goroutine files for node 3... writing: debug/nodes/3/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/3/ranges/1.json
writing: debug/nodes/3/ranges/2.json
writing: debug/nodes/3/ranges/3.json
@@ -226,13 +227,14 @@ writing: debug/nodes/3/ranges/32.json
writing: debug/nodes/3/ranges/33.json
writing: debug/nodes/3/ranges/34.json
writing: debug/nodes/3/ranges/35.json
writing: debug/nodes/3/ranges/36.json
requesting list of SQL databases... 3 found
requesting database details for defaultdb... writing: debug/schema/defaultdb@details.json
0 tables found
requesting database details for postgres... writing: debug/schema/postgres@details.json
0 tables found
requesting database details for system... writing: debug/schema/system@details.json
-29 tables found
+30 tables found
requesting table details for system.public.namespace... writing: debug/schema/system/public_namespace.json
requesting table details for system.public.descriptor... writing: debug/schema/system/public_descriptor.json
requesting table details for system.public.users... writing: debug/schema/system/public_users.json
@@ -262,5 +264,6 @@ requesting table details for system.public.statement_diagnostics_requests... wri
requesting table details for system.public.statement_diagnostics... writing: debug/schema/system/public_statement_diagnostics.json
requesting table details for system.public.scheduled_jobs... writing: debug/schema/system/public_scheduled_jobs.json
requesting table details for system.public.sqlliveness... writing: debug/schema/system/public_sqlliveness.json
requesting table details for system.public.migrations... writing: debug/schema/system/public_migrations.json
writing: debug/pprof-summary.sh
writing: debug/hot-ranges.sh
9 changes: 6 additions & 3 deletions pkg/cli/testdata/zip/partial1_excluded
@@ -58,7 +58,7 @@ requesting goroutine files for node 1... writing: debug/nodes/1/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/1/ranges/1.json
writing: debug/nodes/1/ranges/2.json
writing: debug/nodes/1/ranges/3.json
@@ -94,6 +94,7 @@ writing: debug/nodes/1/ranges/32.json
writing: debug/nodes/1/ranges/33.json
writing: debug/nodes/1/ranges/34.json
writing: debug/nodes/1/ranges/35.json
writing: debug/nodes/1/ranges/36.json
writing: debug/nodes/2.skipped
writing: debug/nodes/3/status.json
using SQL connection URL for node 3: postgresql://...
@@ -124,7 +125,7 @@ requesting goroutine files for node 3... writing: debug/nodes/3/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/3/ranges/1.json
writing: debug/nodes/3/ranges/2.json
writing: debug/nodes/3/ranges/3.json
@@ -160,13 +161,14 @@ writing: debug/nodes/3/ranges/32.json
writing: debug/nodes/3/ranges/33.json
writing: debug/nodes/3/ranges/34.json
writing: debug/nodes/3/ranges/35.json
writing: debug/nodes/3/ranges/36.json
requesting list of SQL databases... 3 found
requesting database details for defaultdb... writing: debug/schema/defaultdb@details.json
0 tables found
requesting database details for postgres... writing: debug/schema/postgres@details.json
0 tables found
requesting database details for system... writing: debug/schema/system@details.json
-29 tables found
+30 tables found
requesting table details for system.public.namespace... writing: debug/schema/system/public_namespace.json
requesting table details for system.public.descriptor... writing: debug/schema/system/public_descriptor.json
requesting table details for system.public.users... writing: debug/schema/system/public_users.json
@@ -196,5 +198,6 @@ requesting table details for system.public.statement_diagnostics_requests... wri
requesting table details for system.public.statement_diagnostics... writing: debug/schema/system/public_statement_diagnostics.json
requesting table details for system.public.scheduled_jobs... writing: debug/schema/system/public_scheduled_jobs.json
requesting table details for system.public.sqlliveness... writing: debug/schema/system/public_sqlliveness.json
requesting table details for system.public.migrations... writing: debug/schema/system/public_migrations.json
writing: debug/pprof-summary.sh
writing: debug/hot-ranges.sh
9 changes: 6 additions & 3 deletions pkg/cli/testdata/zip/partial2
@@ -58,7 +58,7 @@ requesting goroutine files for node 1... writing: debug/nodes/1/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/1/ranges/1.json
writing: debug/nodes/1/ranges/2.json
writing: debug/nodes/1/ranges/3.json
@@ -94,6 +94,7 @@ writing: debug/nodes/1/ranges/32.json
writing: debug/nodes/1/ranges/33.json
writing: debug/nodes/1/ranges/34.json
writing: debug/nodes/1/ranges/35.json
writing: debug/nodes/1/ranges/36.json
writing: debug/nodes/3/status.json
using SQL connection URL for node 3: postgresql://...
retrieving SQL data for crdb_internal.feature_usage... writing: debug/nodes/3/crdb_internal.feature_usage.txt
@@ -123,7 +124,7 @@ requesting goroutine files for node 3... writing: debug/nodes/3/goroutines.err.t
^- resulted in ...
requesting log file ...
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/3/ranges/1.json
writing: debug/nodes/3/ranges/2.json
writing: debug/nodes/3/ranges/3.json
@@ -159,13 +160,14 @@ writing: debug/nodes/3/ranges/32.json
writing: debug/nodes/3/ranges/33.json
writing: debug/nodes/3/ranges/34.json
writing: debug/nodes/3/ranges/35.json
writing: debug/nodes/3/ranges/36.json
requesting list of SQL databases... 3 found
requesting database details for defaultdb... writing: debug/schema/defaultdb@details.json
0 tables found
requesting database details for postgres... writing: debug/schema/postgres@details.json
0 tables found
requesting database details for system... writing: debug/schema/system@details.json
-29 tables found
+30 tables found
requesting table details for system.public.namespace... writing: debug/schema/system/public_namespace.json
requesting table details for system.public.descriptor... writing: debug/schema/system/public_descriptor.json
requesting table details for system.public.users... writing: debug/schema/system/public_users.json
@@ -195,5 +197,6 @@ requesting table details for system.public.statement_diagnostics_requests... wri
requesting table details for system.public.statement_diagnostics... writing: debug/schema/system/public_statement_diagnostics.json
requesting table details for system.public.scheduled_jobs... writing: debug/schema/system/public_scheduled_jobs.json
requesting table details for system.public.sqlliveness... writing: debug/schema/system/public_sqlliveness.json
requesting table details for system.public.migrations... writing: debug/schema/system/public_migrations.json
writing: debug/pprof-summary.sh
writing: debug/hot-ranges.sh
3 changes: 2 additions & 1 deletion pkg/cli/testdata/zip/specialnames
@@ -21,7 +21,7 @@ requesting table details for defaultdb.public."../system"... writing: debug/sche
requesting database details for postgres... writing: debug/schema/postgres@details.json
0 tables found
requesting database details for system... writing: debug/schema/system-1@details.json
-29 tables found
+30 tables found
requesting table details for system.public.namespace... writing: debug/schema/system-1/public_namespace.json
requesting table details for system.public.descriptor... writing: debug/schema/system-1/public_descriptor.json
requesting table details for system.public.users... writing: debug/schema/system-1/public_users.json
@@ -51,3 +51,4 @@ requesting table details for system.public.statement_diagnostics_requests... wri
requesting table details for system.public.statement_diagnostics... writing: debug/schema/system-1/public_statement_diagnostics.json
requesting table details for system.public.scheduled_jobs... writing: debug/schema/system-1/public_scheduled_jobs.json
requesting table details for system.public.sqlliveness... writing: debug/schema/system-1/public_sqlliveness.json
requesting table details for system.public.migrations... writing: debug/schema/system-1/public_migrations.json
6 changes: 4 additions & 2 deletions pkg/cli/testdata/zip/testzip
@@ -57,7 +57,7 @@ requesting heap profile for node 1... writing: debug/nodes/1/heap.pprof
requesting heap files for node 1... ? found
requesting goroutine files for node 1... 0 found
requesting log file ...
-requesting ranges... 35 found
+requesting ranges... 36 found
writing: debug/nodes/1/ranges/1.json
writing: debug/nodes/1/ranges/2.json
writing: debug/nodes/1/ranges/3.json
@@ -93,13 +93,14 @@ writing: debug/nodes/1/ranges/32.json
writing: debug/nodes/1/ranges/33.json
writing: debug/nodes/1/ranges/34.json
writing: debug/nodes/1/ranges/35.json
writing: debug/nodes/1/ranges/36.json
requesting list of SQL databases... 3 found
requesting database details for defaultdb... writing: debug/schema/defaultdb@details.json
0 tables found
requesting database details for postgres... writing: debug/schema/postgres@details.json
0 tables found
requesting database details for system... writing: debug/schema/system@details.json
-29 tables found
+30 tables found
requesting table details for system.public.namespace... writing: debug/schema/system/public_namespace.json
requesting table details for system.public.descriptor... writing: debug/schema/system/public_descriptor.json
requesting table details for system.public.users... writing: debug/schema/system/public_users.json
@@ -129,5 +130,6 @@ requesting table details for system.public.statement_diagnostics_requests... wri
requesting table details for system.public.statement_diagnostics... writing: debug/schema/system/public_statement_diagnostics.json
requesting table details for system.public.scheduled_jobs... writing: debug/schema/system/public_scheduled_jobs.json
requesting table details for system.public.sqlliveness... writing: debug/schema/system/public_sqlliveness.json
requesting table details for system.public.migrations... writing: debug/schema/system/public_migrations.json
writing: debug/pprof-summary.sh
writing: debug/hot-ranges.sh
18 changes: 18 additions & 0 deletions pkg/clusterversion/cockroach_versions.go
@@ -199,6 +199,16 @@ const (
// EmptyArraysInInvertedIndexes is when empty arrays are added to array
// inverted indexes.
EmptyArraysInInvertedIndexes
// MigrationTable introduces the new system.migrations table.
MigrationTable
// TruncatedAndRangeAppliedStateMigration is part of the migration to stop
// using the legacy truncated state within KV. Once it's active, we'll be
// using the unreplicated truncated state and the RangeAppliedState on all
// ranges. In 21.2 we'll now be able to remove any holdover code handling
// the possibility of replicated truncated state.
//
// TODO(irfansharif): Do the above in 21.2.
TruncatedAndRangeAppliedStateMigration

// Step (1): Add new versions here.
)
@@ -321,6 +331,14 @@ var versionsSingleton = keyedVersions([]keyedVersion{
Key: EmptyArraysInInvertedIndexes,
Version: roachpb.Version{Major: 20, Minor: 2, Internal: 4},
},
{
Key: MigrationTable,
Version: roachpb.Version{Major: 20, Minor: 2, Internal: 6},
},
{
Key: TruncatedAndRangeAppliedStateMigration,
Version: roachpb.Version{Major: 20, Minor: 2, Internal: 8},
},

// Step (2): Add new versions here.
})
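
The two new keys differ only in their `Internal` component, which is what
orders in-development versions within a release. The `20.2-8` default in
docs/generated/settings/settings.html appears to correspond to Major 20,
Minor 2, Internal 8. A minimal sketch of that ordering, assuming a plain
lexicographic comparison over (Major, Minor, Internal). The `version` type
and the `less` and `isActive` helpers are hypothetical stand-ins rather than
the `roachpb.Version` API, and the real type may carry additional fields.

```go
package main

import "fmt"

// version is a hypothetical stand-in with only the fields set in the diff.
type version struct{ Major, Minor, Internal int32 }

// less orders versions lexicographically by (Major, Minor, Internal).
func (v version) less(o version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Internal < o.Internal
}

// isActive reports whether functionality gated on gate may be used once the
// cluster has reached the active version.
func isActive(active, gate version) bool {
	return !active.less(gate)
}

func main() {
	migrationTable := version{Major: 20, Minor: 2, Internal: 6}
	truncatedAndRangeAppliedState := version{Major: 20, Minor: 2, Internal: 8}

	// A cluster that has only reached 20.2-6 has the migrations table gate
	// open, but not yet the truncated state gate.
	active := version{Major: 20, Minor: 2, Internal: 6}
	fmt.Println("MigrationTable active:", isActive(active, migrationTable))
	fmt.Println("TruncatedAndRangeAppliedStateMigration active:",
		isActive(active, truncatedAndRangeAppliedState))
}
```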
6 changes: 4 additions & 2 deletions pkg/clusterversion/key_string.go

Generated file; diff not rendered.

1 change: 1 addition & 0 deletions pkg/keys/constants.go
@@ -390,6 +390,7 @@ const (
ScheduledJobsTableID = 37
TenantsRangesID = 38 // pseudo
SqllivenessID = 39
MigrationsID = 40

// CommentType is type for system.comments
DatabaseCommentType = 0
39 changes: 30 additions & 9 deletions pkg/kv/batch.go
@@ -227,19 +227,11 @@ func (b *Batch) fillResults(ctx context.Context) {
case *roachpb.DeleteRequest:
row := &result.Rows[k]
row.Key = []byte(args.(*roachpb.DeleteRequest).Key)
-
case *roachpb.DeleteRangeRequest:
if result.Err == nil {
result.Keys = reply.(*roachpb.DeleteRangeResponse).Keys
}
-
-default:
-if result.Err == nil {
-result.Err = errors.Errorf("unsupported reply: %T for %T",
-reply, args)
-}
-
-// Nothing to do for all methods below as they do not generate
+// Nothing to do for the methods below as they do not generate
// any rows.
case *roachpb.EndTxnRequest:
case *roachpb.AdminMergeRequest:
@@ -254,6 +246,7 @@ func (b *Batch) fillResults(ctx context.Context) {
case *roachpb.PushTxnRequest:
case *roachpb.QueryTxnRequest:
case *roachpb.QueryIntentRequest:
case *roachpb.MigrateRequest:
case *roachpb.ResolveIntentRequest:
case *roachpb.ResolveIntentRangeRequest:
case *roachpb.MergeRequest:
@@ -264,6 +257,11 @@ func (b *Batch) fillResults(ctx context.Context) {
case *roachpb.ImportRequest:
case *roachpb.AdminScatterRequest:
case *roachpb.AddSSTableRequest:
default:
if result.Err == nil {
result.Err = errors.Errorf("unsupported reply: %T for %T",
reply, args)
}
}
// Fill up the resume span.
if result.Err == nil && reply != nil && reply.Header().ResumeSpan != nil {
@@ -754,3 +752,26 @@ func (b *Batch) addSSTable(
b.appendReqs(req)
b.initResult(1, 0, notRaw, nil)
}

// migrate is only exported on DB.
func (b *Batch) migrate(s, e interface{}, version roachpb.Version) {
begin, err := marshalKey(s)
if err != nil {
b.initResult(0, 0, notRaw, err)
return
}
end, err := marshalKey(e)
if err != nil {
b.initResult(0, 0, notRaw, err)
return
}
req := &roachpb.MigrateRequest{
RequestHeader: roachpb.RequestHeader{
Key: begin,
EndKey: end,
},
Version: version,
}
b.appendReqs(req)
b.initResult(1, 0, notRaw, nil)
}
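
The `fillResults` hunks above move the `default:` case from the middle of the
type switch to its end. In Go the position of `default` within a switch does
not affect which case is selected: it runs only when no other case matches,
and cases do not fall through. The move therefore appears to be a readability
change and not a behavioural one. A small sketch with hypothetical request
types (not the roachpb ones) shows a mid-switch `default` and a trailing
`default` giving the same answers:

```go
package main

import "fmt"

// Hypothetical request types, standing in for the roachpb request structs.
type putRequest struct{}
type migrateRequest struct{}
type unknownRequest struct{}

// classifyDefaultInMiddle mirrors the pre-change layout: default sits
// between the other cases.
func classifyDefaultInMiddle(req interface{}) string {
	switch req.(type) {
	case *putRequest:
		return "rows"
	default:
		return "unsupported"
	case *migrateRequest:
		return "no rows"
	}
}

// classifyDefaultLast mirrors the post-change layout: default comes last.
func classifyDefaultLast(req interface{}) string {
	switch req.(type) {
	case *putRequest:
		return "rows"
	case *migrateRequest:
		return "no rows"
	default:
		return "unsupported"
	}
}

func main() {
	reqs := []interface{}{&putRequest{}, &migrateRequest{}, &unknownRequest{}}
	for _, r := range reqs {
		fmt.Printf("%T: middle=%q last=%q\n",
			r, classifyDefaultInMiddle(r), classifyDefaultLast(r))
	}
}
```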
11 changes: 11 additions & 0 deletions pkg/kv/db.go
@@ -639,6 +639,17 @@ func (db *DB) AddSSTable(
return getOneErr(db.Run(ctx, b), b)
}

// Migrate proactively forces ranges overlapping with the provided keyspace to
// transition out of any legacy modes of operation (as defined by the target
// version).
func (db *DB) Migrate(
ctx context.Context, begin, end interface{}, version roachpb.Version,
) error {
b := &Batch{}
b.migrate(begin, end, version)
return getOneErr(db.Run(ctx, b), b)
}

// sendAndFill is a helper which sends the given batch and fills its results,
// returning the appropriate error which is either from the first failing call,
// or an "internal" error.
1 change: 1 addition & 0 deletions pkg/kv/kvserver/batcheval/BUILD.bazel
@@ -20,6 +20,7 @@ go_library(
"cmd_lease_request.go",
"cmd_lease_transfer.go",
"cmd_merge.go",
"cmd_migrate.go",
"cmd_push_txn.go",
"cmd_put.go",
"cmd_query_intent.go",