*: introduce pkg/migrations #56107

irfansharif · 2020-10-29T06:41:49Z

Implements the long-running migrations RFC (#48843). This PR builds on top a
few core components we've introduced to crdb over the past few months. They
are (in order):

clusterversion: significantly rework cluster version handling #45455 (re-works cluster version settings to be a standalone component)
kvserver,cli,roachtest,sql: introduce a fully decommissioned bit #50329 (introduces a fully decommissioned bit)
server: introduce join rpc for node id allocation #52526 (introduces the join RPC for node ID allocation, and to hand out
active cluster versions to joining nodes)
server: always create a liveness record before starting up #53842 (always creating a liveness record for nodes, before startup)
server,rpc: block decommissioned node from interacting with the cluster #55286 (blocks decommissioned nodes from interacting with the cluster)

The components above (save for #55286) were included as part of the 20.2
release. That paved the way for us to introduce this machinery for 21.1 (it
also lets us define migrations starting this cycle). For the detailed story
around how it all fits together, read the RFC.

Big picture: going forward SET CLUSTER SETTING version = <whatever> will step
through each intermediate (unstable) version between what's currently active,
and what the intended target is, and execute associated migrations for each
one (if any). For each "step", the migration manager first checks to see that
every node in the cluster is running an appropriate binary version. It then
activates the relevant version gate on every node (through a new RPC, instead
of gossip like we used to previously). Finally it executes the migration
itself, providing it with the guarantee that all .IsActive(VersionTag) will
return true going forward. After successfully migrating through each
intermediate version, the statement returns.

Package migration captures the facilities needed to define and execute
migrations for a crdb cluster. These migrations can be long running, are free
to send out arbitrary requests cluster wide, change internal DB state, and much
more. They're typically reserved for crdb internal operations and state. Each
migration is idempotent in nature, is associated with a specific cluster
version, and executed when the cluster version is made activate on every node
in the cluster.

Examples of migrations that apply would be migrations to move all raft
state from one storage engine to another, or purging all usage of the
replicated truncated state in KV.

To get a sense of the primitives made available to migrations, a rough sketch
of the APIs is presented below.

  type Migration func(context.Context, *Helper) error
  func ExampleMigration (ctx context.Context, h *Helper) error {
    // NB: The harness has already made sure that the version associated with
    // this particular migration this is already active on all nodes in the
    // cluster.

    // Migrations have access to a *kv.DB.
    txn := h.DB().NewTxn(ctx, ...)

    // Migrations can run for arbitrarily long.
    time.Sleep(time.Second)

    // Migrations can run SQL.
    _, err := h.Conn().Queryf(`SELECT 1`)

    // Migrations can paginate through each range descriptor.
    _ = h.IterRangeDescriptors(pageSize, func(descriptors ...roachpb.RangeDescriptor) error {
      for _, desc := range descriptors {
        // Migrations can selective execute below-raft migrations.
        h.DB().Migrate(ctx, desc.StartKey, desc.EndKey, h.Version())
      }

      // Migrations can persist progress to a system table, for introspection.
      h.Logf("paginated through %d descriptors", len(descriptors))
    })

    // Migrations can define/execute server level operations across every node
    // (like forcing all replicas through GC, etc.)
    h.EveryNode(ctx, op)
  }

See individual commits. This PR introduces individual components without it
being wired up to anything first, and then slowly incorporates them into
actual running code. The commits of note (in order) are pulled out below.

  sql: add scaffolding for version upgrade hook
  
  This callback will be called after validating a `SET CLUSTER SETTING
  version` but before executing it. It will be used in future commits to
  execute long-running migrations.

  server,kvserver: introduce the `EveryNode` rpc
  
  ...and populate it with the `AckClusterVersion` op.
  
  EveryNode is the RPC that will powers the long running migrations
  infrastructure. It'll let callers define and execute arbitrary commands
  across every node in the cluster. To motivate what this would look like,
  we introduce alongside it one such command, the `AckClusterVersion`
  operation. It isn't currently hooked up to anything, but this will
  eventually be the mechanism through which we'll propagate cluster
  version bumps across the crdb cluster, replacing our gossip based
  distribution mechanism in-place today. This will let the orchestrator
  bump version gates in a controlled fashion across the cluster.
  
  To achieve this, the stubbed out implementation of AckClusterVersion
  makes use of the same `StoreClusterVersionKey` otherwise used in
  callbacks attached to the gossip handler for cluster version bumps.

  kvserver: introduce GetLivenessesFromKV
  
  Now that we always create a liveness record on start up (#53805), we can
  simply fetch all liveness records from KV when wanting an up-to-date
  view of all nodes in the cluster. We add a helper to do as much,
  which we'll rely on in future commits. 

  It's a bit unfortunate that we're further adding on to the NodeLiveness API
  without changing the underlying look-aside caching structure, but the
  methods fetching records from KV is the world we're hoping to start moving
  towards over time.

  migration: introduce pkg/migrations
  
  Package migration captures the facilities needed to define and execute
  migrations for a crdb cluster. These migrations can be arbitrarily long
  running, are free to send out arbitrary requests cluster wide, change
  internal DB state, and much more. They're typically reserved for crdb
  internal operations and state. Each migration is idempotent in nature,
  is associated with a specific cluster version, and executed when the
  cluster version is made activate on every node in the cluster.
  
  Examples of migrations that apply would be migrations to move all raft
  state from one storage engine to another, or purging all usage of the
  replicated truncated state in KV. A "sister" package of interest is
  pkg/sqlmigrations.
  
  This commit only introduces the basic scaffolding and wiring from
  existing functionality. We'll flesh in the missing bits in future
  commits.

  server,migration: generate EveryNode req/resp helpers
  
  We expect to add multiple req/resp types as individual operations for
  the EveryNode RPC. Each of these operations will correspond to a
  "primitive" of sorts for the (long running) migrations infrastructure.
  It's a bit cumbersome to wield this nested union type, so we
  autogenerate helper code to do it for us. We take precedence from
  api.proto and all the very many batch request/responses.

  server: expose GetLivenessesFromKV in node liveness interfaces
  
  To flesh out the migrations infrastructure (in future commits), we'll
  need a handle on all the nodes present in the system. Now that we always
  create a liveness record on start up (#53805), we can simply fetch all
  liveness records from KV. We add a helper to do so.
  
  This does mean that we're introducing a direct dependency on
  NodeLiveness in the sql layer, and there's improvements to be made here
  around interfaces delineating between "system tenant only" sql code and
  everything else. Only system tenants have the privilege to set cluster
  settings (or at least the version setting specifically), which is what
  this API will look to power.

  server: introduce ValidateTargetClusterVersionRequest
  
  We'll use this primitive in a future commit to introduce additional
  safeguards (not present today) around cluster version upgrades.
  Specifically, we'll use this EveryNode operation to validate that every
  node in the cluster is running a binary that's able to support the
  specified cluster version.

  settings,clusterversion: disconnect cluster version from gossip
  
  ...in favor of direct RPCs to all nodes in the cluster. This commit in
  particular deserves a bit of scrutiny. It uses the building blocks we've
  added thus far to replace the use of gossip to disseminate the cluster
  version. It does so by sending out individual RPCs to each node in the
  cluster, informing them of a version bump, all the while retaining the
  same guarantees provided by our (now previously) gossip-backed mechanism.
  
  This diff has the following "pieces":
  - It disconnecting version setting updates through gossip (by
    disconnecting the setting type within the updater process)
  - It using the EveryNode RPC to send out RPCs to each node in the
    cluster, containing the payload that each node would otherwise receive
    through gossip.
  - It expands the clusterversion.Handle interface to allow setting the
    active version directly through it.
  - It persisting any cluster versions received from other nodes first,
    within keys.StoreClusterVersionKey, before bumping the version gate.
    This was previously achieved by attaching callbacks on the version
    handle (look towards all the diffs around SetBeforeChange). This is an
    invariant we also maintained earlier.
  - It using the active version provided by the join RPC to set the
    version setting directly (after having persisted it first). This too
    was previously achieved through gossip + the callback.
  
  While here, we add a few comments and chip away at the callback hooks
  that are no longer needed.

  sql: introduce system.migrations
  
  We'll use it in a future commit to store migration state, for
  introspection.
  
  > SHOW CREATE system.migrations;
           table_name        |                     create_statement
  ---------------------------+------------------------------------------------------------
    system.public.migrations | CREATE TABLE public.migrations (
                             |     id INT8 NOT NULL DEFAULT unique_rowid(),
                             |     metadata STRING NOT NULL,
                             |     started TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP,
                             |     progress BYTES NULL,
                             |     CONSTRAINT "primary" PRIMARY KEY (id ASC),
                             |     FAMILY "primary" (id, metadata, started),
                             |     FAMILY progress (progress)
                             | )

Still need to add tests. But it "works".

We'll want to eventually distinguish between sqlmigrations (only run at start up) and general purpose (and possibly long-running) migrations. We'll introduce the latter in future commits, within a new pkg/migration. Release note: None

This callback will be called after validating a `SET CLUSTER SETTING version` but before executing it. It will be used in future commits to execute long-running migrations. Release note: None

cockroach-teamcity · 2020-10-29T06:42:13Z

This change is

This request will be fleshed out and used in future commits that introduce the long running migration orchestrator process. Release note: None

...and populate it with the `AckClusterVersion` op. EveryNode is the RPC that will powers the long running migrations infrastructure. It'll let callers define and execute arbitrary commands across every node in the cluster. To motivate what this would look like, we introduce alongside it one such command, the `AckClusterVersion` operation. It isn't currently hooked up to anything, but this will eventually be the mechanism through which we'll propagate cluster version bumps across the crdb cluster, replacing our gossip based distribution mechanism in-place today. This will let the orchestrator bump version gates in a controlled fashion across the cluster. To achieve this, the stubbed out implementation of AckClusterVersion makes use of the same `StoreClusterVersionKey` otherwise used in callbacks attached to the gossip handler for cluster version bumps. Release note: None

Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV when wanting an up-to-date view of all nodes in the cluster. We add a helper to do as much, which we'll rely on in future commits. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the underlying look-aside caching structure, but the methods fetching records from KV is the world we're hoping to start moving towards over time. Release note: None

Package migration captures the facilities needed to define and execute migrations for a crdb cluster. These migrations can be arbitrarily long running, are free to send out arbitrary requests cluster wide, change internal DB state, and much more. They're typically reserved for crdb internal operations and state. Each migration is idempotent in nature, is associated with a specific cluster version, and executed when the cluster version is made activate on every node in the cluster. Examples of migrations that apply would be migrations to move all raft state from one storage engine to another, or purging all usage of the replicated truncated state in KV. A "sister" package of interest is pkg/sqlmigrations. --- This commit only introduces the basic scaffolding and wiring from existing functionality. We'll flesh in the missing bits in future commits. Release note: None

The migration manager will make use of the EveryNode RPC once it's properly wired up (in future commits). It'll need a node dialer to do so. Release note: None

We expect to add multiple req/resp types as individual operations for the EveryNode RPC. Each of these operations will correspond to a "primitive" of sorts for the (long running) migrations infrastructure. It's a bit cumbersome to wield this nested union type, so we autogenerate helper code to do it for us. We take precedence from api.proto and all the very many batch request/responses. Release note: None

To flesh out the migrations infrastructure (in future commits), we'll need a handle on all the nodes present in the system. Now that we always create a liveness record on start up (cockroachdb#53805), we can simply fetch all liveness records from KV. We add a helper to do so. It's a bit unfortunate that we're further adding on to the NodeLiveness API without changing the caching structure, but the methods fetching records from KV is the world we're hoping to move towards going forward. This does mean that we're introducing a direct dependency on NodeLiveness in the sql layer, and there's improvements to be made here around interfaces delineating between "system tenant only" sql code and everything else. Only system tenants have the privilege to set cluster settings (or at least the version setting specifically), which is what this API will look to power. Release note: None

Plumb in the view into node liveness that was fleshed out earlier. We use it to power the RequiredNodes primitive, that provides a handle on all nodes in the system. Copying from elsewhere: // RequiredNodes returns the node IDs for all nodes that are // currently part of the cluster (i.e. they haven't been // decommissioned away). Migrations have the pre-requisite that all // required nodes are up and running so that we're able to execute // all relevant node-level operations on them. If any of the nodes // are found to be unavailable, an error is returned. Release note: None

This commit is not going to be merged. It was added to test things quickly (in lieu of actual tests) by letting me bump versions willy nilly (ignoring max allowable version). Release note: None

We'll use this primitive in a future commit to introduce additional safeguards (not present today) around cluster version upgrades. Specifically, we'll use this EveryNode operation to validate that every node in the cluster is running a binary that's able to support the specified cluster version.

...in favor of direct RPCs to all nodes in the cluster. This commit in particular deserves a bit of scrutiny. It uses the building blocks we've added thus far to replace the use of gossip to disseminate the cluster version. It does so by sending out individual RPCs to each node in the cluster, informing them of a version bump, all the while retaining the same guarantees provided by our (now previously) gossip-backed mechanism. This diff has the following "pieces": - It disconnecting version setting updates through gossip (by disconnecting the setting type within the updater process) - It using the EveryNode RPC to send out RPCs to each node in the cluster, containing the payload that each node would otherwise receive through gossip. - It expands the clusterversion.Handle interface to allow setting the active version directly through it. - It persisting any cluster versions received from other nodes first, within keys.StoreClusterVersionKey, before bumping the version gate. This was previously achieved by attaching callbacks on the version handle (look towards all the diffs around SetBeforeChange). This is an invariant we also maintained earlier. - It using the active version provided by the join RPC to set the version setting directly (after having persisted it first). This too was previously achieved through gossip + the callback. While here, we add a few comments and chip away at the callback hooks that are no longer needed. Release note: None

Just filling in a few missing dependencies we expect migration code to rely on going forward. Release note: None

It's not currently wired up to anything (there are no real migrations yet), but it's one of the primitives we expect future migrations to rely on (in future commits). Release note: None

We'll use it in a future commit to store migration state, for introspection. > SHOW CREATE system.migrations; table_name | create_statement ---------------------------+------------------------------------------------------------ system.public.migrations | CREATE TABLE public.migrations ( | id INT8 NOT NULL DEFAULT unique_rowid(), | metadata STRING NOT NULL, | started TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP, | progress BYTES NULL, | CONSTRAINT "primary" PRIMARY KEY (id ASC), | FAMILY "primary" (id, metadata, started), | FAMILY progress (progress) | ) Release note: None

This reverts an earlier commit removing version upgrade safeguards. This commit will also not be merged. Release note: None

pkg/server/server_sql.go

pkg/kv/batch.go

pkg/server/serverpb/admin.proto

56130: opt: updated normalization rules for folding is expressions with null predicate r=mgartner a=jayshrivastava opt: updated normalization rules for folding is expressions with null predicate Previously, for statements such as `SELECT (foo,bar) IS DISTINCT FROM NULL FROM a_table`, the expression `(foo,bar) IS DISTINCT FROM NULL` would not be normalized to `true`. Similarly, if `IS NOT DISTINCT FROM NULL` were used, then the expression would not be normalized to `false`. The previous statement would only normalize if the tuple/array in the statement contained only constants. Given the updates in this commit, normalization will be applied when any arrays or tuples are provided in this situation. Release note: None 56217: sql: clean up uses of Statement r=RaduBerinde a=RaduBerinde This change cleans up the use of `sql.Statement` and reduces some allocations. Specifically: - we create a `Statement` lower in the stack (in `execStmtInOpenState`), and pass only what we need in the higher layers; - we change various functions to take a `tree.Statement` rather than an entire `Statement` when possible; - we move down the `withStatement` context allocation, so that it is avoided in the implicit transaction state transition; - we store a copy rather than a pointer to the Statement in the planner; - we avoid directly using `stmt` fields from `func()` declarations that escape; - we populate `Statement.AnonymizedStr` upfront. The anonymized string is always needed (to update statement stats). ``` name old time/op new time/op delta EndToEnd/kv-read/EndToEnd 153µs ± 1% 154µs ± 2% ~ (p=0.486 n=4+4) EndToEnd/kv-read-no-prep/EndToEnd 216µs ± 1% 217µs ± 1% ~ (p=0.886 n=4+4) EndToEnd/kv-read-const/EndToEnd 111µs ± 1% 113µs ± 1% +1.01% (p=0.029 n=4+4) name old alloc/op new alloc/op delta EndToEnd/kv-read/EndToEnd 25.8kB ± 1% 25.5kB ± 1% ~ (p=0.114 n=4+4) EndToEnd/kv-read-no-prep/EndToEnd 32.2kB ± 1% 31.9kB ± 1% ~ (p=0.686 n=4+4) EndToEnd/kv-read-const/EndToEnd 21.0kB ± 1% 20.7kB ± 2% ~ (p=0.200 n=4+4) name old allocs/op new allocs/op delta EndToEnd/kv-read/EndToEnd 252 ± 1% 250 ± 0% -0.99% (p=0.029 n=4+4) EndToEnd/kv-read-no-prep/EndToEnd 332 ± 0% 330 ± 1% ~ (p=0.229 n=4+4) EndToEnd/kv-read-const/EndToEnd 214 ± 0% 212 ± 0% -0.93% (p=0.029 n=4+4) ``` Release note: None 56243: liveness: introduce GetLivenessesFromKV r=irfansharif a=irfansharif Now that we always create a liveness record on start up (#53805), we can simply fetch all records from KV when wanting an up-to-date view of all nodes that have ever been a part of the cluster. We add a helper to do as much, which we'll rely on when introducing long running migrations (#56107). It's a bit unfortunate that we're further adding on to the liveness API without changing the underlying look-aside cache structure, but the up-to-date records from KV directly is the world we're hoping to start moving towards over time. The TODO added in [1] outlines what the future holds. We'll also expose the GetLivenessesFromKV API we introduced earlier to pkg/sql. We'll rely on it when needing to plumb in the liveness instance into the migration manager process (prototyped in #56107) It should be noted that this will be a relatively meatier form of a dependency on node liveness from pkg/sql than we have currently. Today the only uses are in DistSQL planning and in jobs[2]. As it relates to our multi-tenancy work, the real use of this API will happen only on the system tenant. System tenants alone have the privilege to set cluster settings (or at least the version setting specifically), which is what the migration manager will be wired into. [1]: d631239 [2]: #48795 Release note: None --- First commit is from #56221, and can be ignored here. Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: irfan sharif <[email protected]>

This callback will be called after validating a `SET CLUSTER SETTING version` but before executing it. It will be used in future PRs to execute arbitrary migrations to allow us to eventually remove code to support legacy behavior. This diff was pulled out of the long-running migrations prototype (cockroachdb#56107). For more details, see the RFC (cockroachdb#48843). Release note: None

The migration manager will need all of the above in order to execute migrations. It'll need: - A `Dialer`, to send out migration RPCs to individual nodes in the cluster. - A handle on `Liveness`, to figure out which nodes are part of the cluster - An `Executor`, for migrations to inspect/set internal SQL state, and to log progress into a dedicated system table - A `kv.DB`, for migrations to inspect/set internal KV state, and to send out Migrate requests to ranges for execute below-Raft migrations For more details, see the RFC (cockroachdb#48843). The fully "fleshed" out version of this manager was originally prototyped in cockroachdb#56107. This PR is simply pulling out the boring bits from there to move things along. Release note: None

The upcoming migration manager (prototyped in cockroachdb#56107) will want to execute a few known RPCs on every node in the cluster. Part of being the "migration infrastructure", we also want authors of individual migrations to be able to define arbitrary node-level operations to execute on each node in the system. To this end we introduce a `Migration` service, and populate it with the two known RPCs the migration manager will want to depend on: - ValidateTargetClusterVersion: used to verify that the target node is running a binary that's able to support the given cluster version. - BumpClusterVersion: used to inform the target node about a (validated) cluster version bump. Both these RPCs are not currently wired up to anything, and BumpClusterVersion will be fleshed out just a tiny bit further in a future PR, but they'll both be used to propagate cluster version bumps across the crdb cluster through direct RPCs, supplanting our existing gossip based distribution mechanism. This will let the migration manager bump version gates in a more controlled fashion. See cockroachdb#56107 for what that will end up looking like, and see the long-running migrations RFC (cockroachdb#48843) for the motivation. Like we mentioned earlier, we expect this service to pick up more RPCs over time to service specific migrations. Release note: None

The migration manager will need all of the above in order to execute migrations. It'll need: - A `Dialer`, to send out migration RPCs to individual nodes in the cluster. - A handle on `Liveness`, to figure out which nodes are part of the cluster - An `Executor`, for migrations to inspect/set internal SQL state, and to log progress into a dedicated system table - A `kv.DB`, for migrations to inspect/set internal KV state, and to send out Migrate requests to ranges for execute below-Raft migrations For more details, see the RFC (cockroachdb#48843). The fully "fleshed" out version of this manager was originally prototyped in cockroachdb#56107. This PR is simply pulling out the boring bits from there to move things along. Release note: None

56368: migration: add scaffolding for the migrations manager r=irfansharif a=irfansharif The commits in this PR were pulled out of our original prototype of the migrations manager (#56107). These are the "boring bits", and simply introduces the scaffolding for the manager without materially hooking it up to anything. It will be fleshed out in future PRs, following the direction set by our prototype above. For more details, see the RFC (#48843). ``` sql: add scaffolding for version upgrade hook This callback will be called after validating a `SET CLUSTER SETTING version` but before executing it. It will be used in future PRs to execute arbitrary migrations to allow us to eventually remove code to support legacy behavior. This diff was pulled out of the long-running migrations prototype (#56107). For more details, see the RFC (#48843). ``` ``` migration: introduce pkg/migrations Package migration captures the facilities needed to define and execute migrations for a crdb cluster. These migrations can be arbitrarily long running, are free to send out arbitrary requests cluster wide, change internal DB state, and much more. They're typically reserved for crdb internal operations and state. Each migration is idempotent in nature, is associated with a specific cluster version, and executed when the cluster version is made activate on every node in the cluster. Examples of migrations that apply would be migrations to move all raft state from one storage engine to another, or purging all usage of the replicated truncated state in KV. A "sister" package of interest is pkg/sqlmigrations. --- This commit only introduces the basic scaffolding and wiring from existing functionality. We'll flesh in the missing bits in future commits. ``` ``` migration: plumb in a dialer, executor, kv.DB, and liveness The migration manager will need all of the above in order to execute migrations. It'll need: - A `Dialer`, to send out migration RPCs to individual nodes in the cluster. - A handle on `Liveness`, to figure out which nodes are part of the cluster - An `Executor`, for migrations to inspect/set internal SQL state, and to log progress into a dedicated system table - A `kv.DB`, for migrations to inspect/set internal KV state, and to send out Migrate requests to ranges for execute below-Raft migrations For more details, see the RFC (#48843). The fully "fleshed" out version of this manager was originally prototyped in #56107. This PR is simply pulling out the boring bits from there to move things along. ``` Co-authored-by: irfan sharif <[email protected]>

The upcoming migration manager (prototyped in cockroachdb#56107) will want to execute a few known RPCs on every node in the cluster. Part of being the "migration infrastructure", we also want authors of individual migrations to be able to define arbitrary node-level operations to execute on each node in the system. To this end we introduce a `Migration` service, and populate it with the two known RPCs the migration manager will want to depend on: - ValidateTargetClusterVersion: used to verify that the target node is running a binary that's able to support the given cluster version. - BumpClusterVersion: used to inform the target node about a (validated) cluster version bump. Both these RPCs are not currently wired up to anything, and BumpClusterVersion will be fleshed out just a tiny bit further in a future PR, but they'll both be used to propagate cluster version bumps across the crdb cluster through direct RPCs, supplanting our existing gossip based distribution mechanism. This will let the migration manager bump version gates in a more controlled fashion. See cockroachdb#56107 for what that will end up looking like, and see the long-running migrations RFC (cockroachdb#48843) for the motivation. Like we mentioned earlier, we expect this service to pick up more RPCs over time to service specific migrations. Release note: None

56476: server: introduce the `Migration` service r=irfansharif a=irfansharif The upcoming migration manager (prototyped in #56107) will want to execute a few known RPCs on every node in the cluster. Part of being the "migration infrastructure", we also want authors of individual migrations to be able to define arbitrary node-level operations to execute on each node in the system. To this end we introduce a `Migration` service, and populate it with the two known RPCs the migration manager will want to depend on: - ValidateTargetClusterVersion: used to verify that the target node is running a binary that's able to support the given cluster version. - BumpClusterVersion: used to inform the target node about a (validated) cluster version bump. Both these RPCs are not currently wired up to anything, and BumpClusterVersion will be fleshed out just a tiny bit further in a future PR, but they'll both be used to propagate cluster version bumps across the crdb cluster through direct RPCs, supplanting our existing gossip based distribution mechanism. This will let the migration manager bump version gates in a more controlled fashion. See #56107 for what that will end up looking like, and see the long-running migrations RFC (#48843) for the motivation. Like we mentioned earlier, we expect this service to pick up more RPCs over time to service specific migrations. Release note: None --- Ignore the first four commits. They're from #56368 and #56474 Co-authored-by: irfan sharif <[email protected]>

56480: settings,migration: disconnect cluster version from gossip r=irfansharif a=irfansharif ...in favor of direct RPCs to all nodes in the cluster. It uses the building blocks we've added thus far to replace the use of gossip to disseminate the cluster version. It does so by sending out individual RPCs to each node in the cluster, informing them of a version bump, all the while retaining the same guarantees provided by our (now previously) gossip-backed mechanism. This is another in the series of PRs to introduce long running migrations (#48843), pulled out of our original prototype in #56107. This diff has the following "pieces": - We disconnect the version setting updates through gossip (by disconnecting the setting type within the updater process) - We use the `Migration` service to send out RPCs to each node in the cluster, containing the payload that each node would otherwise receive through gossip. We do this by first introducing two primitives in pkg/migrations: - `RequiredNodes` retrieves a list of all nodes that are part of the cluster. It's powered by `pkg/../liveness`. - `EveryNode` is a shorthand that allows us to send out node-level migration RPCs to every node in the cluster. We combine these primitives with the RPCs introduced in #56476 (`ValidateTargetClusterVersion`, `BumpClusterVersion`) to actually carry out the version bumps. - We expand the `clusterversion.Handle` interface to allow setting the active version directly through it. We then make use of it in the implementation for `BumpClusterVersion`. - Within `BumpClusterVersion`, we persists the cluster version received from the client node first, within `keys.StoreClusterVersionKey`, before bumping the version gate. This is a required invariant in the system in order for us to not regress our cluster version on restart. It was previously achieved by attaching callbacks on the version handle (`SetBeforeChange`). - We no longer need the callbacks attached to gossip to persist cluster versions to disk. We're doing it as part of the `BumpClusterVersion` RPC. We remove them entirely. - We use the active version provided by the join RPC to set the version setting directly (after having persisted it first). This too was previously achieved through gossip + the callback. Release note: None --- Only the last commit is of interest. All prior commits should be reviewed across #56476, #56474 and #56368. Co-authored-by: irfan sharif <[email protected]>

This PR onboards the first real long-running migration using the infrastructure we've been building up within pkg/migration. It adds in the final missing pieces described in our original RFC (cockroachdb#48843). These components were originally prototyped in cockroachdb#56107. The migration in question (which happens to be a below-Raft one, first attempted in cockroachdb#42822) now lets us do the following: i. Use the RangeAppliedState on all ranges ii. Use the unreplicated TruncatedState on all ranges The missing pieces we introduce along side this migration are: a. The `Migrate` KV request. This forces all ranges overlapping with the request spans to execute the (below-raft) migrations corresponding to the specific version, moving out of any legacy modes they may currently be in. KV waits for this command to durably apply on all the replicas before returning, guaranteeing to the caller that all pre-migration state has been completely purged from the system. b. `IterateRangeDescriptors`. This provides a handle on every range descriptor in the system, which callers can then use to send out arbitrary KV requests to in order to run arbitrary KV-level migrations. These requests will typically just be the `Migrate` request, with added code next to the `Migrate` command implementation to do the specific KV-level things intended for the specified version. c. The `system.migrations` table. We use it to store metadata about ongoing migrations for external visibility/introspection. The schema (listed below) is designed with an eye towards scriptability. We want operators to be able programmatically use this table to control their upgrade processes, if needed. CREATE TABLE public.migrations ( version STRING NOT NULL, status STRING NOT NULL, description STRING NOT NULL, start TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP, completed TIMESTAMP NULL, progress STRING NULL, CONSTRAINT "primary" PRIMARY KEY (version ASC), FAMILY "primary" (version, status, description, start, completed, progress) ) Release note(general change): Cluster version upgrades, as initiated by `SET CLUSTER SETTING version = <major>-<minor>`, now perform internal maintenance duties that will delay how long it takes for the command to complete. The delay is proportional to the amount of data currently stored in the cluster. The cluster will also experience a small amount of additional load during this period while the upgrade is being finalized. Release note(general change): We introduce a new `system.migrations` table for introspection into crdb internal data migrations. These migrations are the "maintenance duties" mentioned above. The table surfaces the currently ongoing migrations, the previously completed migrations, and in the case of failure, the errors from the last failed attempt.

irfansharif · 2020-12-03T09:05:22Z

This prototype was shredded up into all the various PRs that have since landed on master (see the referring commits above). The few comments here were addressed in those other PRs. The last in-flight one is #57445, which moves past what was prototyped here, so closing this one out. Thanks for the reviews!

This PR onboards the first real long-running migration using the infrastructure we've been building up within pkg/migration. It adds in the final missing pieces described in our original RFC (cockroachdb#48843). These components were originally prototyped in cockroachdb#56107. The migration in question (which happens to be a below-Raft one, first attempted in cockroachdb#42822) now lets us do the following: i. Use the RangeAppliedState on all ranges ii. Use the unreplicated TruncatedState on all ranges The missing pieces we introduce along side this migration are: a. The `Migrate` KV request. This forces all ranges overlapping with the request spans to execute the (below-raft) migrations corresponding to the specific version, moving out of any legacy modes they may currently be in. KV waits for this command to durably apply on all the replicas before returning, guaranteeing to the caller that all pre-migration state has been completely purged from the system. b. `IterateRangeDescriptors`. This provides a handle on every range descriptor in the system, which callers can then use to send out arbitrary KV requests to in order to run arbitrary KV-level migrations. These requests will typically just be the `Migrate` request, with added code next to the `Migrate` command implementation to do the specific KV-level things intended for the specified version. c. The `system.migrations` table. We use it to store metadata about ongoing migrations for external visibility/introspection. The schema (listed below) is designed with an eye towards scriptability. We want operators to be able programmatically use this table to control their upgrade processes, if needed. CREATE TABLE public.migrations ( version STRING NOT NULL, status STRING NOT NULL, description STRING NOT NULL, start TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP, completed TIMESTAMP NULL, progress STRING NULL, CONSTRAINT "primary" PRIMARY KEY (version ASC), FAMILY "primary" (version, status, description, start, completed, progress) ) Release note (general change): Cluster version upgrades, as initiated by `SET CLUSTER SETTING version = <major>-<minor>`, now perform internal maintenance duties that will delay how long it takes for the command to complete. The delay is proportional to the amount of data currently stored in the cluster. The cluster will also experience a small amount of additional load during this period while the upgrade is being finalized. Release note (general change): We introduce a new `system.migrations` table for introspection into crdb internal data migrations. These migrations are the "maintenance duties" mentioned above. The table surfaces the currently ongoing migrations, the previously completed migrations, and in the case of failure, the errors from the last failed attempt.

irfansharif added 2 commits October 27, 2020 15:13

server: clarify a logging message

711245a

We'll want to eventually distinguish between sqlmigrations (only run at start up) and general purpose (and possibly long-running) migrations. We'll introduce the latter in future commits, within a new pkg/migration. Release note: None

sql: add scaffolding for version upgrade hook

aad0bad

This callback will be called after validating a `SET CLUSTER SETTING version` but before executing it. It will be used in future commits to execute long-running migrations. Release note: None

irfansharif requested review from a team as code owners October 29, 2020 06:41

irfansharif removed request for a team October 29, 2020 06:42

irfansharif added 15 commits October 29, 2020 08:32

kvserver,roachpb: introduce Migrate request type

e27583a

This request will be fleshed out and used in future commits that introduce the long running migration orchestrator process. Release note: None

migration: plumb in node dialer

9ccb3f9

The migration manager will make use of the EveryNode RPC once it's properly wired up (in future commits). It'll need a node dialer to do so. Release note: None

[dnm] clusterversion,heartbeat: hack to bump versions willy nilly

acc6553

This commit is not going to be merged. It was added to test things quickly (in lieu of actual tests) by letting me bump versions willy nilly (ignoring max allowable version). Release note: None

migration: plumb in an internal executor, kv.DB

a43a6bf

Just filling in a few missing dependencies we expect migration code to rely on going forward. Release note: None

migration: implement IterateRangeDecriptors

73bc6f9

It's not currently wired up to anything (there are no real migrations yet), but it's one of the primitives we expect future migrations to rely on (in future commits). Release note: None

[dnm] clusterversion,heartbeat: remove hack to bump versions willy nilly

4244094

This reverts an earlier commit removing version upgrade safeguards. This commit will also not be merged. Release note: None

irfansharif force-pushed the 201022.pkg-migrations branch from 329cb67 to 4244094 Compare October 29, 2020 12:32

irfansharif mentioned this pull request Oct 29, 2020

[wip,dnm,dnr] migration: prototype long-running migrations orchestrator #54903

Closed

irfansharif requested review from tbg, TheSamHuang and knz October 29, 2020 14:16

tbg reviewed Oct 30, 2020

View reviewed changes

pkg/server/server_sql.go Show resolved Hide resolved

tbg reviewed Oct 30, 2020

View reviewed changes

pkg/kv/batch.go Show resolved Hide resolved

tbg reviewed Oct 30, 2020

View reviewed changes

pkg/server/serverpb/admin.proto Show resolved Hide resolved

tbg self-requested a review November 5, 2020 14:07

irfansharif mentioned this pull request Nov 6, 2020

migration: add scaffolding for the migrations manager #56368

Merged

irfansharif mentioned this pull request Nov 10, 2020

server: introduce the Migration service #56476

Merged

irfansharif mentioned this pull request Nov 10, 2020

settings,migration: disconnect cluster version from gossip #56480

Merged

tbg removed their request for review November 11, 2020 08:45

irfansharif mentioned this pull request Dec 3, 2020

[prototype] migration: onboard the first long-running migration #57445

Closed

irfansharif closed this Dec 3, 2020

irfansharif deleted the 201022.pkg-migrations branch December 3, 2020 09:05

irfansharif mentioned this pull request Dec 30, 2020

*: long-running migrations #39182

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

*: introduce pkg/migrations #56107

*: introduce pkg/migrations #56107

irfansharif commented Oct 29, 2020 •

edited

Loading

cockroach-teamcity commented Oct 29, 2020

irfansharif commented Dec 3, 2020

*: introduce pkg/migrations #56107

*: introduce pkg/migrations #56107

Conversation

irfansharif commented Oct 29, 2020 • edited Loading

cockroach-teamcity commented Oct 29, 2020

irfansharif commented Dec 3, 2020

irfansharif commented Oct 29, 2020 •

edited

Loading