rfc: version migration for backwards incompatible functionality #16977

spencerkimball · 2017-07-11T01:50:26Z

Read current version: https://github.com/cockroachdb/cockroach/blob/spencerkimball/version-migrations-rfc/docs/RFCS/20170815_version_migration.md

cockroach-teamcity · 2017-07-11T01:50:31Z

This change is

spencerkimball · 2017-07-11T01:50:33Z

+cc @irfansharif

a-robinson · 2017-07-11T03:39:54Z

Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 34 at r1 (raw file):

When satisfied with the new version's stability, the operator
confirms that the new version should become the minimum required, at
which point the nodes begin using the Raft WAL storage, and the window
for downgrading is closed.

The problem, of course, is that there's no way out in situations where the new feature is the cause of instability in the new version. A rollback path could be implemented for some features but probably not all, so this is worth mentioning as a tradeoff.

docs/RFCS/version_migration.md, line 58 at r1 (raw file):

A node's version will be supplied with every RPC

If the minimum version is going to be a cluster-wide setting rather than configured per-RPC type, could it just be checked when opening/accepting new connections? Internal grpcs reconnects might cause problems across server restarts, but seemingly only if we can't trust nodes to behave. Either way, I want to add some similar logic for checking the cluster ID to fix #15801/#15898

docs/RFCS/version_migration.md, line 70 at r1 (raw file):

  which is newer than the receiver's minimum required version, an
  error is returned that indicates the sender should backoff and retry
  (while the receiver waits for new gossip).

Gossip RPCs can't rely on this logic if gossip is needed in order to unblock the logic.

docs/RFCS/version_migration.md, line 82 at r1 (raw file):

message ClusterVersion {
int minimum_required_version = 1 [(gogoproto.nullable) = false];

Nit, but is this really going to be an int? It might get confusing to try mapping the ints back to release versions

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

SET CLUSTER VERSION

Nit, but I don't see why we wouldn't include the word "minimum" here to clarify what the effect it has, i.e. SET CLUSTER SETTING minimum_version = <foo>;

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

A valid version must be equal to
the advertised version of every active node in the cluster

I don't think that should be required, it just must be greater than the previous minimum version and shouldn't be greater than any active node. Meaning that if the running nodes are at version 5 but the current minimum version is 3, it should be legal to bump the version to 4 (to rule out some downgrades but not all).

docs/RFCS/version_migration.md, line 142 at r1 (raw file):

KV layer.

# Drawbacks

Other drawback, as alluded to above: Using a single setting for all new features makes it tough to support rolling back even if the feature could reasonably support being disabled after being turned on (as in #16809)

docs/RFCS/version_migration.md, line 149 at r1 (raw file):

the option to downgrade.

# Alternatives

I think the two most obvious alternatives were laid out in https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/cluster_upgrade_tool.md#long-term-design:

Have a separate setting/migration that the operator triggers for each new non-backward compatible feature. This allows us to deprecate old code independently and operators to control the introduction of new features somewhat more carefully, at the cost of being more complicated for less sophisticated users.
Automatically do upgrades when they're required for a node to boot up (i.e. once the old code has been removed) if we're sure that all running nodes are at a new enough version to support them. I'd expect this is far too magical to be a good solution for most operators' tastes, though.

docs/RFCS/version_migration.md, line 153 at r1 (raw file):

TODO

# Unresolved questions

This leaves out an explanation of how we deprecate old behavior. New versions of CockroachDB will need to come with a minimum ClusterVersion at which they're able to safely run. In other words, say we remove support for storing the raft log in the same rocksdb instance as user keys. Binaries that require dedicated raft log storage would have to require a minimum ClusterVersion setting before they're able to start. This was a big part of the writeup in the cluster version tool RFC.

Comments from Reviewable

knz

I buy the general impetus behind the RFC but it doesn't seem forward-thinking enough. Let me explain.

To give some context there are two industry domains you already know about where this has been a major issue and where the solution ended up not being based on a version number: CPU compatibility and OS compatibility.

The issue with compatibility checks using a monotonically increasing number is that it shunts any opportunity to do "side-ways" variants of the product with different feature sets. Suppose we want to implement some CCL features that are version-incompatible, the mechanism here wouldn't help. There are other scenarios which I can think of which are not immediately relevant but could become so (such as 3rd-party protocol-compatible extensions)

The solution the industry has commonly adopted is "feature bits". Usually a bitmask, with a centralized, append-only registry that maps bit positions to specific features. With CPUs you will see this in /proc/cpuinfo on linux, with OSs this is what autoconf and gnulib provide.

The rest of your RFC would apply, modulo perhaps using a virtual table to set/expose the currently enabled/required features instead of a single version number.

knz · 2017-07-11T10:39:51Z

docs/RFCS/version_migration.md

+  an error and panics.
+
+The minimum required version will be stored and gossiped as a cluster
+setting. The value of the minimum required version is stored in a


It's not clear to me why we need a protobuf as opposed to (say) a string with semver format.

knz · 2017-07-11T10:40:21Z

docs/RFCS/version_migration.md

+## SQL syntax
+
+The operator can upgrade the cluster's version in order to use new
+features available at that version via the `SET CLUSTER VERSION


Agreed with alex here we should recycle the existing feature set cluster setting instead of inventing yet one new set/show syntax format.

tbg · 2017-07-11T18:01:11Z

Basic direction LGTM. Two comments:

First, I too think that an all-encompassing version number could be too limiting. Even if that's what we expose to users, I think it makes sense internally to think of capabilities more than some "version". It's not much more complexity - there's still enough ordering, and it buys us flexibility. If we have three different features, it would be good to be able to activate them one by one in line with what @knz pointed out in the Github comments.

Second, maybe I missed it but I'm not clear how nodes have the min version available when they start. Some migrations affect the on-disk format or location (think the Raft change), so nodes shouldn't have to connect to Gossip to know where they are. Simple solution: whenever a minversion (or change to the settings if we diversify them) is received, write them to disk.

Mainly as an exercise for myself (but perhaps helpful for driving discussion) here's how the process could look like for the Raft change (assuming that we don't change the Raft on-the-wire format, something we think we can do as just discussed with @irfansharif and @a-robinson):

rolling restart to get cluster to 1.1 binary
require all nodes to run a version that in principle supports dedicated Raft storage: SET CLUSTER SETTING min_version.1.1.dedicated_raft_storage = 'enable';
all nodes receive this and write it to disk
now nodes at 1.0 won't work any more (though we have to think about how that is achieved since we can't change 1.0 at this point)
operator gets feedback that nodes are in need of a restart to operate optimally: ui and periodically logs
rolling restart. on restart, 1.1 binary sees that dedicated raft storage is enabled, so it migrates the raft storage to use dedicated engine.
ui shows that all nodes are happy

15 minutes later: operator decides to roll back since an unrelated change in 1.1 gives them trouble.

hey, this isn't really possible! We need a way to put on disk of the individual nodes the fact that they are supposed to undo the dedicated Raft engine migration. Perhaps the following:

SET CLUSTER SETTING min_version.1.1.dedicated_raft_storage = 'disable';
nodes receive via gossip, write marker to disk
cue operator to rolling restart
nodes migrate back to integrated storage
cluster basically happy, but points out somewhere that a required feature is not enabled
rolling restart to 1.0
cluster happy

This is a far cry from just bumping a "min version". Maybe I'm messing some things up here, but I think that is the basic playbook that we need spelled out for the mechanism proposed here.

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 58 at r1 (raw file):

Every node knows the version it's running, as the version is baked
into the build. A node's version will be supplied with every RPC

We have long-lived connections, so it would be better to make a service that exchanges version information (the absence of that service counts as v1.0 or below).

docs/RFCS/version_migration.md, line 61 at r1 (raw file):

(e.g. Raft, Node, DistSQL, and Gossip). In addition, there will be a
cluster-wide minimum required version, set by the operator. Each node
also sends the minimum required version its operating under with each

it's

docs/RFCS/version_migration.md, line 66 at r1 (raw file):

- RPCs from nodes with versions older than the minimum required
  version will fail with an error that indicates the sender should
  panic.

s/panic/fatal with a descriptive error message/

docs/RFCS/version_migration.md, line 76 at r1 (raw file):

Previously, knz (kena) wrote…

It's not clear to me why we need a protobuf as opposed to (say) a string with semver format.

According to your comment it would make sense, when the version is a more granular type of object that contains information on individual features.

docs/RFCS/version_migration.md, line 153 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

This leaves out an explanation of how we deprecate old behavior. New versions of CockroachDB will need to come with a minimum ClusterVersion at which they're able to safely run. In other words, say we remove support for storing the raft log in the same rocksdb instance as user keys. Binaries that require dedicated raft log storage would have to require a minimum ClusterVersion setting before they're able to start. This was a big part of the writeup in the cluster version tool RFC.

Yeah, I was assuming that each running version would leave an on-disk marker (have a feeling we're already doing that) about where it's at, so that you couldn't, say, run 1.2 from a 1.0 directory.

Comments from Reviewable

bdarnell · 2017-07-11T19:30:01Z

Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 34 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

When satisfied with the new version's stability, the operator
confirms that the new version should become the minimum required, at
which point the nodes begin using the Raft WAL storage, and the window
for downgrading is closed.

The problem, of course, is that there's no way out in situations where the new feature is the cause of instability in the new version. A rollback path could be implemented for some features but probably not all, so this is worth mentioning as a tradeoff.

There are two steps here (staging the binary and enabling the new features), which can be made reversible independently. If the new features cause problems, the admin should ideally be able to switch back. For example, we've talked about changing the DROP TABLE statement to an alternative faster implementation. This would be controlled by the feature switch, and once you've dropped a table with the new implementation, you can still switch back to the old implementation if you want (e.g. if the new reversion markers cause performance problems). You can't, however, downgrade to the old binary, because the old binary doesn't know how to understand the reversion markers that will still exist on disk.

Of course, none of this helps if the new features have catastrophic bugs, but that's just something we'll have to test for no matter what we do around migrations. (I expect that downgrades will primarily be motivated by performance concerns or incompatibilities with idiosyncratic workloads.) Some features (like the raft log split) are less amenable to this kind of reversible upgrade than others, so it may not always be practical to allow this switch to be reversed.

docs/RFCS/version_migration.md, line 58 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

A node's version will be supplied with every RPC

If the minimum version is going to be a cluster-wide setting rather than configured per-RPC type, could it just be checked when opening/accepting new connections? Internal grpcs reconnects might cause problems across server restarts, but seemingly only if we can't trust nodes to behave. Either way, I want to add some similar logic for checking the cluster ID to fix #15801/#15898

Yeah, a per-connection handshake would be ideal. I'm not sure how easy it is to fit that into GRPC, though. Attaching the version (as an HTTP/2 header) should be fairly cheap if it's not feasible to do something per-connection.

docs/RFCS/version_migration.md, line 70 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Gossip RPCs can't rely on this logic if gossip is needed in order to unblock the logic.

I'm not sure this step is necessary. As long as our version is at least as new as the incoming RPC's minimum, we should be able to handle it correctly.

docs/RFCS/version_migration.md, line 82 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Nit, but is this really going to be an int? It might get confusing to try mapping the ints back to release versions

Might be better to make it a bitmask of feature flags. An int64 will last us a while.

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

A valid version must be equal to
the advertised version of every active node in the cluster

I don't think that should be required, it just must be greater than the previous minimum version and shouldn't be greater than any active node. Meaning that if the running nodes are at version 5 but the current minimum version is 3, it should be legal to bump the version to 4 (to rule out some downgrades but not all).

I think we can and should allow the minimum version to move backwards. But each version should have a "MINIMUM_MINIMUM_VERSION` and not allow the minimum to be set below that. (so 1.1 would have MINIMUM_MINIMUM_VERSION=1.0, 1.2 would probably have MINIMUM_MINIMUM_VERSION=1.1, etc. Eventually we could open this up to a wider spread. A node would commit suicide if it sees a minimum version below its minimum minimum)

docs/RFCS/version_migration.md, line 142 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Other drawback, as alluded to above: Using a single setting for all new features makes it tough to support rolling back even if the feature could reasonably support being disabled after being turned on (as in #16809)

A single setting could still allow for rollbacks, but it wouldn't allow for independent rollbacks (e.g. should you be able to configure the cluster to use fast DROP TABLE but not separate raft storage?) The main UI should definitely be a single "upgrade to 1.1" command, but I'm not sure whether we want to decompose that into separate settings under the hood.

Comments from Reviewable

spencerkimball · 2017-07-12T21:42:06Z

docs/RFCS/version_migration.md, line 58 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

A node's version will be supplied with every RPC

If the minimum version is going to be a cluster-wide setting rather than configured per-RPC type, could it just be checked when opening/accepting new connections? Internal grpcs reconnects might cause problems across server restarts, but seemingly only if we can't trust nodes to behave. Either way, I want to add some similar logic for checking the cluster ID to fix #15801/#15898

That's a better idea. I've changed it to be more specifically at the root gRPC level and added cluster ID here. I also mentioned issues #15801/#15898 in the RFC.

Comments from Reviewable

spencerkimball · 2017-07-12T22:57:36Z

I think your first point, while valid from the perspective of someone surely appreciating it and wanting to use it, is going to be too complex to live with in practice.

In particular, a fine-grained upgrade path providing per-feature upgrades is going to exponentially multiply the testing paths. Granted, this explosion of testing scenarios assumes there is the potential for interactions between features.

Even if we're just talking about 2 features in a version, we can order the various upgrade paths as 1 (feature 1, skip feature 2), 2 (feature 2, skip feature 1), 12 (feature 1 then feature 2), 21 (feature 2 then feature 1). 3 features is considerably worse: 1, 2, 3, 12, 13, 21, 31, 23, 32, 123, 132, 213, 231, 312, 321.

NOTE: this doesn't even take into consideration the complexity if an operator chooses not to upgrade a particular feature until two or more versions after it was first introduced. Perhaps I'm missing something that would make this simpler for an operator, or for us to provide support for, but from my current conception of the problem, this isn't a viable solution.

I think the right solution here is to simply provide normal cluster settings on a per-feature basis which can be used to enable or disable a particular new feature, if the feature has potential for poor performance or instability. We've already been doing this for quite some time now. I think it's orthogonal to this RFC.

In response to your second point, I've updated the RFC to store ClusterVersion setting as a per-store store-local key, similar to the gossip bootstrap info. This can be used before connecting to gossip.

In response to your example which notes the difficulty of providing a reverse migration: I think that's just going to be the hard fact of life here. See my new diagram recommending an incremental backup before committing to a new required server version. That's the heavyweight disaster recovery option, but it's viable and it requires no additional complexity. With revert, for example, there is no mechanism that's obvious to me that would allow an operator to rollback to the previous version while maintaining consistency.

Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 34 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

There are two steps here (staging the binary and enabling the new features), which can be made reversible independently. If the new features cause problems, the admin should ideally be able to switch back. For example, we've talked about changing the DROP TABLE statement to an alternative faster implementation. This would be controlled by the feature switch, and once you've dropped a table with the new implementation, you can still switch back to the old implementation if you want (e.g. if the new reversion markers cause performance problems). You can't, however, downgrade to the old binary, because the old binary doesn't know how to understand the reversion markers that will still exist on disk.

Of course, none of this helps if the new features have catastrophic bugs, but that's just something we'll have to test for no matter what we do around migrations. (I expect that downgrades will primarily be motivated by performance concerns or incompatibilities with idiosyncratic workloads.) Some features (like the raft log split) are less amenable to this kind of reversible upgrade than others, so it may not always be practical to allow this switch to be reversed.

@bdarnell, I think the per-feature switches you mention here should be the kinds of cluster settings we've already been using, and are orthogonal to this document. Just want to make sure that's what you're suggesting in your response to @a-robinson's comment.

As to catastrophic bugs introduced by new features, see the new upgrade flowchart I've added to the RFC. I think our only viable option from a complexity standpoint is to encourage a backup before migration.

docs/RFCS/version_migration.md, line 58 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

We have long-lived connections, so it would be better to make a service that exchanges version information (the absence of that service counts as v1.0 or below).

I've decided to simplify things here by making a guarantee of correctness in the face of "rogue" nodes a non-goal. Because of our long-lived connections, and the fact that the required_server_version setting may change at any time, we would otherwise have to send information on a per-RPC basis to have iron-clad guarantees. This is further complicated by gossip not being allowed to perform checks with the same strictness we'd need for Raft, DistSQL, and KV RPCs because it's the medium by which we transmit the required_server_version cluster setting to all nodes.

After struggling finding a way forward that is straightforward but also foolproof, I've decided to go with straightforward and not foolproof. The underlying assumptions guiding this decision are:

rogue nodes are rare on upgrades (rogue means re-surface and send RPCs from out-of-date server version).
checks on gossip of updated ClusterVersion will happen quickly enough to ameliorate the risk posed by rogue nodes.

I've added a section to Drawbacks.

docs/RFCS/version_migration.md, line 61 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

it's

Done.

docs/RFCS/version_migration.md, line 70 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure this step is necessary. As long as our version is at least as new as the incoming RPC's minimum, we should be able to handle it correctly.

I've dropped this section.

docs/RFCS/version_migration.md, line 76 at r1 (raw file):

Previously, knz (kena) wrote…

It's not clear to me why we need a protobuf as opposed to (say) a string with semver format.

I'm expecting this mechanism to gain more detail as the product evolves.

docs/RFCS/version_migration.md, line 82 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Might be better to make it a bitmask of feature flags. An int64 will last us a while.

It now specifies major and minor versions. As mentioned in my larger comment responding to @tschottdorf, I think a bitmask of feature flags is going to be too difficult to test and support. Better to allow that kind of enabled/disabled functionality through the use of feature-specific cluster settings.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, knz (kena) wrote…

Agreed with alex here we should recycle the existing feature set cluster setting instead of inventing yet one new set/show syntax format.

I really think we want a verification step here. Is that still possible if we just use the standard set cluster setting syntax?

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

SET CLUSTER VERSION

Nit, but I don't see why we wouldn't include the word "minimum" here to clarify what the effect it has, i.e. SET CLUSTER SETTING minimum_version = <foo>;

Done.

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think we can and should allow the minimum version to move backwards. But each version should have a "MINIMUM_MINIMUM_VERSION` and not allow the minimum to be set below that. (so 1.1 would have MINIMUM_MINIMUM_VERSION=1.0, 1.2 would probably have MINIMUM_MINIMUM_VERSION=1.1, etc. Eventually we could open this up to a wider spread. A node would commit suicide if it sees a minimum version below its minimum minimum)

@a-robinson: your suggestion implies an increase in support complexity. For example, we're currently proposing to support only the last two versions of our product. If we allow a minimum version older than the previous version, then we're implicitly promising to support customers who are running a server two or more versions behind. While I can understand the benefit this kind of optionality would provide to some particulars-minded operators, I think it's going to be a practical mistake both for operators and for support.

@bdarnell, I don't understand your proposal to allow the minimum required version to move backwards.

docs/RFCS/version_migration.md, line 142 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

A single setting could still allow for rollbacks, but it wouldn't allow for independent rollbacks (e.g. should you be able to configure the cluster to use fast DROP TABLE but not separate raft storage?) The main UI should definitely be a single "upgrade to 1.1" command, but I'm not sure whether we want to decompose that into separate settings under the hood.

Mentioned in the drawbacks.

docs/RFCS/version_migration.md, line 149 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

I think the two most obvious alternatives were laid out in https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/cluster_upgrade_tool.md#long-term-design:

Have a separate setting/migration that the operator triggers for each new non-backward compatible feature. This allows us to deprecate old code independently and operators to control the introduction of new features somewhat more carefully, at the cost of being more complicated for less sophisticated users.

Automatically do upgrades when they're required for a node to boot up (i.e. once the old code has been removed) if we're sure that all running nodes are at a new enough version to support them. I'd expect this is far too magical to be a good solution for most operators' tastes, though.

Done.

docs/RFCS/version_migration.md, line 153 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Yeah, I was assuming that each running version would leave an on-disk marker (have a feeling we're already doing that) about where it's at, so that you couldn't, say, run 1.2 from a 1.0 directory.

This is now covered in the document. Each server will have the current version and the prior version baked into the binary. The prior version must be known to account for major version increments. At startup, these two values are compared to the ClusterVersion to ensure that the server is neither too old nor too new (i.e. skipped on or more versions).

Comments from Reviewable

bdarnell · 2017-07-14T17:05:08Z

Review status: 0 of 2 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 34 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

@bdarnell, I think the per-feature switches you mention here should be the kinds of cluster settings we've already been using, and are orthogonal to this document. Just want to make sure that's what you're suggesting in your response to @a-robinson's comment.

As to catastrophic bugs introduced by new features, see the new upgrade flowchart I've added to the RFC. I think our only viable option from a complexity standpoint is to encourage a backup before migration.

Yes, the feature switches I'm talking about are the same kind of settings that we use today. (although I think it's important that the admin interface for this is one command to "upgrade to 1.1", and not setting N different cluster settings manually)

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

@a-robinson: your suggestion implies an increase in support complexity. For example, we're currently proposing to support only the last two versions of our product. If we allow a minimum version older than the previous version, then we're implicitly promising to support customers who are running a server two or more versions behind. While I can understand the benefit this kind of optionality would provide to some particulars-minded operators, I think it's going to be a practical mistake both for operators and for support.

@bdarnell, I don't understand your proposal to allow the minimum required version to move backwards.

My proposal here (the "minimum minimum" stuff) is that if we allow the required version to move backwards, we should not allow it to move backwards too far. (And this also lays the groundwork for a future in which we may support upgrading more than one step at a time)

I think we should allow the required version to move backwards (to make the upgrade process as reversible as possible). This may not reverse everything (for example, setting the required version back to 1.0 from 1.1 would revert "drop table" to its original implementation but would probably leave the raft logs in the new format), and it is explicitly not a goal to make it possible to move back to an older binary, but in case there are bugs in a new feature (or performance degradation, etc), it's best to be able to revert the change (since cluster settings propagate nearly instantly across the cluster without a chance for canarying).

docs/RFCS/version_migration.md, line 74 at r2 (raw file):

message Version {
    int major = 1 [(gogoproto.nullable) = false];
    int minor = 2 [(gogoproto.nullable) = false];

What about clusters that track the master branch or alpha releases (e.g. our own test clusters)? Such a cluster may need to be upgraded multiple times during the 1.x development process, so I think we need to be able to express versions in between full releases here. (This is one of the advantages of the bitmask).

docs/RFCS/version_migration.md, line 97 at r2 (raw file):

less than a node's `minimum_version`, the node will panic.

## Upgrade process

Document the bootstrapping process: When a new cluster is initialized, its minimum_version is set to the version used to do the bootstrapping. When version 1.1 is executed for the first time on an existing store, it initializes the minimum version to 1.0.

Comments from Reviewable

spencerkimball · 2017-07-14T20:34:07Z

Review status: 0 of 2 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 66 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

s/panic/fatal with a descriptive error message/

Done.

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

My proposal here (the "minimum minimum" stuff) is that if we allow the required version to move backwards, we should not allow it to move backwards too far. (And this also lays the groundwork for a future in which we may support upgrading more than one step at a time)

I think we should allow the required version to move backwards (to make the upgrade process as reversible as possible). This may not reverse everything (for example, setting the required version back to 1.0 from 1.1 would revert "drop table" to its original implementation but would probably leave the raft logs in the new format), and it is explicitly not a goal to make it possible to move back to an older binary, but in case there are bugs in a new feature (or performance degradation, etc), it's best to be able to revert the change (since cluster settings propagate nearly instantly across the cluster without a chance for canarying).

This is iffy because it leaves the cluster and the operator in a hard-to-fathom partial state, where some features might still be active and others not active.

We do have the concept of a minimum minimum version as described in the Detailed Design section. In addition to the baked-in current server version, there's a minimum supported version, which keeps a too-new binary from reading too-old data at startup. It seems like we could support downgrade of the ClusterVersion.minimum_version to that value and no lower.

In order to accommodate the downgrades you're suggesting here, I've added a new use_version to ClusterVersion. This can be decreased no lower than a node's baked-in minimum supported version. ClusterVersion.minimum_version on the other hand, can only increase monotonically. This prevents nodes from downgrading the binary such that it can no longer understand side effects introduced while the cluster had use_version set higher.

docs/RFCS/version_migration.md, line 74 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What about clusters that track the master branch or alpha releases (e.g. our own test clusters)? Such a cluster may need to be upgraded multiple times during the 1.x development process, so I think we need to be able to express versions in between full releases here. (This is one of the advantages of the bitmask).

I've added a micro field to the Version message.

docs/RFCS/version_migration.md, line 97 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Document the bootstrapping process: When a new cluster is initialized, its minimum_version is set to the version used to do the bootstrapping. When version 1.1 is executed for the first time on an existing store, it initializes the minimum version to 1.0.

This is documented in the "Rollout" section of this document. My idea was to have the ClusterVersion that's gossiped bootstrapped to minimum_version = Version{major=0, minor=0, micro=0}.

Comments from Reviewable

bdarnell · 2017-07-15T17:09:04Z

Review status: 0 of 2 files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 97 at r2 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

This is documented in the "Rollout" section of this document. My idea was to have the ClusterVersion that's gossiped bootstrapped to minimum_version = Version{major=0, minor=0, micro=0}.

I don't see anything that says that when a brand new cluster is initialized, it gets a non-zero cluster version (because there's no reason to allow downgrades in a brand new cluster, and new features should be usable immediately in that case)

Comments from Reviewable

tbg · 2017-07-16T00:57:05Z

It makes me nervous that operators can *never* burn in the actual new features, but the mechanism here should be good enough to migrate the migration process if need be, and reduction in scope is good at this point. Just to be sure I understand this right -- the Raft change would migrate storage right away or only when booting with min version 1.1? Latter case requires an extra rolling restart (awkward). Can you walk me through that example? Note that that feature also adds a field to ReplicatedEvalResult (that part can definitely not be on until after the min version bump). Perhaps @irfansharif can chime in as well.

a-robinson · 2017-07-17T14:14:25Z

Reviewed 1 of 1 files at r3.
Review status: 1 of 2 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

I really think we want a verification step here. Is that still possible if we just use the standard set cluster setting syntax?

Yeah, we can create arbitrary validation functions. See RegisterValidatedStringSetting at https://github.com/cockroachdb/cockroach/blob/master/pkg/settings/string.go#L88

docs/RFCS/version_migration.md, line 6 at r2 (raw file):

- Authors: Spencer Kimball
- RFC PR: (PR # after acceptance of initial draft)
- Cockroach Issue(s): [#15801]((https://github.com/cockroachdb/cockroach/issues/15801) [#15898](https://github.com/cockroachdb/cockroach/issues/15898)

Saying these are related to version migration or that either of them motivated these plans is quite a stretch

docs/RFCS/version_migration.md, line 61 at r3 (raw file):

into the build. Nodes gossip a `NodeDescriptor`, which will be
augmented to include the baked-in server version. Nodes also have a
minimum-supported version baked-in, which is used to sanity check that

I think s/minimum-supported version/minimum-supported minimum version/ would be clearer

docs/RFCS/version_migration.md, line 75 at r3 (raw file):

    int major = 1 [(gogoproto.nullable) = false];
    int minor = 2 [(gogoproto.nullable) = false];
    int micro = 3 [(gogoproto.nullable) = false];

s/micro/patch (a la http://semver.org/)

docs/RFCS/version_migration.md, line 88 at r3 (raw file):

`minimum_version` is greater than the baked-in server version. Each
store persists `ClusterVersion` to disk at a store-local key (similar
to gossip bootstrap info). At startup, each node reads the

I think the RFC would benefit from a brief explanation of why this is needed rather than doing the validation in an initial connection handshake. I assume it's due to potential incompatibilities with the data stored on disk, but that should be called out more explicitly to justify this extra logic.

Also, I'm not sure that each node's ClusterVersion is needed, just the max / most recent ClusterVersion seen.

docs/RFCS/version_migration.md, line 114 at r3 (raw file):

- Verify 1.1 stability without using features requiring 1.1.
- Successful burn-in? **NO**: perform a rolling downgrade to 1.0.
- Otherwise, set the cluster-wide minimum version to 1.1,

For what it's worth, another backup before this step would be a good idea

docs/RFCS/version_migration.md, line 142 at r3 (raw file):

New features which require a particular version are disallowed if that
version is less than the `use_version` set for the cluster. Each

s/less/greater/, right? The features are only disallowed if they require a newer version than what the cluster is using.

docs/RFCS/version_migration.md, line 150 at r3 (raw file):

This is the case with both the Revert and Raft log
examples mentioned in the Motivation section of this document.

Not a big deal, but it's not necessarily the case for the raft log example

Comments from Reviewable

bdarnell · 2017-07-17T15:12:39Z

Review status: 1 of 2 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Yeah, we can create arbitrary validation functions. See RegisterValidatedStringSetting at https://github.com/cockroachdb/cockroach/blob/master/pkg/settings/string.go#L88

The validation we want here is not just "is this a well-formed version" but "is the cluster ready to be set to this version" (i.e. are there any nodes known in gossip that are older). Can the setting validation function do that?

docs/RFCS/version_migration.md, line 75 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/micro/patch (a la http://semver.org/)

This is distinctly not the semver patch component, but we should be clearer about that. It's an extra value to be used during development, since we'll have intermediate versions between 1.0.x and 1.1.0. (or could we use semver's patch version starting with a negative value? so we'd be at 1.1.-100 now, then 1.1.-99 for the first migration, etc)

Comments from Reviewable

a-robinson · 2017-07-17T16:37:18Z

Review status: 1 of 2 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The validation we want here is not just "is this a well-formed version" but "is the cluster ready to be set to this version" (i.e. are there any nodes known in gossip that are older). Can the setting validation function do that?

It would be somewhat awkward to ensure the validation function has access to gossip, but certainly possible if we thought the user experience was better.

docs/RFCS/version_migration.md, line 75 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This is distinctly not the semver patch component, but we should be clearer about that. It's an extra value to be used during development, since we'll have intermediate versions between 1.0.x and 1.1.0. (or could we use semver's patch version starting with a negative value? so we'd be at 1.1.-100 now, then 1.1.-99 for the first migration, etc)

Ah, ok. That make sense, it'd just be good to indicate that then.

Comments from Reviewable

irfansharif · 2017-07-17T17:24:26Z

I echo @tschottdorf's comments in that it'd be nice to be walked through an example of what this would look like with respect to the raft log changes (also saves me some work, ha). Left some comments on the bits where it is unclear to me.

Review status: 1 of 2 files reviewed at latest revision, 30 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

This is iffy because it leaves the cluster and the operator in a hard-to-fathom partial state, where some features might still be active and others not active.

We do have the concept of a minimum minimum version as described in the Detailed Design section. In addition to the baked-in current server version, there's a minimum supported version, which keeps a too-new binary from reading too-old data at startup. It seems like we could support downgrade of the ClusterVersion.minimum_version to that value and no lower.

In order to accommodate the downgrades you're suggesting here, I've added a new use_version to ClusterVersion. This can be decreased no lower than a node's baked-in minimum supported version. ClusterVersion.minimum_version on the other hand, can only increase monotonically. This prevents nodes from downgrading the binary such that it can no longer understand side effects introduced while the cluster had use_version set higher.

Worth repeating these comments in the RFC about how use_version is to be used.
Additionally it is unclear to me between point releases how many use_version "steps" we go through.

docs/RFCS/version_migration.md, line 28 at r3 (raw file):

One example is the use of different storage for Raft's write ahead log
(#16809). Having nodes begin using this new functionality means they

Could be hyperlinked to the PR.

docs/RFCS/version_migration.md, line 29 at r3 (raw file):

One example is the use of different storage for Raft's write ahead log
(#16809). Having nodes begin using this new functionality means they
cannot be downgraded without potential loss of data. To address this,

nit: following the discussions here "downgrade" seems overloaded (for me at least, are we staging older binaries or are we turning new feature flags off?). Worth being explicit here (or a clarifying note above given its use throughout), because in #16809, during the TRANSITIONING period, you can "migrate backwards" safely without loss of data.

docs/RFCS/version_migration.md, line 80 at r3 (raw file):

message ClusterVersion {
    Version minimum_version = 1 [(gogoproto.nullable) = false];
    Version use_version = 2 [(gogoproto.nullable) = false];

Worth collating some documentation here re: {minimum,use}_version, especially for use_version. It's scattered throughout the rest of the RFC.

docs/RFCS/version_migration.md, line 86 at r3 (raw file):

Nodes will listen for changes to the gossiped `ClusterVersion` value
and fatal with a descriptive error in the event that the
`minimum_version` is greater than the baked-in server version. Each

nit: s/.../the baked-in server version is less than the minimum version, this ordering is used below.

docs/RFCS/version_migration.md, line 88 at r3 (raw file):

I think the RFC would benefit from a brief explanation of why this is needed rather than doing the validation in an initial connection handshake.

ditto.

docs/RFCS/version_migration.md, line 94 at r3 (raw file):

the node's baked-in server version is less than the `minimum_version`
or the node's baked-in minimum-supported server version is greater
than the `minimum_version`, the node will exit with an error.

Perhaps just me but can you clarify this? i.e. if the node's baked-in minimum-supported server version is greater than the gossiped minimum_version, why should it exit with an error? Naively (I know less than little about gossip), I would expect this to happen as the differing ClusterVersions propagate through gossip addressing each other. Or perhaps I'm not understanding exactly what the "roll-out" process is.

docs/RFCS/version_migration.md, line 98 at r3 (raw file):

If a gossiped ClusterVersion is received where minimum_version

This has been repeated three times now.

docs/RFCS/version_migration.md, line 112 at r3 (raw file):

  without upping the cluster-wide minimum version, no
  features requiring 1.1 can be run.
- Verify 1.1 stability without using features requiring 1.1.

What are we really verifying here if not using the new features?

docs/RFCS/version_migration.md, line 114 at r3 (raw file):

- Verify 1.1 stability without using features requiring 1.1.
- Successful burn-in? **NO**: perform a rolling downgrade to 1.0.
- Otherwise, set the cluster-wide minimum version to 1.1,

Again, naively as someone with no familiarity with gossip, if the new minimum_version setting has been persisted on node A, is it then possible for node A to receive gossip from node B with a older, smaller minimum_version giving rise to the "exit with error" described above?

docs/RFCS/version_migration.md, line 118 at r3 (raw file):

- In the event of a catastrophic failure or corruption due to usage of
  new features requiring 1.1, the only option is to restore from
  backup.  This is a two step process: revert nodes to run version 1.0

Double space here.

docs/RFCS/version_migration.md, line 119 at r3 (raw file):

  new features requiring 1.1, the only option is to restore from
  backup.  This is a two step process: revert nodes to run version 1.0
  and restore from the backup(s).

How would you deal with the new on-disk data generated during an earlier (failed) migration process? The term "reversion markers" was used elsewhere, wouldn't this be a problem if there's another attempt to migrate up?
FWIW I don't have an answer for what a good "clean-up" solution would be, but I can see why it'd be important. Having the new features disambiguate between on-disk state left over from an earlier failed migration vs. a node restart (running the new version before and after) is non-ideal. Or maybe I'm missing something.

docs/RFCS/version_migration.md, line 128 at r3 (raw file):

A valid version must be equal to the advertised version of every active node in the cluster

Document what exactly this means now given the new use_version.

docs/RFCS/version_migration.md, line 142 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/less/greater/, right? The features are only disallowed if they require a newer version than what the cluster is using.

@a-robinson: I think 'less' is right, from what I can tell each feature is now gated on a particular use_version so unless the gossiped cluster wide use_version is higher than the feature's use_version, it is disabled.
@spencerkimball, is this the case? Each new feature between point releases is essentially "assigned" a use_version to get gated on?

nit: Don't like use_version as a name though I don't know of anything better.

docs/RFCS/version_migration.md, line 146 at r3 (raw file):

before being run, and returning an error if not supported.

In some cases, this will be done at the SQL layer; in others, at the

One thing that's unclear to me is when and where exactly does the actual "migration" take place? By that I mean consider #16809, there needs to be an explicit store level migration to copy over all raft data from the old engine to the new (for obvious correctness reasons), when and where will this actually take place? This was a large motivator in the intermediary TRANSITIONING state I experimented with in #16809.
(I imagine there will be future changes too where we'll need some form of store/node level migration. Just came across #12154 which, though minor, fits the bill me thinks.)

docs/RFCS/version_migration.md, line 176 at r3 (raw file):

sufficiently granular for some operators, who would prefer to enable
new functionality feature by feature to isolate potential impact on
performance and stability.

drive-by comment: if we do have backwards incompatible changes lined up for future releases gated behind cluster settings/env vars, we can always encourage the operator try it out first (on staged environments, presumably) on late patch releases (think GO15VENDOREXPERIMENT).

Comments from Reviewable

spencerkimball · 2017-07-17T20:58:46Z

Yes, you'd only start using the new Raft storage after running UPGRADE TO 1.1. That step ups the minimum_version and the use_version to 1.1. If you followed up with a downgrade via UPGRADE TO 1.0, minimum_version would remain at 1.1, but use_version would be 1.0. The new Raft log storage would still be in use as we are not planning a downgrade path.

@irfansharif, it would be great if you could contribute more detail on the Raft log work because I'm not nearly as familiar with the complexities of the migration.

Review status: 1 of 2 files reviewed at latest revision, 30 unresolved discussions, all commit checks successful.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It would be somewhat awkward to ensure the validation function has access to gossip, but certainly possible if we thought the user experience was better.

I think this is crucial. Things could get very messy without some cluster-wide sanity checking.

docs/RFCS/version_migration.md, line 126 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Worth repeating these comments in the RFC about how use_version is to be used.
Additionally it is unclear to me between point releases how many use_version "steps" we go through.

I added comments to the proposed protobuf. The number of times we require UPGRADE TO commands between point releases depends on how many features require migrations. Not sure why you single out use_version in this comment though. In all but very unusual cases, I'd expect use_version will always be equal to minimum_version.

docs/RFCS/version_migration.md, line 6 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Saying these are related to version migration or that either of them motivated these plans is quite a stretch

Yeah, since I decided not to send information with every RPC, I'm removing these mentions.

docs/RFCS/version_migration.md, line 97 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't see anything that says that when a brand new cluster is initialized, it gets a non-zero cluster version (because there's no reason to allow downgrades in a brand new cluster, and new features should be usable immediately in that case)

Done.

docs/RFCS/version_migration.md, line 28 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Could be hyperlinked to the PR.

Done.

docs/RFCS/version_migration.md, line 29 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

nit: following the discussions here "downgrade" seems overloaded (for me at least, are we staging older binaries or are we turning new feature flags off?). Worth being explicit here (or a clarifying note above given its use throughout), because in #16809, during the TRANSITIONING period, you can "migrate backwards" safely without loss of data.

In this case "downgrade" is referring to an earlier version of the binary which does not understand the new log storage. Doing that would likely result in data loss and corruption. I clarified the meaning.

docs/RFCS/version_migration.md, line 61 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

I think s/minimum-supported version/minimum-supported minimum version/ would be clearer

That's a mouthful. I'm also not convinced it's an accurate clarification. "minimum-supported version" means that's the earliest version that the server binary can understand data from. Which means if you start the binary with a data directory which contains a ClusterVersion with minimum_version=1.5 and use_version=1.4, the binary must understand data from version 1.4.

docs/RFCS/version_migration.md, line 75 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Ah, ok. That make sense, it'd just be good to indicate that then.

Added a comment.

docs/RFCS/version_migration.md, line 80 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Worth collating some documentation here re: {minimum,use}_version, especially for use_version. It's scattered throughout the rest of the RFC.

Done.

docs/RFCS/version_migration.md, line 86 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

nit: s/.../the baked-in server version is less than the minimum version, this ordering is used below.

Done.

docs/RFCS/version_migration.md, line 88 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I think the RFC would benefit from a brief explanation of why this is needed rather than doing the validation in an initial connection handshake.

ditto.

I futzed with the comments a bit for clarity.

@a-robinson: not sure what you mean by the last sentence in your comment. The idea here is that each node records the max ClusterVersion it has seen via gossip.

docs/RFCS/version_migration.md, line 94 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Perhaps just me but can you clarify this? i.e. if the node's baked-in minimum-supported server version is greater than the gossiped minimum_version, why should it exit with an error? Naively (I know less than little about gossip), I would expect this to happen as the differing ClusterVersions propagate through gossip addressing each other. Or perhaps I'm not understanding exactly what the "roll-out" process is.

The section you're referencing here is about verifying that the binary supports the data it encounters at startup, not gossip. The baked-in minimum-supported version is the earliest version of data the binary can understand. This value advances when we phase out old ways of doing things. When that happens, the minimum-supported version advances to the version of whatever replaced what you're removing support for.

docs/RFCS/version_migration.md, line 98 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

If a gossiped ClusterVersion is received where minimum_version

This has been repeated three times now.

The second usage you're referring to is talking about checks made at startup before gossip. I've condensed the other two.

docs/RFCS/version_migration.md, line 112 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

What are we really verifying here if not using the new features?

General stability.

docs/RFCS/version_migration.md, line 114 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

For what it's worth, another backup before this step would be a good idea

Done.

docs/RFCS/version_migration.md, line 114 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Again, naively as someone with no familiarity with gossip, if the new minimum_version setting has been persisted on node A, is it then possible for node A to receive gossip from node B with a older, smaller minimum_version giving rise to the "exit with error" described above?

Because this information is gossiped by the leader of the system settings range where the data resides, and the clocks between two successive leaseholders are forced into monotonicity, there's no chance that the recipient of a newer ClusterVersion will accept a gossip of an older ClusterVersion.

docs/RFCS/version_migration.md, line 118 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Double space here.

Done.

docs/RFCS/version_migration.md, line 119 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

How would you deal with the new on-disk data generated during an earlier (failed) migration process? The term "reversion markers" was used elsewhere, wouldn't this be a problem if there's another attempt to migrate up?
FWIW I don't have an answer for what a good "clean-up" solution would be, but I can see why it'd be important. Having the new features disambiguate between on-disk state left over from an earlier failed migration vs. a node restart (running the new version before and after) is non-ideal. Or maybe I'm missing something.

Perhaps this is unclear. You'd want to wipe the on-disk state, revert to the previous binary, and restore from backup. The new on-disk data generated during the failed migration process would be lost.

docs/RFCS/version_migration.md, line 128 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

A valid version must be equal to the advertised version of every active node in the cluster

Document what exactly this means now given the new use_version.

Done.

docs/RFCS/version_migration.md, line 142 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

@a-robinson: I think 'less' is right, from what I can tell each feature is now gated on a particular use_version so unless the gossiped cluster wide use_version is higher than the feature's use_version, it is disabled.
@spencerkimball, is this the case? Each new feature between point releases is essentially "assigned" a use_version to get gated on?

nit: Don't like use_version as a name though I don't know of anything better.

@a-robinson you're correct, it should be "greater". @irfansharif your summation is also correct; you're saying the same thing. And yes, every feature requiring migration or backwards-incompatible interpretation of on-disk state will be "assigned" a required version.

docs/RFCS/version_migration.md, line 146 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

One thing that's unclear to me is when and where exactly does the actual "migration" take place? By that I mean consider #16809, there needs to be an explicit store level migration to copy over all raft data from the old engine to the new (for obvious correctness reasons), when and where will this actually take place? This was a large motivator in the intermediary TRANSITIONING state I experimented with in #16809.
(I imagine there will be future changes too where we'll need some form of store/node level migration. Just came across #12154 which, though minor, fits the bill me thinks.)

The particulars of that are outside the scope of this document, as I believe each feature will need to do its own work here. Perhaps these are done range-by-range once the use_version is set to the required version.

docs/RFCS/version_migration.md, line 150 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

This is the case with both the Revert and Raft log
examples mentioned in the Motivation section of this document.

Not a big deal, but it's not necessarily the case for the raft log example

Are we currently contemplating building a reverse migration from the separate raft log back to the combined RocksDB keyspace?

docs/RFCS/version_migration.md, line 176 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

drive-by comment: if we do have backwards incompatible changes lined up for future releases gated behind cluster settings/env vars, we can always encourage the operator try it out first (on staged environments, presumably) on late patch releases (think GO15VENDOREXPERIMENT).

I incorporated this comment into the "Upgrade process" section.

Comments from Reviewable

Pending cockroachdb#16977, this gives a crude way of landing changes that will need a proper migration. The "problem" with this change is that this means that tests won't be run with the new code completely active. We can change that, but it would approximately double time spent in CI, so that I'm not convinced it's worth it at this point. Instead, the "new mode" should be tested thoroughly when it can properly be migrated into.

a-robinson · 2017-07-17T22:11:33Z

Reviewed 1 of 2 files at r2, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 123 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

I think this is crucial. Things could get very messy without some cluster-wide sanity checking.

Agreed that verification is crucial, I was just pointing out that it would be possible to do arbitrary validation for a set cluster setting value, so it's purely a choice of which syntax you think provides a better UX.

docs/RFCS/version_migration.md, line 88 at r3 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

I futzed with the comments a bit for clarity.

@a-robinson: not sure what you mean by the last sentence in your comment. The idea here is that each node records the max ClusterVersion it has seen via gossip.

This LGTM, I got tripped up by the "for each store" wording in the previous version, but it makes sense since a process isn't guaranteed to always be started with the same set of stores.

docs/RFCS/version_migration.md, line 150 at r3 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Are we currently contemplating building a reverse migration from the separate raft log back to the combined RocksDB keyspace?

There was discussion of it on that PR. If you already have the code for copying the raft log from one rocksdb engine to another, doing the reverse might be as simple as just swapping the src and dst arguments.

docs/RFCS/version_migration.md, line 66 at r4 (raw file):

minimum-supported version baked-in, which is used to sanity check that
a too-new server is not run with a too-old data directory (i.e. the
`minimum_version` set in `ClusterVersion` cannot be less than the

Shouldn't this be the use_version rather than the minimum_version given your comment above (which I agree with)?

Comments from Reviewable

petermattis · 2017-07-17T22:18:49Z

General idea looks good to me, though you might want to place even more emphasis on how often this will be used. For example, @jordanlewis's work on adding a new KV command to speed up schema changes needs this. So does @tschottdorf's work to change Raft log truncation and to fix a bug with the generation of Raft hard state during splits.

Review status: all files reviewed at latest revision, 20 unresolved discussions, some commit checks failed.

docs/RFCS/version_migration.md, line 75 at r3 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Added a comment.

I'm wondering what this will look like in code. My suspicion is that we'll have a half-dozen to dozen features on each release which need to be gated on the cluster version. Does each feature get its own micro version? Let's say the Revert command gets micro version 1 and the Raft log truncation functionality gets micro version 2. Are they then specifying 1.0.1 and 1.0.2 as the required version for those features? That seems confusingly close to the patch releases we've done for the 1.0 release. I'd prefer something that is more distinct here and/or tied to our alpha version numbering. Perhaps rename micro to unstable and make it indicate a date? Apologies for the bikeshedding in advvance.

Comments from Reviewable

This uses the previously added bootstrap version which is populated when a cluster running >1.0 is bootstrapped to populate the settings table. At the same time, prevent `SET CLUSTER SETTING version = 'x'` from working until that migration has run. This solves one remaining headache for [version migrations][1] by giving it authoritative information on the currently running cluster version during upgrades. Fixes cockroachdb#17389. [1]: cockroachdb#16977

bdarnell · 2017-08-16T21:26:52Z

LGTM

Review status: 4 of 5 files reviewed at latest revision, 20 unresolved discussions, some commit checks failed.

docs/RFCS/20170815_version_migration.md, line 67 at r8 (raw file):

Now let's do the actual update. For that, the operator

1. performs a rolling restart into the `v1.1` binary,

For future upgrades (e.g. 1.1 to 1.2), before the rolling restart, the operator should ensure (with SHOW CLUSTER SETTINGS) that the version setting matches the version in use.

docs/RFCS/20170815_version_migration.md, line 68 at r8 (raw file):

1. performs a rolling restart into the `v1.1` binary,
1. checks that all nodes are *really* running `v1.1` (in particular, auto-scaling etc are disabled),

Auto-scaling doesn't need to be disabled as long as you can confirm that auto-scaling will no longer use v1.0.

docs/RFCS/20170815_version_migration.md, line 237 at r8 (raw file):

Tagging the unstable versions individually has the advantage that we can
properly migrate our test clusters simply through a rolling restart, and then a
version bump to `<major>.<minor>-<old_unstable+1>`.

We can skip unstable versions, right? So we can go to <major>.<minor>-<new_unstable> instead of just <old_unstable+1>.

docs/RFCS/20170815_version_migration.md, line 255 at r8 (raw file):

  version at an inopportune time.
- Create a (full or incremental) backup of the cluster.
- Rolling upgrade of nodes to next version.

We should explicitly mention that if any node is down at this point, you must either bring it back up and upgrade it before continuing, or ensure that it won't restart automatically and attempt to rejoin with its old version.

docs/RFCS/20170815_version_migration.md, line 262 at r8 (raw file):

- Verify basic cluster stability. If problems occur, a rolling downgrade is
  still an option.
- Depending on how much time has passed, another incremental update could be

s/update/backup/

docs/RFCS/20170815_version_migration.md, line 277 at r8 (raw file):

This section runs through the moving parts involved in the implementation. Note
that at the time of writing and contrary to how the RFC process should work, it

I don't think we need this caveat - this writing may have been post-implementation, but most of it was hashed out in the original RFC before implementation.

docs/RFCS/20170815_version_migration.md, line 585 at r8 (raw file):

  health up to user error.
- we can't roll back upgrades, which will make many users nervous
    - in particular, it discounts the OSS version of CockroachDB

If cockroach dump is not an acceptable backup option at moderate scale, we have a problem that we should fix. We shouldn't just act like the OSS version doesn't have a suitable backup option.

docs/RFCS/20170815_version_migration.md, line 595 at r8 (raw file):

The main concessions we make in this design are

1. no support for downgrades. This was discussed early in the life of this RFC

"No support" is too harsh. The binary upgrade can be rolled back, it's only the features controlled by the new setting that have an irreversible upgrade step. And the design has some support for downgrades via the UseVersion field. It's not complete (and it's not being implemented in this version), but it's something.

Comments from Reviewable

a-robinson · 2017-08-17T01:16:58Z

This came out fairly complex, which isn't great, but nice job dealing with it all.

Reviewed 5 of 5 files at r7, 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 28 unresolved discussions, some commit checks failed.

docs/RFCS/20170815_version_migration.md, line 203 at r7 (raw file):

Note that a server always accepts features that require the new version, even if
it hasn't received note that they are safe to use, when their use is prompted by
another node. For instance, if a new intra-node RPC is introduced, nodes should

s/intra-node/inter-node/ - intra-node would mean an RPC within a node (from itself to itself)

docs/RFCS/20170815_version_migration.md, line 341 at r7 (raw file):

### Server Configuration

The "binary version" is `ServerVersion` (type `roachpb.Version`) and is part of

For what it's worth, this part was fairly unclear on first (and second) read. What is the "binary version" and how does it fit into anything? What's it used for? Quoting a term that isn't used or referred to anywhere else doesn't help comprehension.

docs/RFCS/20170815_version_migration.md, line 347 at r7 (raw file):

Similarly a `Server` is configured with `MinimumSupportedVersion`. If a server
starts up with a store that has a persisted smaller than this or larger than its

a persisted what?

docs/RFCS/20170815_version_migration.md, line 359 at r7 (raw file):

Once a node has received a cluster-wide minimum version from the settings table
via Gossip, it is be used as the authoritative version to use (unless the binary

s/is be/is/

docs/RFCS/20170815_version_migration.md, line 382 at r7 (raw file):

boot into them from 1.0) or conflicting information. Roughly speaking,
bootstrapping stores can afford to wait for an authoritative version from gossip
and uses that, and whenever we ask a store about its version and it has none

s/uses/use/

docs/RFCS/20170815_version_migration.md, line 383 at r7 (raw file):

bootstrapping stores can afford to wait for an authoritative version from gossip
and uses that, and whenever we ask a store about its version and it has none
persisted, it counts as a store at `v1.0`. We make sure to write the version

What does it mean for a store to be "at" v1.0? Is this referring to its BinaryServerVersion or its knowledge of the cluster's configured MinimumVersion/UseVersion?

docs/RFCS/20170815_version_migration.md, line 540 at r7 (raw file):

table, use the "suspected" local version during cluster version setting updates.

- this is incorrect when, say, adding a v3.0 node to a v1.0 cluster and

But doesn't the v3.0 node have to connect through gossip to the cluster before it can do anything else? And ouldn't it learn via the initial sharing of gossip info that none of the nodes in the gossip network have a version?

edit: I see you discuss this below, but I'm leaving it here just as a comment that this section immediately brought up that question.

docs/RFCS/20170815_version_migration.md, line 255 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We should explicitly mention that if any node is down at this point, you must either bring it back up and upgrade it before continuing, or ensure that it won't restart automatically and attempt to rejoin with its old version.

Have we not yet implemented protection against this in the connection-opening phase?

docs/RFCS/20170815_version_migration.md, line 606 at r8 (raw file):

   There are ways in which this could be approximately inferred, but again it
   was deemed to complex given the time frame. Besides, operators may prefer to

s/to/too

Comments from Reviewable

tbg · 2017-08-17T18:55:34Z

TFTRs! Addressed comments.

This RFC is now entering its final comment period.

Review status: 2 of 5 files reviewed at latest revision, 28 unresolved discussions.

docs/RFCS/20170815_version_migration.md, line 203 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/intra-node/inter-node/ - intra-node would mean an RPC within a node (from itself to itself)

Done.

docs/RFCS/20170815_version_migration.md, line 341 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

For what it's worth, this part was fairly unclear on first (and second) read. What is the "binary version" and how does it fit into anything? What's it used for? Quoting a term that isn't used or referred to anywhere else doesn't help comprehension.

How's this?

docs/RFCS/20170815_version_migration.md, line 347 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

a persisted what?

Done.

docs/RFCS/20170815_version_migration.md, line 359 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/is be/is/

Done.

docs/RFCS/20170815_version_migration.md, line 382 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/uses/use/

Done.

docs/RFCS/20170815_version_migration.md, line 383 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

What does it mean for a store to be "at" v1.0? Is this referring to its BinaryServerVersion or its knowledge of the cluster's configured MinimumVersion/UseVersion?

Clarified (maybe?)

docs/RFCS/20170815_version_migration.md, line 540 at r7 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

But doesn't the v3.0 node have to connect through gossip to the cluster before it can do anything else? And ouldn't it learn via the initial sharing of gossip info that none of the nodes in the gossip network have a version?

edit: I see you discuss this below, but I'm leaving it here just as a comment that this section immediately brought up that question.

I tried to clarify, can you double check that I'm providing the info you wish you had on your first read?

docs/RFCS/20170815_version_migration.md, line 67 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

For future upgrades (e.g. 1.1 to 1.2), before the rolling restart, the operator should ensure (with SHOW CLUSTER SETTINGS) that the version setting matches the version in use.

Good point, done (made the example about 1.1 -> 1.2).

docs/RFCS/20170815_version_migration.md, line 68 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Auto-scaling doesn't need to be disabled as long as you can confirm that auto-scaling will no longer use v1.0.

Done.

docs/RFCS/20170815_version_migration.md, line 237 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We can skip unstable versions, right? So we can go to <major>.<minor>-<new_unstable> instead of just <old_unstable+1>.

Yep, done.

docs/RFCS/20170815_version_migration.md, line 255 at r8 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Have we not yet implemented protection against this in the connection-opening phase?

I wish we did, but the tests I have written indicate that we don't yet. This is something that makes me antsy as there are now various new paths that can Fatal, and I don't want a {newer, older} node fataling an {older, newer} cluster. Something I plan to look into before calling this thing done.

docs/RFCS/20170815_version_migration.md, line 262 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/update/backup/

Done.

docs/RFCS/20170815_version_migration.md, line 277 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think we need this caveat - this writing may have been post-implementation, but most of it was hashed out in the original RFC before implementation.

Done. I think it is debatable how much of the actual design was hashed out in the original RFC, but you're right that it doesn't matter here.

docs/RFCS/20170815_version_migration.md, line 585 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If cockroach dump is not an acceptable backup option at moderate scale, we have a problem that we should fix. We shouldn't just act like the OSS version doesn't have a suitable backup option.

I agree, but anything actionable here? I haven't seriously used cockroach dump but the last time I did, it was a pain because it didn't resolve references etc in the right order (so the backups were basically broken and I needed to do copy-paste dep resolution). The SQLers are aware of this, but... yeah, that's not ideal!

docs/RFCS/20170815_version_migration.md, line 595 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"No support" is too harsh. The binary upgrade can be rolled back, it's only the features controlled by the new setting that have an irreversible upgrade step. And the design has some support for downgrades via the UseVersion field. It's not complete (and it's not being implemented in this version), but it's something.

Good point, done.

docs/RFCS/20170815_version_migration.md, line 606 at r8 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/to/too

Done.

Comments from Reviewable

a-robinson · 2017-08-17T19:27:51Z

Reviewed 3 of 3 files at r9.
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful.

docs/RFCS/20170815_version_migration.md, line 341 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

How's this?

Better, thanks 👍

docs/RFCS/20170815_version_migration.md, line 383 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Clarified (maybe?)

Yup, thanks

docs/RFCS/20170815_version_migration.md, line 540 at r7 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I tried to clarify, can you double check that I'm providing the info you wish you had on your first read?

It's fine, yeah

docs/RFCS/20170815_version_migration.md, line 359 at r9 (raw file):

Similarly a `Server` is configured with `MinimumSupportedVersion` (which, again,
is hard-coded in release builds to equal `cluster.BinaryServerVersion`, and

It's not actually set to equal cluster.BinaryServerVersion, is it? That would make it impossible to ever start a server with a store from the previous release. In the code it currently gets set to cluster.VersionBase

Comments from Reviewable

tbg · 2017-08-17T19:34:30Z

Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful.

docs/RFCS/20170815_version_migration.md, line 359 at r9 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It's not actually set to equal cluster.BinaryServerVersion, is it? That would make it impossible to ever start a server with a store from the previous release. In the code it currently gets set to cluster.VersionBase

You're right, this is wrong - it typically trails cluster.BinaryServerVersion by a minor version (which is implied by the rest of the sentence). Fixed.

Comments from Reviewable

@m-schneider

This finishes up the bulk of the remaining work for the [version migation RFC][rfc]. There are two main changes: 1. Add `*ExposedClusterVersion` to the `cluster.Settings` struct. Internally, it wraps the `version` cluster setting, but it exposes it indirectly to synchronize version updates with the node's engines. More precisely, - `*ExposedClusterVersion` is initialized with the persisted version early in the `Node` startup sequence - the `Node` registers itself for notifications whenever the `version` changes - `*ExposedClusterVersion` only exposes a change in the underlying `version` variable to callers *after* calling the `Node` callback (which in turn persists the update to all engines). - The result is that whenever `IsActive()` or `Version()` is called, the returned version is safe to use or to migrate to without having to worry about backwards compatibility. 1. Add code that synthesizes from and writes the version to the engines. This mostly picks up #17155 with some code golf that refactors the node startup sequence appropriately, making sure that even when there are stores that need (asynchronous) bootstrapping, they'll still have the cluster version persisted at the end. TODO: - [ ] Testing. - [ ] Update the RFC. - [ ] Pave the way for future use of `UseVersion < MinVersion`, if still required. - [ ] Migration that makes sure there's always a cluster version settings able entry (@m-schneider). - [ ] Disallow version bump if settings table has no value entry. - [ ] Consider removing the reliance on gossip callback ordering. Gossip does not *guarantee* that version updates arrive in-order, and we currently persist the cluster version blindly. `*ExposedClusterVersion` would have to do its own synthesis to refuse illegal changes backwards. This may be a non-issue but needs some consideration (all of the cluster settings would be vulnerable to it). [rfc]: cockroachdb/cockroach#16977

spencerkimball requested a review from bdarnell July 11, 2017 01:50

irfansharif mentioned this pull request Jul 11, 2017

[wip,dnm] storage: introduced dedicated raft storage #16809

Closed

knz reviewed Jul 11, 2017

View reviewed changes

spencerkimball force-pushed the spencerkimball/version-migrations-rfc branch from 0dd1bb6 to fa759d3 Compare July 12, 2017 22:57

spencerkimball force-pushed the spencerkimball/version-migrations-rfc branch from fa759d3 to 5e7b4b5 Compare July 14, 2017 20:34

tbg mentioned this pull request Jul 17, 2017

storage: don't send log truncations through Raft #16993

Merged

tbg mentioned this pull request Jul 17, 2017

util: add migration stub #17068

Merged

spencerkimball force-pushed the spencerkimball/version-migrations-rfc branch from 5e7b4b5 to 62da85b Compare July 17, 2017 20:58

spencerkimball mentioned this pull request Jul 17, 2017

rfc: Revert Command #16294

Closed

tbg mentioned this pull request Aug 16, 2017

migrations: populate initial version setting #17694

Merged

tbg force-pushed the spencerkimball/version-migrations-rfc branch from be8f9cd to 05e3f9d Compare August 16, 2017 20:51

spencerkimball and others added 8 commits August 17, 2017 14:55

rfc: version migration for backwards incompatible functionality

f636d6c

Update version migration doc filename

f145e2f

Jam the doc into the new RFC format

b601f9a

*: misc version migration FIXMEs

bc72782

RFCS: rework the version migration RFC

fd2e406

Address version table entry bootstrapping

cac1f51

Address store bootstrap on join

6d57c7f

Link to yet another PR

f3bf635

tbg force-pushed the spencerkimball/version-migrations-rfc branch from 7e385fa to c3e5efe Compare August 17, 2017 18:55

Address feedback

fac527a

tbg force-pushed the spencerkimball/version-migrations-rfc branch from c3e5efe to fac527a Compare August 17, 2017 19:34

tbg merged commit abc32a9 into master Aug 22, 2017

tbg deleted the spencerkimball/version-migrations-rfc branch August 22, 2017 14:32

jseldess mentioned this pull request Sep 18, 2017

Improve cockroach quit and rolling upgrade docs cockroachdb/docs#1899

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rfc: version migration for backwards incompatible functionality #16977

rfc: version migration for backwards incompatible functionality #16977

spencerkimball commented Jul 11, 2017 •

edited by tbg

Loading

cockroach-teamcity commented Jul 11, 2017

spencerkimball commented Jul 11, 2017

a-robinson commented Jul 11, 2017

knz left a comment

knz Jul 11, 2017

knz Jul 11, 2017

tbg commented Jul 11, 2017

bdarnell commented Jul 11, 2017

spencerkimball commented Jul 12, 2017

spencerkimball commented Jul 12, 2017

bdarnell commented Jul 14, 2017

spencerkimball commented Jul 14, 2017

bdarnell commented Jul 15, 2017

tbg commented Jul 16, 2017 via email •

edited

Loading

a-robinson commented Jul 17, 2017

bdarnell commented Jul 17, 2017

a-robinson commented Jul 17, 2017

irfansharif commented Jul 17, 2017

spencerkimball commented Jul 17, 2017

a-robinson commented Jul 17, 2017

petermattis commented Jul 17, 2017

bdarnell commented Aug 16, 2017

a-robinson commented Aug 17, 2017

tbg commented Aug 17, 2017

a-robinson commented Aug 17, 2017

tbg commented Aug 17, 2017

rfc: version migration for backwards incompatible functionality #16977

rfc: version migration for backwards incompatible functionality #16977

Conversation

spencerkimball commented Jul 11, 2017 • edited by tbg Loading

cockroach-teamcity commented Jul 11, 2017

spencerkimball commented Jul 11, 2017

a-robinson commented Jul 11, 2017

knz left a comment

Choose a reason for hiding this comment

knz Jul 11, 2017

Choose a reason for hiding this comment

knz Jul 11, 2017

Choose a reason for hiding this comment

tbg commented Jul 11, 2017

bdarnell commented Jul 11, 2017

spencerkimball commented Jul 12, 2017

spencerkimball commented Jul 12, 2017

bdarnell commented Jul 14, 2017

spencerkimball commented Jul 14, 2017

bdarnell commented Jul 15, 2017

tbg commented Jul 16, 2017 via email • edited Loading

a-robinson commented Jul 17, 2017

bdarnell commented Jul 17, 2017

a-robinson commented Jul 17, 2017

irfansharif commented Jul 17, 2017

spencerkimball commented Jul 17, 2017

a-robinson commented Jul 17, 2017

petermattis commented Jul 17, 2017

bdarnell commented Aug 16, 2017

a-robinson commented Aug 17, 2017

tbg commented Aug 17, 2017

a-robinson commented Aug 17, 2017

tbg commented Aug 17, 2017

spencerkimball commented Jul 11, 2017 •

edited by tbg

Loading

tbg commented Jul 16, 2017 via email •

edited

Loading