pkg/migration: support waiting for minimum engine version #76085
Conversation
Force-pushed from e374d61 to 66db578.
If I understand correctly, the reason we need additional infrastructure here is that we cannot satisfy this requirement of cluster version feature gating (pkg/clusterversion/cockroach_versions.go, lines 50 to 84 at fc1957a):
// At the same time, with requests/RPCs originating at other crdb nodes, the
// initiator of the request gets to decide what's supported. A node should
// not refuse functionality on the grounds that its view of the version gate
// is as yet inactive. Consider the sender:
//
//    func invokeSomeRPC(req) {
//        if (specific-version is active) {
//            // Like mentioned above, this implies that all nodes in the
//            // cluster are running binaries that can handle this new
//            // feature. We may have learned about this fact before the
//            // node on the other end. This is due to the fact that migration
//            // manager informs each node about the specific-version being
//            // activated active concurrently. See BumpClusterVersion for
//            // where that happens. Still, it's safe for us to enable the new
//            // feature flags as we trust the recipient to know how to deal
//            // with it.
//            req.NewFeatureFlag = true
//        }
//        send(req)
//    }
//
// And consider the recipient:
//
//    func someRPC(req) {
//        if !req.NewFeatureFlag {
//            // Legacy behavior...
//        }
//        // There's no need to even check if the specific-version is active.
//        // If the flag is enabled, the specific-version must have been
//        // activated, even if we haven't yet heard about it (we will pretty
//        // soon).
//    }
//
// See clusterversion.Handle.IsActive and usage of some existing versions
// below for more clues on the matter.
When it comes to low-level engine features, the receiver cannot respect request-level feature requests until its own engine is upgraded.
My understanding is still a little muddled though. In pkg/clusterversion there's:
// Handle is the interface through which callers access the active cluster
// version and this binary's version details.
type Handle interface {
// ActiveVersion returns the cluster's current active version: the minimum
// cluster version the caller may assume is in effect.
//
// ActiveVersion fatals if the cluster version setting has not been
// initialized (through `Initialize()`).
ActiveVersion(context.Context) ClusterVersion
// ...
}
I'm not sure how exactly to interpret "the minimum cluster version the caller may assume is in effect." The way I thought all this worked was: the cluster starts at version X, every node applies the migrations for version X+1, and then every node calls `WriteClusterVersion` / `(storage.Engine).SetMinVersion` to persist the new active cluster version of X+1. If that's the case, we could put the ratcheting to `pebble.FormatBlockPropertyCollector` in a migration that precedes `clusterversion.PebbleFormatVersionBlockProperties`. That would ensure that a `PebbleFormatVersionBlockProperties` cluster version implies all stores are running the `pebble.FormatBlockPropertyCollector` format major version.
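For concreteness, here's a minimal sketch of what that ratcheting step would amount to at the Pebble layer, assuming direct access to the underlying `*pebble.DB` (in CockroachDB proper this would go through the `storage.Engine` wrapper) and the Pebble API as of this PR:

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	// Stand-in for an already-open store; CockroachDB would reuse the
	// node's existing storage.Engine rather than opening a fresh one.
	db, err := pebble.Open("demo-store", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The migration preceding PebbleFormatVersionBlockProperties would run
	// this on every store: ratchet the on-disk format major version up to
	// the one that understands block properties, stepping through any
	// intermediate format major versions along the way.
	if err := db.RatchetFormatMajorVersion(pebble.FormatBlockPropertyCollector); err != nil {
		log.Fatal(err)
	}
	log.Printf("store is now at format major version %d", db.FormatMajorVersion())
}
```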
When it comes to low-level engine features, the receiver cannot respect request-level feature requests until its own engine is upgraded.
Yeah, this was the motivation. It was the following line that had me worried:
we will pretty soon
The word "soon". If the receiver was seeing an SSTable with a new table format, I'm not sure it's enough to just accept it without having ratcheted the store, even if the binary has the version to support it?
There was also this comment:
// If this returns true then all nodes in the cluster will eventually see
// this version. However, this is not atomic because version gates (for a
// given version) are pushed through to each node concurrently. Because of
// this, nodes should not be gating proper handling of remotely initiated
// requests that their binary knows how to handle on this state. The
// following example shows why this is important:
The word "eventually", which seems particularly relevant to the backup and ingestion case. My thinking was that `IsActive` returning `true` at the desired version is necessary but not sufficient: while a node may be running a new binary that supports the newer table format, internally Pebble (both on disk and in the fields on the `*DB` in memory) is still running through the code paths for the old version, which could choke on the newer features.
If that's the case, we could put the ratcheting to pebble.FormatBlockPropertyCollector in a migration that precedes clusterversion.PebbleFormatVersionBlockProperties. That would ensure that a PebbleFormatVersionBlockProperties cluster version implies all stores are running the pebble.FormatBlockPropertyCollector format major version.
Same thing here. Isn't this only saying that the binary version is up-to-date? Isn't there still an element of asynchronicity in the way in which the store is ratcheted?
In talking with @sumeerbhola and @itsbilal, it sounded as if we needed some form of "barrier", after which point we know for certain that all stores have actually been ratcheted up. The two-phase version approach was floated, but the "eventually" / "soon" thing had me questioning it.
Force-pushed from 66db578 to bdbd428.
Isn't this only saying that the binary version is up-to-date?
That's not my understanding. My understanding is that once `WriteClusterVersion(x)` is invoked, you're guaranteed that all nodes have applied all migrations preceding x.
Isn't there still an element of asynchronicity in the way in which the store is ratcheted?
The call to `WriteClusterVersion(x)` is performed asynchronously, but if we performed the ratcheting in a migration that precedes x, it doesn't matter. We can always use the persisted cluster version as the indicator of what table format to use. (Note that the inverse is not true: this scheme means that we cannot read the local engine's format major version and then decide that it's okay to send that format major version's table format to other nodes. I think it's slightly cleaner to use the cluster version rather than the local engine's format major version anyway, because the cluster version is the mechanism intended to control versioning of inter-node communication more generally.)
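To illustrate what "use the persisted cluster version as the indicator" could look like when picking a table format for outgoing SSTables, here's a hedged sketch; `clusterSupportsBlockProperties` is a hypothetical stand-in for an `IsActive(PebbleFormatVersionBlockProperties)` check, not an existing call site:

```go
package enginefmt

import "github.com/cockroachdb/pebble/sstable"

// tableFormatForCluster chooses the SSTable format to write based on the
// active (persisted) cluster version, not the local engine's format major
// version. clusterSupportsBlockProperties stands in for a check like
// settings.Version.IsActive(ctx, PebbleFormatVersionBlockProperties).
func tableFormatForCluster(clusterSupportsBlockProperties bool) sstable.TableFormat {
	if clusterSupportsBlockProperties {
		// Safe: the migration preceding the version gate already ratcheted
		// every store to pebble.FormatBlockPropertyCollector.
		return sstable.TableFormatPebblev1
	}
	// Otherwise stick to the older format that every node can ingest.
	return sstable.TableFormatRocksDBv2
}
```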
Tagging @tbg and @irfansharif for thoughts, because they helped me understand the existing infrastructure previously.
I think I found what I was originally looking for (but I missed the edit in the node additions section of the RFC, on adding the fenced versions), in the comments in `(*migrationmanager.Manager).Migrate()`, here. We loop over all nodes in the cluster, first applying the fenced version, and then the "real" version. The "two-phase" approach that @sumeerbhola and I had originally talked about should work, and we don't need this extra RPC dance to wait for the engine versions.
I'm also wondering if we can get away with using the internal "fenced" version as our first version (Pebble would bump its store version when it reaches the fenced version), and the second ("real") version would actually enable the feature (at which point we know all stores have been ratcheted).
I think we can close this PR in favor of the original approach, though I'll keep it open until I get confirmation from @tbg and / or @irfansharif on the approach.
Instead of using the "fenced" versions, which I think should really only be used for pkg/migration's internal book-keeping, could we instead introduce two cluster versions? It's entirely possible/sound to use multiple cluster versions to chain a sequence of things in order, given the migrations infra guarantees that all nodes will see vX before even attempting to move up to vX+1. The migration infra also lets you attach arbitrary checks as part of vX or vX+1, so you could reasonably ensure that everything is up to snuff. One recent example of using multiple versions can be found here: cockroach/pkg/kv/kvserver/store.go, lines 2118 to 2139 at e7540a1.
I'm not fully following the specifics of the PR, but happy to jump on a call and brainstorm.
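Roughly, the shape of that chaining — a sketch with hypothetical version names and a stand-in `versionGate` interface, not the real `clusterversion` / `pkg/migration` API:

```go
package twophase

import "context"

// versionGate is a stand-in for clusterversion.Handle.IsActive; the real
// thing is keyed by clusterversion.Key rather than a string.
type versionGate interface {
	IsActive(ctx context.Context, version string) bool
}

// Two hypothetical chained cluster versions. The migration attached to the
// first ratchets every store's format major version; the second carries no
// migration and only gates use of the new SSTable features.
const (
	ratchetEnginesVersion   = "PebbleFormatBlockPropertyCollector" // runs the ratcheting migration
	enableBlockPropsVersion = "EnableBlockPropertyCollection"      // gates writing the new format
)

// canWriteBlockProperties can key off the second version alone: the
// migration infrastructure guarantees every node has moved through
// ratchetEnginesVersion (and hence ratcheted its stores) before
// enableBlockPropsVersion can become active anywhere.
func canWriteBlockProperties(ctx context.Context, gate versionGate) bool {
	return gate.IsActive(ctx, enableBlockPropsVersion)
}
```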
Yep, we can, and shall do.
Great. This is the guarantee we need.
Thank you for this! Super useful. Closing. TFTRs!
As part of the MVCC Bulk Operations project, Pebble range keys will
need to be enabled on all engines of a cluster before nodes can start
using the feature to read and write SSTables that contain the range key
features (a backward-incompatible change).
Adding a cluster version is necessary, but not sufficient, to guarantee
that nodes are ready to generate, and more importantly receive,
SSTables with range key support. Specifically, there exists a race
condition where nodes are told to update their engines as part of the
existing Pebble major format update process, but there is no
coordination between the nodes. One node (a sender) may complete its
engine upgrade and enable the new SSTable features before another node
(the receiver). The latter will panic on receipt of an SSTable with the
newer features written by the former.
Add a server RPC endpoint that provides a means of waiting on a node to
update its store to a version that is compatible with a cluster version.
This endpoint is used as part of a system migration to ensure that all
nodes in a cluster are running with an engine version that is at least
compatible with a given cluster version.
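A sketch of how that migration might consume the endpoint; `engineVersionWaiter` and the RPC it wraps are the shape this PR proposes, not an existing API:

```go
package migrationsketch

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// engineVersionWaiter wraps the proposed per-node RPC: block until the
// node's stores are at a format major version compatible with v.
type engineVersionWaiter interface {
	WaitForCompatibleEngineVersion(ctx context.Context, v roachpb.Version) error
}

// waitForAllEngines fans the wait out to every node in the cluster; once it
// returns nil, every store is known to be at a compatible engine version.
func waitForAllEngines(ctx context.Context, nodes []engineVersionWaiter, v roachpb.Version) error {
	for _, w := range nodes {
		if err := w.WaitForCompatibleEngineVersion(ctx, v); err != nil {
			return err
		}
	}
	return nil
}
```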
Expose the table format major version on `storage.Engine`. This will be
used elsewhere in Cockroach (for example, SSTable generation for ingest
and backup).
Add a `WaitForCompatibleEngineVersion` function on the `storage.Engine`
interface that provides a mechanism to block until an engine is running
at a format major version that is compatible with a given cluster version.
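The interface additions described above would look roughly like this; method names and signatures are an approximation of what this PR sketches, not the current `storage.Engine`:

```go
package storagesketch

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/pebble/sstable"
)

// Engine shows only the additions proposed here; the rest of the existing
// storage.Engine interface is elided.
type Engine interface {
	// TableFormat returns the maximum SSTable format supported by the
	// engine's current format major version.
	TableFormat() sstable.TableFormat

	// WaitForCompatibleEngineVersion blocks until the engine is running at a
	// format major version compatible with the given cluster version.
	WaitForCompatibleEngineVersion(ctx context.Context, version roachpb.Version) error
}
```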
Expose the engine format major version as a `storage.TestingKnob` to
allow tests to alter the Pebble format major version.
Add a new cluster version to coordinate the upgrade of all engines in a
cluster to `pebble.FormatBlockPropertyCollector` (Pebble,v1), and the
system migration required for coordinating the engine upgrade to the
latest Pebble table format version.
This patch also fixes an existing issue where a node may write SSTables
with block properties as part of a backup that are then ingested by an
older node. This patch provides the infrastructure necessary for making
these "cluster-external" SSTable operations engine-aware. Nodes should
only use a table format version that other nodes in the cluster
understand.
Informs cockroachdb/pebble#1339.
Release note: None