server: simplify bootstrap via fatter init server
## Motivation

Today, starting up a server is complicated. This is especially true when bootstrap is necessary. By this, we mean that either

- no NodeID is known. This implies that none of the engines were initialized yet
- the NodeID is known (i.e. at least one store is initialized), but a new engine was added.

When the process first starts, and a NodeID is known, a ClusterID is also known (since they get persisted together, on every store). For the same reason, a persisted cluster version is known. In this case, we can start a fully initialized cluster, so this is the easy case.

It is more difficult when no NodeID is known, in which case the server is just starting up for the first time, with its engines all blank. It needs to somehow allocate NodeIDs (and StoreIDs) which are then written to the engines. It also needs to, at least initially, use the lowest possible cluster version it can tolerate (after all, the cluster may actually run at that lowest version).

Right now, there is a delicate dance: we thread late-bound ClusterID and NodeID containers all over the place, spin up a mostly dysfunctional Node, use its KV client to allocate NodeID and StoreIDs once this is possible (we need Gossip to have connected first), and update the containers. It is complex, error prone, and ossifies any code it touches.

Cluster versions deserve an extra shout-out for complexity. Even if a cluster version was persisted, the node may have been decommissioned and the cluster upgraded. Much work went into our RPC layer to prevent connections between incompatible nodes, but there is no boot-time check that results in a swift and descriptive error; instead, there will be a fatal error originating from the RPC subsystem. One aim of our work is to simplify this by checking the version via an RPC before setting too many gears in motion.

## Context

This marks the beginning of a series of PRs aiming at improving the startup code.
Ultimately, the goal is to have the Node and all Stores bootstrapped as early as possible, and in particular before starting the KV or SQL server subsystems. Furthermore, we want to achieve this without relying on Gossip, to prepare for a split between the SQL and KV servers in the context of multitenancy support (the SQL server won't be able to rely on Gossip, but will still need to announce itself to the KV servers).

## This PR

This PR is an initial simplifying refactor that can help achieve these goals. The init server (which hosts the Bootstrap RPC) is given more responsibilities: it is now directly in charge of determining which, if any, engines are bootstrapped, and explicitly listens to Gossip as well as the Bootstrap RPC. It returns the cluster ID to the main server start-up goroutine when it is available. As a result, the main startup code has been simplified, and a thread to be pulled on further has appeared and is called out in TODOs.

Down the road (i.e. in later PRs), the init server will bootstrap a NodeID and StoreIDs even when joining an existing cluster. It will initially mimic/front-load the strategy taken by Node today, i.e. use a `kv.DB`, but ultimately will bypass Gossip completely and use a simple RPC call to ask the existing cluster to assign these IDs as needed. This RPC will also establish the active cluster version, which is required for SQL multi-tenancy, and generally follows the ideas in cockroachdb#32574.

Release note: None