Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: #39485 #39611 This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`. Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically - upgrades all learners to voters - removes any undesired replicas. Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies. Three pieces are missing after this PR: First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key). Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325 Third, we must to teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place. Release note: None 39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg At init() time, the txn proto has not been populated yet. Found while investigating #39652. This change strikes me as clunky, but I don't have the bandwidth to dig deeper right now. Release note: None 39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson Under stress these error with "go build a: failed to cache compiled Go files". Fixes #39616 Fixes #39541 Fixes #39479 Release note: None 39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner Before this commit we'd experience the following annoying log message from gRPC every time we create a new connection telling us that our setting is being ignored. ``` Adjusting keepalive ping interval to minimum period of 10s ``` Release note: None Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: Matt Jibson <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
- Loading branch information