kvserver: wait for replication before sending snapshots
This patch handles an edge case in config changes:

1. Replica.ChangeReplicas() is called on r1 to add a replica to the range.
2. ChangeReplicas runs a transaction that modifies the range descriptor.
3. Say that the replica doing this loses the lease, or never had it to begin with (although most of the time ChangeReplicas is called on a leaseholder).
4. ChangeReplicas attempts to send a snapshot to the newly added replica (which has been added as a learner, but that doesn't matter here).
5. If r1 has not applied the descriptor change yet, the snapshot it produces is invalid, because the descriptor inside the snapshot doesn't contain the recipient. We'll then attempt to remove the new replica(s) and fail the whole ChangeReplicas.

This patch makes snapshot generation wait until the sender replica has caught up with the descriptor that it has previously written. At that point, it can send the snapshot just fine even if it isn't the leaseholder.

I think this is generally a sane thing to do, but what prompted it is a test that was directly calling r.ChangeReplicas() on a replica that's not necessarily the leaseholder. This test becomes flaky with #55148 because the replica can no longer always just take the lease as it did before that change: with #55148, it cannot take the lease if it's not the Raft leader, so it ends up executing ChangeReplicas() without holding a lease. That was triggering the scenario described above.

I'm separately improving that one test to direct its request at the leaseholder explicitly, but I'm afraid there might be others.

Release note: None