Commit
kvserver: don't allow raft forwarding of lease requests
This patch aims to improve the behavior in scenarios where a follower replica is behind, unaware of the latest lease, and tries to acquire a lease in its ignorant state. That lease acquisition request is bound to fail (because the lease that it's based on is stale), but while it fails (or, rather, until the behind replica finds out that it failed) local requests are blocked. This blocking can last for a very long time in situations where a snapshot is needed to catch up the follower, and the snapshot is queued up behind many other snapshots (e.g. after a node has been down for a while and gets restarted).

This patch tries an opinionated solution: never allow followers to acquire leases. If there is a leader, it's a better idea for the leader to acquire the lease. The leader might have a lease anyway or, even if it doesn't, having the leader acquire it saves a leadership transfer (leadership follows the lease).

We change the proposal path to recognize lease requests and reject them early if the current replica is a follower and the leader is known. The rejection points to the leader, which causes the request that triggered the lease acquisition to make its way to the leader and attempt to acquire a lease over there.

Fixes #37906

As described in #37906, the badness caused by requests blocking behind a doomed lease acquisition request could be reproduced with a 100-warehouse tpcc workload (--no-wait) on a 3-node cluster. Turning off a node for 10 minutes and then turning it back on would result in the cluster being wedged for a few minutes until all the snapshots are transferred. I've verified that this patch fixes it.

Release note (bug fix): A bug causing queries sent to a freshly-restarted node to sometimes hang for a long time while the node catches up with replication has been fixed.
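
The rejection rule in the proposal path described above can be illustrated with a small, self-contained Go sketch. This is not the code from the patch: the names `ReplicaID`, `raftState`, `NotLeaderError`, and `checkLeaseRequest` are hypothetical, and the sketch assumes a leader ID of 0 means "leader unknown". It only shows the decision itself: a lease request is rejected early when the proposing replica is a follower and knows who the leader is, and the error carries the leader so the caller can redirect.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicaID identifies a replica. In this sketch (an assumption, not the
// patch's convention), 0 means "unknown".
type ReplicaID int

// NotLeaderError is a hypothetical error that points the caller at the
// replica currently believed to be the Raft leader.
type NotLeaderError struct {
	Leader ReplicaID
}

func (e *NotLeaderError) Error() string {
	return fmt.Sprintf("lease request rejected: not the leader; try replica %d", e.Leader)
}

// raftState is a hypothetical snapshot of what a replica knows about Raft
// leadership at proposal time.
type raftState struct {
	self   ReplicaID
	leader ReplicaID // 0 if the leader is not known
}

// checkLeaseRequest returns an error if the proposal is a lease request,
// the local replica is a follower, and the leader is known. All other
// proposals are allowed through.
func checkLeaseRequest(isLeaseRequest bool, st raftState) error {
	if !isLeaseRequest {
		return nil // ordinary proposals are unaffected
	}
	if st.leader == 0 || st.leader == st.self {
		return nil // leader unknown, or we are the leader: allow the attempt
	}
	return &NotLeaderError{Leader: st.leader}
}

func main() {
	// Replica 2 is a follower and knows that replica 1 is the leader.
	err := checkLeaseRequest(true, raftState{self: 2, leader: 1})

	var nle *NotLeaderError
	if errors.As(err, &nle) {
		// The caller would retry the triggering request on the leader.
		fmt.Printf("redirect to replica %d\n", nle.Leader)
	}
}
```

In this sketch a follower that does not know the leader is still allowed to try, which matches the "leader is known" condition in the description above.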