kvserver: remove special-casing for lease index 0 in refreshProposalsLocked #74711
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
tbg added the C-enhancement label on Jan 11, 2022
tbg added a commit to tbg/cockroach that referenced this issue on Jan 11, 2022
Fixes cockroachdb#33007. Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that is sensitive to the circuit breaker tripping (relying on context cancellation makes sure we don't end up with a second channel that needs to be plumbed to everyone and their dog; Go makes it really difficult to join two cancellation chains); if the breaker is tripped when the request arrives, it is refused right away. In either case, the outgoing error is augmented to carry information about the tripped breaker. This isn't 100% trivial, since retreating from in-flight replication typically causes an `AmbiguousResultError`, and while we could avoid it in some cases we can't in all. A similar problem occurs when the canceled request was waiting on a lease, in which case the result is a NotLeaseholderError.

For any request that made it in, if it enters replication but does not manage to succeed within `base.SlowRequestThreshold` (15s at time of writing), the breaker is tripped, cancelling all inflight and future requests until the breaker heals.

Perhaps surprisingly, the existing "slowness" detection (the informational "have been waiting ... for proposal" warning in `executeWriteBatch`) was moved deeper into replication (`refreshProposalsLocked`), where it now trips the breaker. This has the advantage of providing a unified path for lease requests (which don't go through `executeWriteBatch`) and pipelined writes (which return before waiting on the inflight replication process). To make this work, we need to pick a little fight with how leases are (not) refreshed (cockroachdb#74711) and we need to remember the ticks at which a proposal was first inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a `ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is roughly the case whenever replicating a command through Raft is possible for the Replica, either by appending to the log as the Raft leader, or by forwarding to the Raft leader. A tiny bit of special casing allows requests made by the probe to bypass the breaker.

As of this PR, all of this is off by default, gated behind the `kv.replica_circuit_breakers.enabled` cluster setting, while we determine which of the many follow-up items to target for 22.1 (see cockroachdb#74705). Primarily though, the breaker-sensitive context cancelation is a toy implementation that comes with too much of a performance overhead (one extra goroutine per request); cockroachdb#74707 will address that. Other things not done here that we certainly want in the 22.1 release are:

- UI work (cockroachdb#74713)
- Metrics (cockroachdb#74505)

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
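As a mental model for the "cancelable context that is sensitive to the circuit breaker tripping" described above, the "toy" approach the commit message itself calls out (one extra goroutine per request) can be sketched roughly as below. This is an illustrative Go sketch; `breakerSensitiveContext` and `trippedC` are hypothetical names, not CockroachDB's actual API.

```go
package breakerctx

import "context"

// breakerSensitiveContext derives a per-request context that is canceled
// either when the parent is canceled or when trippedC is closed (i.e. the
// replica's circuit breaker trips). Joining the two cancellation chains
// costs one goroutine per request, which is the overhead the commit
// message refers to.
func breakerSensitiveContext(
	parent context.Context, trippedC <-chan struct{},
) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	go func() {
		select {
		case <-trippedC:
			cancel() // breaker tripped: fail-fast this request
		case <-ctx.Done():
			// request finished or parent canceled; nothing to do
		}
	}()
	return ctx, cancel
}
```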
tbg added a commit to tbg/cockroach that referenced this issue on Jan 11, 2022
Fixes cockroachdb#33007. Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that is sensitive to the circuit breaker tripping (relying on context cancellation makes sure we don't end up with a second channel that needs to be plumbed to everyone and their dog; Go makes it really difficult to join two cancellation chains); if the breaker is tripped when the request arrives, it is refused right away. In either case, the outgoing error is augmented to carry information about the tripped breaker. This isn't 100% trivial, since retreating from in-flight replication typically causes an `AmbiguousResultError`, and while we could avoid it in some cases we can't in all. A similar problem occurs when the canceled request was waiting on a lease, in which case the result is a NotLeaseholderError.

For any request that made it in, if it enters replication but does not manage to succeed within `base.SlowRequestThreshold` (15s at time of writing), the breaker is tripped, cancelling all inflight and future requests until the breaker heals.

Perhaps surprisingly, the existing "slowness" detection (the informational "have been waiting ... for proposal" warning in `executeWriteBatch`) was moved deeper into replication (`refreshProposalsLocked`), where it now trips the breaker. This has the advantage of providing a unified path for lease requests (which don't go through `executeWriteBatch`) and pipelined writes (which return before waiting on the inflight replication process). To make this work, we need to pick a little fight with how leases are (not) refreshed (cockroachdb#74711) and we need to remember the ticks at which a proposal was first inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a `ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is roughly the case whenever replicating a command through Raft is possible for the Replica, either by appending to the log as the Raft leader, or by forwarding to the Raft leader. A tiny bit of special casing allows requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the `kv.replica_circuit_breakers.enabled` cluster setting while we determine which of the many follow-up items to target for 22.1 (see cockroachdb#74705). For example, the breaker-sensitive context cancelation is a toy implementation that comes with too much of a performance overhead (one extra goroutine per request); cockroachdb#74707 will address that. Other things not done here that we certainly want in the 22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue on Jan 13, 2022
Fixes cockroachdb#33007. Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that is sensitive to the circuit breaker tripping (relying on context cancellation makes sure we don't end up with a second channel that needs to be plumbed to everyone and their dog; Go makes it really difficult to join two cancellation chains); if the breaker is tripped when the request arrives, it is refused right away. In either case, the outgoing error is augmented to carry information about the tripped breaker. This isn't 100% trivial, since retreating from in-flight replication typically causes an `AmbiguousResultError`, and while we could avoid it in some cases we can't in all. A similar problem occurs when the canceled request was waiting on a lease, in which case the result is a NotLeaseholderError.

For any request that made it in, if it enters replication but does not manage to succeed within the timeout specified by the `kv.replica_circuit_breaker.slow_replication_threshold` cluster setting, the breaker is tripped, cancelling all inflight and future requests until the breaker heals.

Perhaps surprisingly, the existing "slowness" detection (the informational "have been waiting ... for proposal" warning in `executeWriteBatch`) was moved deeper into replication (`refreshProposalsLocked`), where it now trips the breaker. This has the advantage of providing a unified path for lease requests (which don't go through `executeWriteBatch`) and pipelined writes (which return before waiting on the inflight replication process). To make this work, we need to pick a little fight with how leases are (not) refreshed (cockroachdb#74711) and we need to remember the ticks at which a proposal was first inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a `ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is roughly the case whenever replicating a command through Raft is possible for the Replica, either by appending to the log as the Raft leader, or by forwarding to the Raft leader. A tiny bit of special casing allows requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the entire mechanism) until some necessary follow-up items have been addressed (see cockroachdb#74705). For example, the breaker-sensitive context cancelation is a toy implementation that comes with too much of a performance overhead (one extra goroutine per request); cockroachdb#74707 will address that. Other things not done here that we certainly want in the 22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue on Jan 13, 2022
Fixes cockroachdb#33007. Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that is sensitive to the circuit breaker tripping (relying on context cancellation makes sure we don't end up with a second channel that needs to be plumbed to everyone and their dog; Go makes it really difficult to join two cancellation chains); if the breaker is tripped when the request arrives, it is refused right away. In either case, the outgoing error is augmented to carry information about the tripped breaker. This isn't 100% trivial, since retreating from in-flight replication typically causes an `AmbiguousResultError`, and while we could avoid it in some cases we can't in all. A similar problem occurs when the canceled request was waiting on a lease, in which case the result is a NotLeaseholderError.

For any request that made it in, if it enters replication but does not manage to succeed within the timeout specified by the `kv.replica_circuit_breaker.slow_replication_threshold` cluster setting, the breaker is tripped, cancelling all inflight and future requests until the breaker heals.

Perhaps surprisingly, the existing "slowness" detection (the informational "have been waiting ... for proposal" warning in `executeWriteBatch`) was moved deeper into replication (`refreshProposalsLocked`), where it now trips the breaker. This has the advantage of providing a unified path for lease requests (which don't go through `executeWriteBatch`) and pipelined writes (which return before waiting on the inflight replication process). To make this work, we need to pick a little fight with how leases are (not) refreshed (cockroachdb#74711) and we need to remember the ticks at which a proposal was first inserted (despite potential reproposals).

Perhaps surprisingly, when the breaker is tripped, *all* traffic to the Replica gets the fail-fast behavior, not just mutations. This is because even though a read may look good to serve based on the lease, we would also need to check for latches, and in particular we would need to fail-fast if there is a transitive dependency on any write (since that write is presumably stuck). This is not trivial and so we don't do it in this first iteration (see cockroachdb#74799).

A tripped breaker deploys a background probe which sends a `ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is roughly the case whenever replicating a command through Raft is possible for the Replica, either by appending to the log as the Raft leader, or by forwarding to the Raft leader. A tiny bit of special casing allows requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the entire mechanism) until some necessary follow-up items have been addressed (see cockroachdb#74705). For example, the breaker-sensitive context cancelation is a toy implementation that comes with too much of a performance overhead (one extra goroutine per request); cockroachdb#74707 will address that. Other things not done here that we certainly want in the 22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
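The slowness check that moved into `refreshProposalsLocked` can be pictured along the lines of the sketch below: during the periodic refresh, trip the breaker once any proposal has been pending longer than the threshold allows, measured from the tick at which it was first inserted. All identifiers here are hypothetical; in the real code the tick threshold derives from the `kv.replica_circuit_breaker.slow_replication_threshold` setting mentioned above.

```go
package slowcheck

import "fmt"

// proposal records the tick at which it was first inserted; reproposals
// keep this value so that slowness is measured from the original insertion.
type proposal struct {
	id            string
	createdAtTick int64
}

// breaker is a stand-in for the replica circuit breaker's trip hook.
type breaker interface{ Trip(err error) }

// maybeTripOnSlowProposals trips the breaker if any pending proposal has
// been in flight for at least thresholdTicks ticks.
func maybeTripOnSlowProposals(b breaker, pending []proposal, nowTick, thresholdTicks int64) {
	for _, p := range pending {
		if age := nowTick - p.createdAtTick; age >= thresholdTicks {
			b.Trip(fmt.Errorf("proposal %s pending for %d ticks", p.id, age))
			return
		}
	}
}
```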
craig bot pushed a commit that referenced this issue on Jan 14, 2022
71806: kvserver: circuit-break requests to unavailable ranges r=erikgrinaker a=tbg

Fixes #33007. Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address issue #33007: When a replica becomes unavailable, it should eagerly refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that is sensitive to the circuit breaker tripping (relying on context cancellation makes sure we don't end up with a second channel that needs to be plumbed to everyone and their dog; Go makes it really difficult to join two cancellation chains); if the breaker is tripped when the request arrives, it is refused right away. In either case, the outgoing error is augmented to carry information about the tripped breaker. This isn't 100% trivial, since retreating from in-flight replication typically causes an `AmbiguousResultError`, and while we could avoid it in some cases we can't in all. A similar problem occurs when the canceled request was waiting on a lease, in which case the result is a NotLeaseholderError.

For any request that made it in, if it enters replication but does not manage to succeed within `base.SlowRequestThreshold` (15s at time of writing), the breaker is tripped, cancelling all inflight and future requests until the breaker heals.

Perhaps surprisingly, the existing "slowness" detection (the informational "have been waiting ... for proposal" warning in `executeWriteBatch`) was moved deeper into replication (`refreshProposalsLocked`), where it now trips the breaker. This has the advantage of providing a unified path for lease requests (which don't go through `executeWriteBatch`) and pipelined writes (which return before waiting on the inflight replication process). To make this work, we need to pick a little fight with how leases are (not) refreshed (#74711) and we need to remember the ticks at which a proposal was first inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a `ProbeRequest` (introduced in #72972) to determine when to heal; this is roughly the case whenever replicating a command through Raft is possible for the Replica, either by appending to the log as the Raft leader, or by forwarding to the Raft leader. A tiny bit of special casing allows requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the `kv.replica_circuit_breakers.enabled` cluster setting while we determine which of the many follow-up items to target for 22.1 (see #74705). For example, the breaker-sensitive context cancelation is a toy implementation that comes with too much of a performance overhead (one extra goroutine per request); #74707 will address that. Other things not done here that we certainly want in the 22.1 release are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
craig bot pushed a commit that referenced this issue on Jan 24, 2022
74810: kvserver: remove unnecessary special casing of lease for LAI r=nvanbenschoten a=tbg

Touches #33007 (this simplifies #71806).

We were previously "not" assigning a LeaseAppliedIndex to lease request proposals. But we were doing so very half-heartedly: instead of always assigning zero, we assigned "whatever was assigned to the previous command". This meant that in practice, the first lease on a range would get zero, and any subsequent leases would get some nonzero numbers. This wasn't particularly principled and raises eyebrows. For testing convenience and clarity it is helpful to assign a zero MLAI to leases, which is what this commit does.

We then turn to the special-casing in refreshProposalsLocked. `(*Replica).refreshProposalsLocked` previously refused to repropose commands that had a zero `AppliedLeaseIndex` (LAI). This was done on grounds of needing to provide replay protection for these requests. Upon inspection it became clear that a zero MLAI could only ever apply to lease requests; they are the only proposals with an exception, and as we saw above it would not usually result in a zero, but could (and especially in testing, where most leases are the first lease on their respective range).

We [discussed] this internally and concluded that leases can in fact be reproposed, since they have their own replay protection mediated through the lease's sequence number. This is great since, as stated above, we did in fact repropose (some) leases and have for years. This commit removes the special casing.

Fixes #74711.

As suggested by @nvanbenschoten I ran the `kvserver` tests with this diff that reproposes every lease request:

```diff
diff --git a/pkg/kv/kvserver/replica_proposal_buf.go b/pkg/kv/kvserver/replica_proposal_buf.go
index 95765d8efd..1b66745866 100644
--- a/pkg/kv/kvserver/replica_proposal_buf.go
+++ b/pkg/kv/kvserver/replica_proposal_buf.go
@@ -576,6 +576,11 @@ func (b *propBuf) FlushLockedWithRaftGroup(
 			ents = append(ents, raftpb.Entry{
 				Data: p.encodedCommand,
 			})
+			if p.Request.IsLeaseRequest() {
+				ents = append(ents, raftpb.Entry{
+					Data: p.encodedCommand,
+				})
+			}
 		}
 	}
 	if firstErr != nil {
```

This inspired the subsequent commits.

[discussed]: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641559185021100

Release note: None

74923: sql: pre-split hash sharded indexes ranges before DELETE_AND_WRITE_ONLY r=chengxiong-ruan a=chengxiong-ruan

Fixes #74558

Pre-split ranges on shard boundaries before the SchemaChanger moves new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY state.

Release note (bug fix): None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Touches #33007.
Is your feature request related to a problem? Please describe.
We have this code that essentially nukes any lease-applied-index-zero proposal that shows up in a refresh:
cockroach/pkg/kv/kvserver/replica_raft.go, lines 1097 to 1109 at f68a5a7
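The permalinked snippet does not render in this view, but the special case under discussion amounts to roughly the following Go sketch (hypothetical identifiers, not the actual `refreshProposalsLocked` body): proposals whose command carries a zero lease applied index are finished with an error during a refresh instead of being reproposed.

```go
package refreshsketch

import "errors"

// proposal carries the command's assigned lease applied index; it is zero
// only for (some) lease request proposals.
type proposal struct {
	maxLeaseIndex uint64
}

// refreshProposals illustrates the special case: a zero lease applied index
// causes the proposal to be rejected during refresh rather than reproposed.
func refreshProposals(
	pending []*proposal,
	finishWithError func(*proposal, error),
	repropose func(*proposal),
) {
	for _, p := range pending {
		if p.maxLeaseIndex == 0 {
			finishWithError(p, errors.New("cannot repropose a proposal without a lease applied index"))
			continue
		}
		repropose(p)
	}
}
```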
The only such proposals are in fact leases:
cockroach/pkg/kv/kvserver/replica_proposal_buf.go, lines 603 to 609 at 95a7671
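Again without reproducing the linked code, the reason only leases show up with a zero index can be sketched as follows (hypothetical names, not the real propBuf implementation): the proposal buffer hands every ordinary command the next lease applied index but skips lease requests, so the first lease on a range goes out with the zero value.

```go
package propbufsketch

// cmd is a stand-in for a buffered proposal awaiting flush.
type cmd struct {
	isLeaseRequest bool
	maxLeaseIndex  uint64
}

// assignLeaseIndexes mimics the flush step: ordinary commands receive the
// next counter value, lease requests are skipped and keep whatever the
// field already holds (zero for the first lease on a range).
func assignLeaseIndexes(cmds []*cmd, counter uint64) uint64 {
	for _, c := range cmds {
		if c.isLeaseRequest {
			// Lease requests are special-cased: no LAI is assigned here.
			continue
		}
		counter++
		c.maxLeaseIndex = counter
	}
	return counter
}
```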
Some problems with terminating these proposals early:
Describe the solution you'd like
I believe that leases have their own replay protection through the lease sequence number. There is a surprising amount of logic in the lease code, so I can't be entirely sure. Perhaps there are issues where multiple leases might be proposed with the same sequence number (due to being Equivalent) but get reordered? Equivalent is not commutative, and it checks that the "later" lease has a longer expiration (in the case of expiration-based leases), so that case seems fine. And for epoch-based leases, basically only the epoch and start time matter, which are always the same, so there isn't a problem either.
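The replay-protection argument can be boiled down to a sketch like the following (illustrative types and checks, not the actual lease application logic): a reproposed copy of an already-applied lease no longer follows the replica's current lease sequence and therefore has no effect when applied a second time.

```go
package leasesketch

// lease carries only the sequence number relevant to the argument above.
type lease struct {
	Sequence uint64
}

// applyLease accepts a proposed lease only if its sequence is the immediate
// successor of the current one. Applying the same lease proposal twice is
// therefore harmless: the second application fails the check and is a no-op.
func applyLease(current *lease, proposed lease) bool {
	if proposed.Sequence != current.Sequence+1 {
		return false
	}
	*current = proposed
	return true
}
```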
If we believe lease-related requests can safely be reproposed, then, since they are the only requests special-cased in this way in refreshProposalsLocked, the simplest solution would be to remove the special casing, or to defer their removal until the breaker is tripped.