kvserver: don't allow raft forwarding of lease requests #55148
Conversation
Patch is not quite ready and I will work on tests too, but I wanna see if it raises any discussion.
Reviewed 5 of 5 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 163 at r2 (raw file):
// rejectProposalWithRedirectLocked rejects a proposal and redirects the
// proposer to try it on another node. This is used to sometimes reject lease
// acquisitions when the another replica is the leader; the intended
"the another"
pkg/kv/kvserver/replica_proposal_buf.go, line 411 at r2 (raw file):
ents := make([]raftpb.Entry, 0, used)
var follower bool
Give this block a comment.
pkg/kv/kvserver/replica_proposal_buf.go, line 703 at r2 (raw file):
}
// rejectProposalWithRedirectLocked is part of the proposer interface.
I think we also need to call `cleanupFailedProposalLocked` in this function before calling `finishApplication`. That will ensure that the proposal is removed from the `proposals` map and that it will release quota if it acquired any.
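A rough sketch of what that ordering amounts to, based only on the helpers named in this thread (`cleanupFailedProposalLocked`, `finishApplication`, `newNotLeaseHolderError`); the exact signatures and fields here are assumptions, not the merged code:

```go
// Sketch: reject a lease-acquisition proposal and redirect the client to the
// raft leader. The failed-proposal cleanup must run before the proposal is
// finished, so that it is removed from the proposals map and releases any
// quota it acquired.
func (rp *replicaProposer) rejectProposalWithRedirectLocked(
	ctx context.Context, prop *ProposalData, redirectTo roachpb.ReplicaID,
) {
	r := (*Replica)(rp)
	rangeDesc := r.descRLocked() // r.mu is already held; don't re-acquire the read lock
	storeID := r.store.StoreID()
	redirectRep, _ := rangeDesc.GetReplicaDescriptorByID(redirectTo)
	nlhe := newNotLeaseHolderError(nil /* lease */, storeID, rangeDesc)
	nlhe.LeaseHolder = &redirectRep // hint: tell the client which replica to try next

	r.cleanupFailedProposalLocked(prop) // drop from the proposals map, release quota
	prop.finishApplication(ctx, proposalResult{Err: roachpb.NewError(nlhe)})
}
```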
pkg/kv/kvserver/replica_proposal_buf.go, line 708 at r2 (raw file):
) {
	r := (*Replica)(rp)
	rangeDesc := r.Desc()
Doesn't this grab a read lock? Read locks aren't re-entrant. I think you want `descRLocked`.
pkg/kv/kvserver/replica_proposal_buf.go, line 709 at r2 (raw file):
	r := (*Replica)(rp)
	rangeDesc := r.Desc()
	replicaDesc, err := r.getReplicaDescriptorRLocked()
If you just want the `StoreID`, why go through all this effort? Is `r.store.StoreID()` not sufficient?
pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):
}
func (t *testProposer) rejectProposalWithRedirectLocked(
It seems like we should be able to write some tests for this fairly easily.
pkg/kv/kvserver/replica_range_lease.go, line 212 at r3 (raw file):
// every request from making it too far down the proposal path - in particular
// to avoid the heartbeating of the liveness record done below.
leader := p.repl.mu.leaderID
I'm not sold on this. We want to backport this PR, so we'd like it to be as small as possible. And we're planning on using `master` as a staging ground for the backport, so we shouldn't merge anything on top that we don't intend to backport in case it hides issues with the portion we do intend to backport.
So I'm going to assume that if this is in this PR, we want to backport it too. If so, then there's a high bar for changes, especially in such subtle areas as this. This is only an optimization, right? Have we seen that it's anything more than theoretical? Have we seen the liveness heartbeats discussed above cause issues? As of 20.2, liveness heartbeats are coalesced, so I'd be surprised if they were.
I'm primarily worried about the additional complexity this adds in exchange for the unclear optimization. The PR itself is already making a potentially destabilizing change, so I'm wary of anything that makes issues that fall out from it harder to debug or increases the chance that we got it wrong.
r2 looks good to me; I share Nathan's reservations about r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 421 at r2 (raw file):
leader = roachpb.ReplicaID(status.Lead)
if leader == b.p.replicaID() {
	// !!! this assertion fails, apparently in single-replica groups, so I
Yeah, I remember that single-replica groups short-circuit a lot of logic and don't maintain a lot of things that look like invariants in multi-replica groups. I don't remember anything well enough to give you useful advice right now, though.
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
// - if the leader is known, we reject this proposal and make sure the
// request that needed the lease is redirected to the leaseholder;
// - if the leader is not known, we don't do anything special here to
We don't have to do anything special here because it will get dropped later, but could we just reject the proposal here whether we know the leader or not? Would avoid questions like the one in the assertion above.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
We don't have to do anything special here because it will get dropped later, but could we just reject the proposal here whether we know the leader or not? Would avoid questions like the one in the assertion above.
I was going to respond that we don't want to immediately reject the proposal because the case where `!leaderKnown` is the case where we'd rather wait until the current election completes so we know who to redirect to, but now I'm confused. When will we ever be a follower and not know the current leader? I thought you'll always be a candidate/pre-candidate if you don't know the leader.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 416 at r2 (raw file):
if raftGroup != nil {
	status := raftGroup.BasicStatus()
	follower = status.RaftState == raft.StateFollower
Should this be `state == follower` or `state != leader`? I think candidates and pre-candidates (and learners, although they just use StateFollower) should be handled the same way as followers here.
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I was going to respond that we don't want to immediately reject the proposal because the case where `!leaderKnown` is the case where we'd rather wait until the current election completes so we know who to redirect to, but now I'm confused. When will we ever be a follower and not know the current leader? I thought you'll always be a candidate/pre-candidate if you don't know the leader.
The initial state of a raft instance is to be a follower with unknown leader. And there are a few ways to get back to that state. For example, if an election fails without a majority winner, the candidates become followers (with no leaders) as they back off before starting a new election.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
The initial state of a raft instance is to be a follower with unknown leader. And there are a few ways to get back to that state. For example, if an election fails without a majority winner, the candidates become followers (with no leaders) as they back off before starting a new election.
Thanks for the explanation, that's very helpful. Then I agree with your other comment that we should invert the `follower` boolean to be a `leader` boolean and then check `if !leader && p.Request.IsLeaseRequest() {`.
I still think we should only redirect if the leader is known, though, for the reasons listed in #37906 (comment).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks for the explanation, that's very helpful. Then I agree with your other comment that we should invert the `follower` boolean to be a `leader` boolean and then check `if !leader && p.Request.IsLeaseRequest() {`.
I still think we should only redirect if the leader is known, though, for the reasons listed in #37906 (comment).
I don't feel strongly about what happens when there is no leader - it's a very rare case and either bouncing among random replicas or waiting synchronously would be acceptable to me. If we could get rid of the synchronous wait code altogether that would be nice, although I guess that's still useful in the normal cold start path.
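To summarize where this thread landed, here is a rough sketch of the flush-time check being converged on. The names (`BasicStatus`, `IsLeaseRequest`, `rejectProposalWithRedirectLocked`) come from the snippets quoted above; the surrounding shape is an assumption, not the final diff:

```go
// Sketch: inspect raft state once per flush and decide what to do with
// lease-acquisition proposals on this (possibly non-leader) replica.
var leaderKnown, iAmTheLeader bool
var leader roachpb.ReplicaID
if raftGroup != nil {
	status := raftGroup.BasicStatus()
	iAmTheLeader = status.RaftState == raft.StateLeader
	leaderKnown = status.Lead != raft.None
	if leaderKnown {
		leader = roachpb.ReplicaID(status.Lead)
	}
}

for _, p := range proposals {
	if !iAmTheLeader && p.Request.IsLeaseRequest() {
		if leaderKnown {
			// Another replica is the leader: redirect the client there so the
			// leader acquires the lease itself, instead of forwarding a lease
			// request built on a stale view of the lease.
			b.p.rejectProposalWithRedirectLocked(ctx, p, leader)
			continue
		}
		// No known leader (e.g. right after startup, or while candidates back
		// off after a failed election): don't reject; the client ends up
		// waiting for the election to settle, and raft drops (rather than
		// forwards) the proposal in the meantime.
	}
	// ... hand p to raft as usual ...
}
```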
Force-pushed from e27baf6 to b31e727.
PTAL at the test
I hit a problem though: replicas on draining nodes will not take leases. They will become leaders, though, which can lead to deadlock since, if a leader is draining, nobody else will attempt to take the lease and it also won’t.
The logic is here (cockroach/pkg/kv/kvserver/replica_range_lease.go, lines 619 to 623 in 785aea7):
if r.store.IsDraining() {
	// We've retired from active duty.
	return r.mu.pendingLeaseRequest.newResolvedHandle(roachpb.NewError(
		newNotLeaseHolderError(nil, r.store.StoreID(), r.mu.state.Desc)))
}
I’m also wondering whether this is already a deadlock condition: is there anything preventing all the nodes with replicas for a range from draining at the same time? If that happens, nobody will want the lease?
I'm thinking of allowing a leader to acquire the lease even if it's draining.
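A minimal sketch of that idea against the snippet above. How "am I the raft leader" is checked (here via the cached `leaderID`) and the exact lock discipline are assumptions:

```go
// Sketch: a draining store still refuses new leases, unless this replica is
// the raft leader - in which case refusing would leave nobody willing to
// acquire the lease and the range would deadlock.
iAmTheLeader := r.mu.leaderID == r.mu.replicaID // assumed way to check leadership here
if r.store.IsDraining() && !iAmTheLeader {
	// We've retired from active duty; let another replica pick up the lease.
	return r.mu.pendingLeaseRequest.newResolvedHandle(roachpb.NewError(
		newNotLeaseHolderError(nil, r.store.StoreID(), r.mu.state.Desc)))
}
```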
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 163 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"the another"
done
pkg/kv/kvserver/replica_proposal_buf.go, line 411 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this block a comment.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 416 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Should this be `state == follower` or `state != leader`? I think candidates and pre-candidates (and learners, although they just use StateFollower) should be handled the same way as followers here.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't feel strongly about what happens when there is no leader - it's a very rare case and either bouncing among random replicas or waiting synchronously would be acceptable to me. If we could get rid of the synchronous wait code altogether that would be nice, although I guess that's still useful in the normal cold start path.
I think I'm in the "let's wait" camp too. Seems cleaner to me... I don't really like `NotLeaseholderErrors` that don't have a redirection in them.
pkg/kv/kvserver/replica_proposal_buf.go, line 703 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think we also need to call `cleanupFailedProposalLocked` in this function before calling `finishApplication`. That will ensure that the proposal is removed from the `proposals` map and that it will release quota if it acquired any.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 708 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Doesn't this grab a read lock? Read locks aren't re-entrant. I think you want `descRLocked`.
removed
pkg/kv/kvserver/replica_proposal_buf.go, line 709 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If you just want the `StoreID`, why go through all this effort? Is `r.store.StoreID()` not sufficient?
done, 10x
Before this patch, a draining node would not take any new leases once it's draining. This is a problem in case all replicas of a range are draining at the same time (*) - nobody wants the lease. Particularly because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness. It also becomes more of a problem with cockroachdb#55148, where we start refusing to take the lease on replicas that are not the leader - so if the leader is draining, we deadlock. We have code that tries to transfer the leadership away from draining nodes - which is a way to solve these deadlocks - but that code is pretty best-effort (and also useless if all the replicas are draining? not sure).

This patch makes an exception for leaders, which now no longer refuse the lease even when they're draining. The reasoning being that it's too easy to deadlock otherwise.

(*) Draining multiple nodes at once might be questionable, but I think it's an acceptable practice. At least aspirationally we should allow an operator to drain as many nodes as it wants concurrently, as long as care is taken to not actually stop critical nodes. The way it ought to work is that you start draining 3 nodes, and then the draining becomes successful on one node at a time - this way allowing you to rotate them all. Even with this patch there are multiple holes in that story, one of them being the fact that draining multiple nodes at once will generally not succeed on any node because each node will be stuck with some leases.

Release note: None
Reviewed 2 of 5 files at r1, 1 of 4 files at r4, 5 of 5 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/client_raft_test.go, line 1157 at r5 (raw file):
log.Infof(ctx, "test: truncating log... done")
// !!! assert lease expired
Don't forget this.
Reviewed 5 of 5 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)
pkg/kv/kvserver/client_raft_test.go, line 1054 at r5 (raw file):
// the leader does not attempt to acquire a lease and, thus, does not block until
// it figures out that it cannot, in fact, take the lease.
// !!! more words
!!!
pkg/kv/kvserver/client_raft_test.go, line 1110 at r5 (raw file):
// Wait until the leadership is transferred away from the partitioned replica.
testutils.SucceedsSoon(t, func() error {
	mtc.advanceClock(ctx)
How long does this test take to run? Does `advanceClock` actually speed up the election process? I wouldn't expect it to make a replica tick faster.
pkg/kv/kvserver/client_raft_test.go, line 1175 at r5 (raw file):
// Now we're going to send a request to the behind replica, and we expect it
// to not block; we expect a redirection to the leader.
Did you confirm that this blocked before the introduction of your fix?
pkg/kv/kvserver/replica_proposal_buf.go, line 418 at r5 (raw file):
// requests.
var iAmTheLeader bool
var leaderKnown bool
nit: `var leaderKnown, iAmTheLeader bool`
Before this patch, a draining node would not take any new leases once it's draining. This is a problem in case all replicas of a range are draining at the same time (e.g. when draining a single-node cluster, or when draining multiple nodes at the same time perhaps by mistake) - nobody wants the lease. Particularly because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness. It also becomes more of a problem with cockroachdb#55148, where we start refusing to take the lease on replicas that are not the leader - so if the leader is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse the lease even when they're draining. The reasoning being that it's too easy to deadlock otherwise.

Release note: None
55624: kvserver: don't always refuse to take leases when draining r=andreimatei a=andreimatei

Before this patch, a draining node would not take any new leases once it's draining. This is a problem in case all replicas of a range are draining at the same time (e.g. when draining a single-node cluster, or when draining multiple nodes at the same time perhaps by mistake) - nobody wants the lease. Particularly because the liveness range is expiration-based (and thus permanently in need of new leases to be granted), this quickly results in nodes failing their liveness. It also becomes more of a problem with #55148, where we start refusing to take the lease on replicas that are not the leader - so if the leader is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse the lease even when they're draining. The reasoning being that it's too easy to deadlock otherwise.

Release note: None

56004: build: add instructions to rectify code that is not regenerated r=irfansharif a=otan

Make it easier for the developer to know how to fix un-generated code by specifying the required commands.

Release note: None

56016: backfill: remove unused bound account from column backfiller r=yuzefovich a=adityamaru

Previously, we were creating a bound account in the column backfiller which was not being used to track anything. This change removes that.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Should be ready to go. PTAL at the test.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/client_raft_test.go, line 1054 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
done
pkg/kv/kvserver/client_raft_test.go, line 1110 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How long does this test take to run? Does `advanceClock` actually speed up the election process? I wouldn't expect it to make a replica tick faster.
moved to TestServer. And tweaked some knobs. It takes ~3.5s
pkg/kv/kvserver/client_raft_test.go, line 1157 at r5 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Don't forget this.
done
pkg/kv/kvserver/client_raft_test.go, line 1175 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you confirm that this blocked before the introduction of your fix?
yup
pkg/kv/kvserver/replica_proposal_buf.go, line 421 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yeah, I remember that single-replica groups short-circuit a lot of logic and don't maintain a lot of things that look like invariants in multi-replica groups. I don't remember anything well enough to give you useful advice right now, though.
gone
pkg/kv/kvserver/replica_proposal_buf.go, line 418 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: `var leaderKnown, iAmTheLeader bool`
done
pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It seems like we should be able to write some tests for this fairly easily.
added test
pkg/kv/kvserver/replica_range_lease.go, line 212 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm not sold on this. We want to backport this PR, so we'd like it to be as small as possible. And we're planning on using `master` as a staging ground for the backport, so we shouldn't merge anything on top that we don't intend to backport in case it hides issues with the portion we do intend to backport.

So I'm going to assume that if this is in this PR, we want to backport it too. If so, then there's a high bar for changes, especially in such subtle areas as this. This is only an optimization, right? Have we seen that it's anything more than theoretical? Have we seen the liveness heartbeats discussed above cause issues? As of 20.2, liveness heartbeats are coalesced, so I'd be surprised if they were.

I'm primarily worried about the additional complexity this adds in exchange for the unclear optimization. The PR itself is already making a potentially destabilizing change, so I'm wary of anything that makes issues that fall out from it harder to debug or increases the chance that we got it wrong.
removed commit
Force-pushed from b31e727 to 6d60b64.
Test failure: pkg/kv/kvserver/client_raft_test.go:1090:11: this value of err is never used (SA4006)
@nvanbenschoten you've got this PR, right?
Reviewed 2 of 8 files at r6, 9 of 11 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/client_raft_test.go, line 1283 at r7 (raw file):
// the lease and then blocking traffic for a long time until they find out
// whether they successfully took the lease or not.
func TestRequestsOnLaggingReplicaXXX(t *testing.T) {
Looks like this test name is off. Did you mean to remove this version with `multiTestContext`?
pkg/kv/kvserver/replica_proposal_buf.go, line 189 at r7 (raw file):
}
// setTestingKnobs should only be called immediately after Init(); it cannot be
nit: I would skip the unexported setter method and just assign directly like we do for the other testing knobs, but this is also fine.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
added test
Did you want to add a test to this file? I think that would be the easiest way to unit test this change. It should be pretty satisfying to do.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @knz, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/client_raft_test.go, line 1283 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Looks like this test name is off. Did you mean to remove this version with `multiTestContext`?
done
pkg/kv/kvserver/replica_proposal_buf.go, line 189 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: I would skip the unexported setter method and just assign directly like we do for the other testing knobs, but this is also fine.
done
pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you want to add a test to this file? I think that would be the easiest way to unit test this change. It should be pretty satisfying to do.
done, PTAL
This patch backpedals a little bit on the logic introduced in cockroachdb#55148. That patch said that, if a leader is known, every other replica refuses to propose a lease acquisition. Instead, the replica in question redirects whoever was triggering the lease acquisition to the leader, thinking that the leader should take the lease.

That patch introduced a deadlock: some replicas refuse to take the lease because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the deadlock, this patch incorporates that check in the proposal buffer's decision about whether or not to reject a proposal: if the leader is believed to refuse to take the lease, then we again forward our own lease request.

An edge case to the edge case is when the leader is not even part of the proposer's range descriptor. This can happen if the proposer is far behind. In this case, we assume that the leader is eligible. If it isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to take the lease. VOTER_INCOMING replicas should take the lease. VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
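A rough sketch of how that check could slot into the rejection decision sketched earlier in this conversation. `CheckCanReceiveLease` is the helper named in the commit message; its exact signature and the placement of this check are assumptions:

```go
// Sketch: only reject-and-redirect if the known leader could actually take
// the lease; otherwise keep forwarding our own lease request so the range
// doesn't deadlock.
leaderEligible := true
if leaderRep, ok := rangeDesc.GetReplicaDescriptorByID(leader); ok {
	// The leader is in our descriptor: check its lease eligibility (VOTER_FULL etc.).
	leaderEligible = checkCanReceiveLease(leaderRep, rangeDesc) == nil // assumed signature
}
// If the leader is not in our (possibly stale) descriptor, leaderEligible stays
// true: assume it can take the lease; if it can't, the situation resolves
// itself once we catch up.
if !iAmTheLeader && leaderKnown && leaderEligible && p.Request.IsLeaseRequest() {
	b.p.rejectProposalWithRedirectLocked(ctx, p, leader)
	continue
}
```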
58722: kvserver: don't refuse to fwd lease proposals in some edge cases r=andreimatei a=andreimatei

This patch backpedals a little bit on the logic introduced in #55148. That patch said that, if a leader is known, every other replica refuses to propose a lease acquisition. Instead, the replica in question redirects whoever was triggering the lease acquisition to the leader, thinking that the leader should take the lease.

That patch introduced a deadlock: some replicas refuse to take the lease because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the deadlock, this patch incorporates that check in the proposal buffer's decision about whether or not to reject a proposal: if the leader is believed to refuse to take the lease, then we again forward our own lease request.

An edge case to the edge case is when the leader is not even part of the proposer's range descriptor. This can happen if the proposer is far behind. In this case, we assume that the leader is eligible. If it isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to take the lease. VOTER_INCOMING replicas should take the lease. VOTER_DEMOTING are more controversial.

Fixes #57798

Release note: None

59087: util/log: new output format 'crdb-v2' r=itsbilal a=knz

Fixes #50166. This new format intends to address all the known shortcomings with `crdb-v1` while remaining compatible with entry parsers designed for the previous version. See the user-facing release note below for a summary of changes; and the included reference documentation for details.

Example TTY output with colors:

![image](https://user-images.githubusercontent.com/642886/104824568-261e9380-5853-11eb-9ad9-e5936f0890fd.png)

Example for a single-line unstructured entry. Before:

```
I210116 22:17:03.736236 57 cli/start.go:681 ⋮ node startup completed:
```

After:

```
I210116 22:17:03.736236 57 cli/start.go:681 ⋮ [-] 12 node startup completed:
              tag field now always included ^^^
          entry counter now always included ^^^
```

Example for a multi-line unstructured entry. Before:

```
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 gossip status (ok, 1 node‹›)
gossip client (0/3 cur/max conns)
gossip connectivity
  n1 [sentinel];
```

(subsequent lines lack a log entry prefix; hard to determine where entries start and end)

After:

```
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 gossip status (ok, 1 node‹›)
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 +gossip client (0/3 cur/max conns)
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 +gossip connectivity
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 + n1 [sentinel];
^^^ common prefix repeated for each msg line
same entry counter for every subsequent line ^^^
continuation marker "+" on subsequent lines ^^
```

Example for a structured entry. Before:

```
I210116 22:14:38.175829 469 util/log/event_log.go:32 ⋮ [n1] Structured entry: {...}
```

After:

```
I210116 22:14:38.175829 469 util/log/event_log.go:32 ⋮ [n1] 21 ={...}
entry counter always present ^^
equal sign "=" denotes structured entry ^^
```

Release note (cli change): The default output format for `file-group` and `stderr` sinks has been changed to `crdb-v2`. This new format is non-ambiguous and makes it possible to reliably parse log files. Refer to the format's documentation for details. Additionally, it prevents single log lines from exceeding a large size; this problem is inherent to the `crdb-v1` format and can prevent `cockroach debug zip` from retrieving v1 log files. The new format has also been designed so that existing log file analyzers for the `crdb-v1` format can read entries written in the new format. However, this conversion may be imperfect. Again, refer to the new format's documentation for details. In case of incompatibility, users can force the previous format by using `format: crdb-v1` in their logging configuration.

59141: ui: upgrade admin-ui-components to new dep r=dhartunian a=dhartunian

We renamed the `admin-ui-components` package to `cluster-ui`.

Release note: None

59388: build,bazel: remove references to `gofmt` in bazel build r=rickystewart a=rickystewart

This was cargo-culted from the `Makefile`, but isn't necessary to get the build to succeed, and interferes with hermiticity because it requires `gofmt` to be globally installed. It's simpler to just remove these references entirely.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
…ts"" This reverts commit 49c8337. This is a revert of a revert, reintroducing the backport of the main commit from cockroachdb#55148. The backport was reverted because of a deadlock, which is not being fixed by the next commit. Release note (bug fix): A bug causing queries sent to a freshly-restarted node to sometimes hang for a long time while the node catches up with replication has been fixed.
61982: kvserver: allow lease extensions from followers in propBuf r=nvanbenschoten,andreimatei,tbg a=erikgrinaker

The Raft proposal buffer currently rejects lease acquisition proposals from followers if there is an eligible leader, to avoid interfering with it trying to claim the lease. However, this logic unintentionally rejected lease extension proposals from followers as well, which could cause a leaseholder to lose its lease.

This patch changes the proposal buffer to allow lease extension proposals from followers for extension-based leases as long as they still hold a valid lease.

Resolves #59179. Follow-up to #55148.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
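Sketching that carve-out on top of the earlier rejection check; the `ownsValidLease`-style helper is an assumption about how the proposal buffer would learn whether this replica still holds a valid lease:

```go
// Sketch: a follower extending its own still-valid (expiration-based) lease is
// not competing with the leader for a new lease, so don't reject the proposal.
ownsCurrentLease := b.p.ownsValidLease(ctx, now) // assumed helper, not the actual API
if p.Request.IsLeaseRequest() && !iAmTheLeader && leaderKnown && !ownsCurrentLease {
	b.p.rejectProposalWithRedirectLocked(ctx, p, leader)
	continue
}
```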
Previously, we were only updating the `leases.error` metric (measuring the number of failed lease requests) if a lease request failed during command evaluation. However, after cockroachdb#55148, lease requests can be rejected by the replica proposal buffer if there is a known raft leader that is different from the replica requesting the lease. We were not recording such rejections as failed lease requests in the metrics.

This commit fixes this oversight.

Release note: None
64531: kvserver: fix metrics collection for rejected lease requests r=aayushshah15 a=aayushshah15

Previously, we were only updating the `leases.error` metric (measuring the number of failed lease requests) if a lease request failed during command evaluation. However, after #55148, lease requests can be rejected by the replica proposal buffer if there is a known raft leader that is different from the replica requesting the lease. We were not recording such rejections as failed lease requests in the metrics.

This commit fixes this oversight.

Release note: None

Co-authored-by: Aayush Shah <[email protected]>
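A plausible shape for that fix, sketched from the description above; the metric field name is an assumption:

```go
// Sketch: count propBuf rejections of lease requests as failed lease requests,
// so that `leases.error` reflects them too.
if p.Request.IsLeaseRequest() {
	r.store.metrics.LeaseRequestErrorCount.Inc(1) // assumed metric field name
}
```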
This patch aims to improve the behavior in scenarios where a follower
replica is behind, unaware of the latest lease, and it tries to acquire
a lease in its ignorant state. That lease acquisition request is bound
to fail (because the lease that it's based on is stale), but while it
fails (or, rather, until the behind replica finds out that it failed)
local requests are blocked. This blocking can last for a very long time
in situations where a snapshot is needed to catch up the follower, and
the snapshot is queued up behind many other snapshots (e.g. after a node
has been down for a while and gets restarted).
This patch tries an opinionated solution: never allow followers to
acquire leases. If there is a leader, it's a better idea for the leader
to acquire the lease. The leader might have a lease anyway or, even if
it doesn't, having the leader acquire it saves a leadership transfer
(leadership follows the lease).
We change the proposal path to recognize lease requests and reject them
early if the current replica is a follower and the leader is known. The
rejection points to the leader, which causes the request that triggered
the lease acquisition to make its way to the leader and attempt to
acquire a lease over there.
The next commit takes this further by short-circuiting the lease
proposal even sooner - but that patch is more best-effort.
Fixes #37906
Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.