
kvserver: don't allow raft forwarding of lease requests #55148

Merged
merged 3 commits on Nov 13, 2020

Conversation

andreimatei
Contributor

@andreimatei andreimatei commented Oct 1, 2020

This patch aims to improve the behavior in scenarios where a follower
replica is behind, unaware of the latest lease, and it tries to acquire
a lease in its ignorant state. That lease acquisition request is bound
to fail (because the lease that it's based on is stale), but while it
fails (or, rather, until the behind replica finds out that it failed)
local requests are blocked. This blocking can last for a very long time
in situations where a snapshot is needed to catch up the follower, and
the snapshot is queued up behind many other snapshots (e.g. after a node
has been down for a while and gets restarted).

This patch tries an opinionated solution: never allow followers to
acquire leases. If there is a leader, it's a better idea for the leader
to acquire the lease. The leader might have a lease anyway or, even if
it doesn't, having the leader acquire it saves a leadership transfer
(leadership follows the lease).
We change the proposal path to recognize lease requests and reject them
early if the current replica is a follower and the leader is known. The
rejection points to the leader, which causes the request that triggered
the lease acquisition to make its way to the leader and attempt to
acquire a lease over there.
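In rough code terms, the check added to the proposal path looks something like the sketch below (illustrative only: the function and its signature are stand-ins, though the raft status fields are the real ones the proposal buffer consults):

```go
// Sketch, not the actual patch: while flushing proposals, consult the raft
// group's BasicStatus and reject lease acquisitions proposed by a non-leader
// replica whenever a leader is known.
func shouldRejectLeaseProposal(
	isLeaseRequest bool, st raft.BasicStatus,
) (redirectTo roachpb.ReplicaID, reject bool) {
	iAmTheLeader := st.RaftState == raft.StateLeader
	leaderKnown := st.Lead != raft.None
	if isLeaseRequest && !iAmTheLeader && leaderKnown {
		// Reject and point the proposer at the leader; the request that
		// triggered the lease acquisition retries there, and the leader
		// acquires the lease itself (leadership follows the lease).
		return roachpb.ReplicaID(st.Lead), true
	}
	return 0, false // propose as usual
}
```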

The next commit takes this further by short-circuiting the lease
proposal even sooner - but that patch is more best-effort.

Fixes #37906

Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.


@andreimatei
Contributor Author

Patch is not quite ready and I will work on tests too, but I wanna see if it raises any discussion.
Particularly from @bdarnell; the approach in the first commit was discussed with the rest of the gang.

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 5 of 5 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 163 at r2 (raw file):

	// rejectProposalWithRedirectLocked rejects a proposal and redirects the
	// proposer to try it on another node. This is used to sometimes reject lease
	// acquisitions when the another replica is the leader; the intended

"the another"


pkg/kv/kvserver/replica_proposal_buf.go, line 411 at r2 (raw file):

	ents := make([]raftpb.Entry, 0, used)

	var follower bool

Give this block a comment.


pkg/kv/kvserver/replica_proposal_buf.go, line 703 at r2 (raw file):

}

// rejectProposalWithRedirectLocked is part of the proposer interface.

I think we also need to call cleanupFailedProposalLocked in this function before calling finishApplication. That will ensure that the proposal is removed from the proposals map and that it will release quota if it acquired any.
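Roughly the following ordering, in sketch form (the signature and names follow the ones mentioned in this thread and in the draining snippet quoted later on, not necessarily the final code):

```go
// Approximate shape only. The point is the ordering: unlink the proposal
// (releasing any quota it acquired) before signaling the waiting client
// with the redirect error.
func (rp *replicaProposer) rejectProposalWithRedirectLocked(
	ctx context.Context, prop *ProposalData, redirectTo roachpb.ReplicaID,
) {
	r := (*Replica)(rp)
	r.cleanupFailedProposalLocked(prop) // removes it from the proposals map, releases quota
	prop.finishApplication(ctx, proposalResult{
		Err: roachpb.NewError(newNotLeaseHolderError(nil, r.store.StoreID(), r.mu.state.Desc)),
	})
}
```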


pkg/kv/kvserver/replica_proposal_buf.go, line 708 at r2 (raw file):

) {
	r := (*Replica)(rp)
	rangeDesc := r.Desc()

Doesn't this grab a read lock? Read locks aren't re-entrant. I think you want descRLocked


pkg/kv/kvserver/replica_proposal_buf.go, line 709 at r2 (raw file):

	r := (*Replica)(rp)
	rangeDesc := r.Desc()
	replicaDesc, err := r.getReplicaDescriptorRLocked()

If you just want the StoreID, why go through all this effort? Is r.store.StoreID() not sufficient?


pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):

}

func (t *testProposer) rejectProposalWithRedirectLocked(

It seems like we should be able to write some tests for this fairly easily.


pkg/kv/kvserver/replica_range_lease.go, line 212 at r3 (raw file):

	// every request from making it too far down the proposal path - in particular
	// to avoid the heartbeating of the liveness record done below.
	leader := p.repl.mu.leaderID

I'm not sold on this. We want to backport this PR, so we'd like it to be as small as possible. And we're planning on using master as a staging ground for the backport, so we shouldn't merge anything on top that we don't intend to backport in case it hides issues with the portion we do intend to backport.

So I'm going to assume that if this is in this PR, we want to backport it too. If so, then there's a high bar for changes, especially in such subtle areas as this. This is only an optimization, right? Have we seen that it's anything more than theoretical? Have we seen the liveness heartbeats discussed above cause issues? As of 20.2, liveness heartbeats are coalesced, so I'd be surprised if they were.

I'm primarily worried about the additional complexity this adds in exchange for the unclear optimization. The PR itself is already making a potentially destabilizing change, so I'm wary of making any issues that fall out from it harder to debug or increasing the chance that we got it wrong.

Contributor

@bdarnell bdarnell left a comment


r2 looks good to me; I share Nathan's reservations about r3.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 421 at r2 (raw file):

			leader = roachpb.ReplicaID(status.Lead)
			if leader == b.p.replicaID() {
				// !!! this assertion fails, apparently in single-replica groups, so I

Yeah, I remember that single-replica groups short-circuit a lot of logic and don't maintain a lot of things that look like invariants in multi-replica groups. I don't remember anything well enough to give you useful advice right now, though.


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

		// - if the leader is known, we reject this proposal and make sure the
		// request that needed the lease is redirected to the leaseholder;
		// - if the leader is not known, we don't do anything special here to

We don't have to do anything special here because it will get dropped later, but could we just reject the proposal here whether we know the leader or not? Would avoid questions like the one in the assertion above.

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We don't have to do anything special here because it will get dropped later, but could we just reject the proposal here whether we know the leader or not? Would avoid questions like the one in the assertion above.

I was going to respond that we don't want to immediately reject the proposal because the case where !leaderKnown is the case where we'd rather wait until the current election completes so we know who to redirect to, but now I'm confused. When will we ever be a follower and not know the current leader? I thought you'll always be a candidate/pre-candidate if you don't know the leader.

Contributor

@bdarnell bdarnell left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 416 at r2 (raw file):

	if raftGroup != nil {
		status := raftGroup.BasicStatus()
		follower = status.RaftState == raft.StateFollower

Should this be state == follower or state != leader? I think candidates and pre-candidates (and learners, although they just use StateFollower) should be handled the same way as followers here.
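In terms of the quoted snippet, that suggestion amounts to roughly:

```go
status := raftGroup.BasicStatus()
// Treat candidates, pre-candidates, and learners the same as followers:
// anyone who isn't currently the leader shouldn't propose a lease acquisition.
notLeader := status.RaftState != raft.StateLeader
leaderKnown := status.Lead != raft.None
```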


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I was going to respond that we don't want to immediately reject the proposal because the case where !leaderKnown is the case where we'd rather wait until the current election completes so we know who to redirect to, but now I'm confused. When will we ever be a follower and not know the current leader? I thought you'll always be a candidate/pre-candidate if you don't know the leader.

The initial state of a raft instance is to be a follower with unknown leader. And there are a few ways to get back to that state. For example, if an election fails without a majority winner, the candidates become followers (with no leaders) as they back off before starting a new election.

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The initial state of a raft instance is to be a follower with unknown leader. And there are a few ways to get back to that state. For example, if an election fails without a majority winner, the candidates become followers (with no leaders) as they back off before starting a new election.

Thanks for the explanation, that's very helpful. Then I agree with your other comment that we should invert the follower boolean to be a leader boolean and then check `if !leader && p.Request.IsLeaseRequest() {`.

I still think we should only redirect if the leader is known, though, for the reasons listed in #37906 (comment).

Contributor

@bdarnell bdarnell left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for the explanation, that's very helpful. Then I agree with your other comment that we should invert the follower boolean to be a leader boolean and then check `if !leader && p.Request.IsLeaseRequest() {`.

I still think we should only redirect if the leader is known, though, for the reasons listed in #37906 (comment).

I don't feel strongly about what happens when there is no leader - it's a very rare case and either bouncing among random replicas or waiting synchronously would be acceptable to me. If we could get rid of the synchronous wait code altogether that would be nice, although I guess that's still useful in the normal cold start path.

@andreimatei andreimatei force-pushed the kv.lease-acq-forwarding branch from e27baf6 to b31e727 on October 14, 2020 23:57
Contributor Author

@andreimatei andreimatei left a comment


PTAL at the test

I hit a problem though: replicas on draining nodes will not take leases. They will become leaders, though, which can lead to a deadlock: if a leader is draining, nobody else will attempt to take the lease, and the draining leader won't either.
The logic is here:

```go
if r.store.IsDraining() {
	// We've retired from active duty.
	return r.mu.pendingLeaseRequest.newResolvedHandle(roachpb.NewError(
		newNotLeaseHolderError(nil, r.store.StoreID(), r.mu.state.Desc)))
}
```

I'm also wondering whether this is already a deadlock condition: is there anything preventing all the nodes with replicas for a range from draining at the same time? If that happens, nobody will want the lease.
I'm thinking of allowing a leader to acquire the lease even if it's draining.
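A minimal sketch of that exception (hypothetical; `isRaftLeader` stands in for however leadership would actually be determined at this point):

```go
// Sketch: keep refusing the lease on draining stores, except when this
// replica is the raft leader; otherwise a range whose replicas are all
// draining ends up with nobody willing to take the lease.
if r.store.IsDraining() && !isRaftLeader {
	// We've retired from active duty.
	return r.mu.pendingLeaseRequest.newResolvedHandle(roachpb.NewError(
		newNotLeaseHolderError(nil, r.store.StoreID(), r.mu.state.Desc)))
}
```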

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_proposal_buf.go, line 163 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"the another"

done


pkg/kv/kvserver/replica_proposal_buf.go, line 411 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this block a comment.

done


pkg/kv/kvserver/replica_proposal_buf.go, line 416 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Should this be state == follower or state != leader? I think candidates and pre-candidates (and learners, although they just use StateFollower) should be handled the same way as followers here.

done


pkg/kv/kvserver/replica_proposal_buf.go, line 463 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't feel strongly about what happens when there is no leader - it's a very rare case and either bouncing among random replicas or waiting synchronously would be acceptable to me. If we could get rid of the synchronous wait code altogether that would be nice, although I guess that's still useful in the normal cold start path.

I think I'm in the "let's wait" camp too. Seems cleaner to me... I don't really like NotLeaseholderErrors that don't have a redirection in them.


pkg/kv/kvserver/replica_proposal_buf.go, line 703 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we also need to call cleanupFailedProposalLocked in this function before calling finishApplication. That will ensure that the proposal is removed from the proposals map and that it will release quota if it acquired any.

done


pkg/kv/kvserver/replica_proposal_buf.go, line 708 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Doesn't this grab a read lock? Read locks aren't re-entrant. I think you want descRLocked

removed


pkg/kv/kvserver/replica_proposal_buf.go, line 709 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If you just want the StoreID, why go through all this effort. Is r.store.StoreID() not sufficient?

done, 10x

andreimatei added a commit to andreimatei/cockroach that referenced this pull request Oct 15, 2020
Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time(*) - nobody wants the lease. Particularly
because the liveness range is expiration-based (and thus permanently in
need of new leases to be granted), this quickly results in nodes failing
their liveness.
It also becomes more of a problem with cockroachdb#55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock. We have code that tries to transfer the
leadership away from draining nodes - which is a way to solve these
deadlocks - but that code is pretty best-effort (and also useless if all
the replicas are draining? not sure).

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

(*) Draining multiple nodes at once might be questionable, but I think
it's an acceptable practice. At least aspirationally, we should allow an
operator to drain as many nodes as they want concurrently, as long as
care is taken to not actually stop critical nodes. The way it ought to
work is that you start draining 3 nodes, and then the draining becomes
successful on one node at a time - this way allowing you to rotate them
all. Even with this patch there are multiple holes in that story, one of
them being the fact that draining multiple nodes at once will generally
not succeed on any node because each node will be stuck with some
leases.

Release note: None
Contributor

@bdarnell bdarnell left a comment


Reviewed 2 of 5 files at r1, 1 of 4 files at r4, 5 of 5 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/client_raft_test.go, line 1157 at r5 (raw file):

	log.Infof(ctx, "test: truncating log... done")

	// !!! assert lease expired

Don't forget this.

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 5 of 5 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, and @tbg)


pkg/kv/kvserver/client_raft_test.go, line 1054 at r5 (raw file):

// the leader does not attempt to acquire a lease and, thus, does not block until
// it figures out that it cannot, in fact, take the lease.
// !!! more words

!!!


pkg/kv/kvserver/client_raft_test.go, line 1110 at r5 (raw file):

	// Wait until the leadership is transferred away from the partitioned replica.
	testutils.SucceedsSoon(t, func() error {
		mtc.advanceClock(ctx)

How long does this test take to run? Does advanceClock actually speed up the election process? I wouldn't expect it to make a replica tick faster.


pkg/kv/kvserver/client_raft_test.go, line 1175 at r5 (raw file):

	// Now we're going to send a request to the behind replica, and we expect it
	// to not block; we expect a redirection to the leader.

Did you confirm that this blocked before the introduction of your fix?


pkg/kv/kvserver/replica_proposal_buf.go, line 418 at r5 (raw file):

	// requests.
	var iAmTheLeader bool
	var leaderKnown bool

nit: var leaderKnown, iAmTheLeader bool

andreimatei added a commit to andreimatei/cockroach that referenced this pull request Oct 23, 2020
Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time (e.g. when draining a single-node cluster, or
when draining multiple nodes at the same time perhaps by mistake) -
nobody wants the lease. Particularly because the liveness range is
expiration-based (and thus permanently in need of new leases to be
granted), this quickly results in nodes failing their liveness.
It also becomes more of a problem with cockroachdb#55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Oct 26, 2020
Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time (e.g. when draining a single-node cluster, or
when draining multiple nodes at the same time perhaps by mistake) -
nobody wants the lease. Particularly because the liveness range is
expiration-based (and thus permanently in need of new leases to be
granted), this quickly results in nodes failing their liveness.
It also becomes more of a problem with cockroachdb#55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 27, 2020
55624: kvserver: don't always refuse to take leases when draining r=andreimatei a=andreimatei

Before this patch, a draining node would not take any new leases once
it's draining. This is a problem in case all replicas of a range are
draining at the same time (e.g. when draining a single-node cluster, or
when draining multiple nodes at the same time perhaps by mistake) -
nobody wants the lease. Particularly because the liveness range is
expiration-based (and thus permanently in need of new leases to be
granted), this quickly results in nodes failing their liveness.
It also becomes more of a problem with #55148, where we start refusing
to take the lease on replicas that are not the leader - so if the leader
is draining, we deadlock.

This patch makes an exception for leaders, which now no longer refuse
the lease even when they're draining. The reasoning is that it's too
easy to deadlock otherwise.

Release note: None

56004: build: add instructions to rectify code that is not regenerated r=irfansharif a=otan

Make it easier for the developer to know how to fix un-generated code by
specifying the required commands.

Release note: None

56016: backfill: remove unused bound account from column backfiller r=yuzefovich a=adityamaru

Previously, we were creating a bound account in the column backfiller
which was not being used to track anything. This change removes that.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Contributor Author

@andreimatei andreimatei left a comment


Should be ready to go. PTAL at the test.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/client_raft_test.go, line 1054 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

!!!

done


pkg/kv/kvserver/client_raft_test.go, line 1110 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How long does this test take to run? Does advanceClock actually speed up the election process? I wouldn't expect it to make a replica tick faster.

moved to TestServer. And tweaked some knobs. It takes ~3.5s


pkg/kv/kvserver/client_raft_test.go, line 1157 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Don't forget this.

done


pkg/kv/kvserver/client_raft_test.go, line 1175 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you confirm that this blocked before the introduction of your fix?

yup


pkg/kv/kvserver/replica_proposal_buf.go, line 421 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yeah, I remember that single-replica groups short-circuit a lot of logic and don't maintain a lot of things that look like invariants in multi-replica groups. I don't remember anything well enough to give you useful advice right now, though.

gone


pkg/kv/kvserver/replica_proposal_buf.go, line 418 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: var leaderKnown, iAmTheLeader bool

done


pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It seems like we should be able to write some tests for this fairly easily.

added test


pkg/kv/kvserver/replica_range_lease.go, line 212 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm not sold on this. We want to backport this PR, so we'd like it to be as small as possible. And we're planning on using master as a staging ground for the backport, so we shouldn't merge anything on top that we don't intend to backport in case it hides issues with the portion we do intend to backport.

So I'm going to assume that if this is in this PR, we want to backport it too. If so, then there's a high bar for changes, especially in such subtle areas as this. This is only an optimization, right? Have we seen that it's anything more than theoretical? Have we seen the liveness heartbeats discussed above cause issues? As of 20.2, liveness heartbeats are coalesced, so I'd be surprised if they were.

I'm primarily worried about the additional complexity this adds in exchange for the unclear optimization. The PR itself is already making a potentially destabilizing change, so I'm wary of making any issues that fall out from it harder to debug or increases the chance that we got it wrong.

removed commit

@andreimatei andreimatei force-pushed the kv.lease-acq-forwarding branch from b31e727 to 6d60b64 on October 30, 2020 19:45
@andreimatei andreimatei requested a review from a team as a code owner October 30, 2020 19:45
@tbg
Member

tbg commented Nov 3, 2020

Test failure: pkg/kv/kvserver/client_raft_test.go:1090:11: this value of err is never used (SA4006)

@nvanbenschoten you've got this PR, right?
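For reference, SA4006 is staticcheck's "this value is never used" check; a minimal standalone reproduction (unrelated to the repo) looks like:

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	// SA4006: the err assigned on the first line is never read - it is
	// overwritten before any check, which usually means a missed error check.
	n, err := strconv.Atoi("1")
	n, err = strconv.Atoi("2")
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```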

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 2 of 8 files at r6, 9 of 11 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/client_raft_test.go, line 1283 at r7 (raw file):

// the lease and then blocking traffic for a long time until they find out
// whether they successfully took the lease or not.
func TestRequestsOnLaggingReplicaXXX(t *testing.T) {

Looks like this test name is off. Did you mean to remove this version with multiTestContext?


pkg/kv/kvserver/replica_proposal_buf.go, line 189 at r7 (raw file):

}

// setTestingKnobs should only be called immediately after Init(); it cannot be

nit: I would skip the unexported setter method and just assign directly like we do for the other testing knobs, but this is also fine.


pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

added test

Did you want to add a test to this file? I think that would be the easiest way to unit test this change. It should be pretty satisfying to do.

Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell, @knz, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/client_raft_test.go, line 1283 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Looks like this test name is off. Did you mean to remove this version with multiTestContext?

done


pkg/kv/kvserver/replica_proposal_buf.go, line 189 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I would skip the unexported setter method and just assign directly like we do for the other testing knobs, but this is also fine.

done


pkg/kv/kvserver/replica_proposal_buf_test.go, line 74 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you want to add a test to this file? I think that would be the easiest way to unit test this change. It should be pretty satisfying to do.

done, PTAL

andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jan 11, 2021
This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
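Roughly, the extra consideration looks like this (a sketch: `CheckCanReceiveLease` is the helper named above, while the surrounding variables and wiring are illustrative):

```go
// Only reject/redirect when the known leader is itself allowed to take the
// lease; otherwise keep forwarding our own lease request, as we did before.
reject := false
if !iAmTheLeader && leaderKnown && p.Request.IsLeaseRequest() {
	leaderRep, ok := rangeDesc.GetReplicaDescriptorByID(leader)
	if !ok {
		// The leader isn't in our (possibly stale) descriptor; assume it's
		// eligible. If it isn't, the deadlock resolves once we catch up.
		reject = true
	} else {
		reject = CheckCanReceiveLease(leaderRep, rangeDesc) == nil
	}
}
```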
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jan 14, 2021
This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jan 22, 2021
This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jan 25, 2021
This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
craig bot pushed a commit that referenced this pull request Jan 25, 2021
58722: kvserver: don't refuse to fwd lease proposals in some edge cases r=andreimatei a=andreimatei

This patch backpedals a little bit on the logic introduced in #55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes #57798

Release note: None

59087: util/log: new output format 'crdb-v2' r=itsbilal a=knz

Fixes  #50166. 

This new format intends to address all the known shortcomings with `crdb-v1` while remaining compatible with entry parsers designed for the previous version.
See the user-facing release note below for a summary of changes; and the included reference documentation for details.


Example TTY output with colors:
![image](https://user-images.githubusercontent.com/642886/104824568-261e9380-5853-11eb-9ad9-e5936f0890fd.png)


Example for a single-line unstructured entry.

Before:
```
I210116 22:17:03.736236 57 cli/start.go:681 ⋮ node startup completed:
```

After:
```
I210116 22:17:03.736236 57 cli/start.go:681 ⋮ [-] 12  node startup completed:
              tag field now always included   ^^^
          entry counter now always included       ^^^
```

Example for a multi-line unstructured entry.

Before:
```
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74  gossip status (ok, 1 node‹›)
gossip client (0/3 cur/max conns)
gossip connectivity
  n1 [sentinel];
```

(subsequent lines lack a log entry prefix; hard to determine where
entries start and end)

After:
```
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74  gossip status (ok, 1 node‹›)
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 +gossip client (0/3 cur/max conns)
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 +gossip connectivity
I210116 22:15:38.105666 452 gossip/gossip.go:567 ⋮ [n1] 74 +  n1 [sentinel];
  ^^^ common prefix repeated for each msg line
       same entry counter for every subsequent line    ^^^
               continuation marker "+" on subsequent lines ^^
```

Example for a structured entry.

Before:
```
I210116 22:14:38.175829 469 util/log/event_log.go:32 ⋮ [n1] Structured entry: {...}
```

After:
```
I210116 22:14:38.175829 469 util/log/event_log.go:32 ⋮ [n1] 21 ={...}
                            entry counter always present    ^^
                      equal sign "=" denotes structured entry ^^
```

Release note (cli change): The default output format for `file-group`
and `stderr` sinks has been changed to `crdb-v2`.

This new format is non-ambiguous and makes it possible to reliably
parse log files. Refer to the format's documentation for
details. Additionally, it prevents single log lines from exceeding a
large size; this problem is inherent to the `crdb-v1` format and can
prevent `cockroach debug zip` from retrieving v1 log files.

The new format has also been designed so that existing log file
analyzers for the `crdb-v1` format can read entries written in the new
format. However, this conversion may be imperfect. Again, refer to
the new format's documentation for details.

In case of incompatibility, users can force the previous format by
using `format: crdb-v1` in their logging configuration.

59141: ui: upgrade admin-ui-components to new dep r=dhartunian a=dhartunian

We renamed the `admin-ui-components` package
to `cluster-ui`.

Release note: None

59388: build,bazel: remove references to `gofmt` in bazel build r=rickystewart a=rickystewart

This was cargo-culted from the `Makefile`, but isn't necessary to get
the build to succeed, and interferes with hermiticity because it
requires `gofmt` to be globally installed. It's simpler to just remove
these references entirely.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@andreimatei andreimatei deleted the kv.lease-acq-forwarding branch March 15, 2021 16:04
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Mar 22, 2021
…ts""

This reverts commit 49c8337.

This is a revert of a revert, reintroducing the backport of the main
commit from cockroachdb#55148. The backport was reverted because of a deadlock,
which is now being fixed by the next commit.

Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Mar 22, 2021
This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whomever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING are more controversial.

Fixes cockroachdb#57798

Release note: None
craig bot pushed a commit that referenced this pull request Mar 23, 2021
61982: kvserver: allow lease extensions from followers in propBuf r=nvanbenschoten,andreimatei,tbg a=erikgrinaker

The Raft proposal buffer currently rejects lease acquisition proposals
from followers if there is an eligible leader, to avoid interfering with
it trying to claim the lease. However, this logic unintentionally
rejected lease extension proposals from followers as well, which could
cause a leaseholder to lose its lease.

This patch changes the proposal buffer to allow lease extension
proposals from followers for extension-based leases as long as they
still hold a valid lease.

Resolves #59179. Follow-up to #55148.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
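A hedged sketch of that carve-out (names approximate the commit message above; the real check also verifies that the existing lease is still valid under the current clock):

```go
// Don't reject a proposal that merely extends a lease this replica already
// holds; only redirect acquisitions when someone else should get the lease.
ownsCurrentLease := curLease.OwnedBy(storeID) &&
	curLease.Type() == roachpb.LeaseExpiration // expiration-based leases are the ones extended
if !iAmTheLeader && leaderKnown && p.Request.IsLeaseRequest() && !ownsCurrentLease {
	// Reject and redirect to the leader, as introduced in #55148.
}
```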
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request May 1, 2021
Previously, we were only updating the `leases.error` (metric measuring the
number of failed lease requests) if a lease request failed during command
evaluation. However after cockroachdb#55148, lease requests can be rejected by the replica
proposal buffer if there is a known raft leader that is different from the
replica requesting the lease. We were not recording such rejections as failed
lease requests in the metrics.

This commit fixes this oversight.

Release note: None
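In sketch form (the metric field name here is an assumption; the metric itself is the `leases.error` counter named above):

```go
// In the proposal buffer's rejection path, also count the rejection as a
// failed lease request so it is reflected in leases.error, just like
// evaluation-time failures are.
r.store.metrics.LeaseRequestErrorCount.Inc(1)
```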
craig bot pushed a commit that referenced this pull request May 4, 2021
64531: kvserver: fix metrics collection for rejected lease requests r=aayushshah15 a=aayushshah15

Previously, we were only updating the `leases.error` (metric measuring the
number of failed lease requests) if a lease request failed during command
evaluation. However after #55148, lease requests can be rejected by the replica
proposal buffer if there is a known raft leader that is different from the
replica requesting the lease. We were not recording such rejections as failed
lease requests in the metrics.

This commit fixes this oversight.

Release note: None

Co-authored-by: Aayush Shah <[email protected]>
Successfully merging this pull request may close these issues.

storage: restarted node in need of snapshots can be wedged for long time