Merge #35261 · cockroachdb/cockroach@cf449c6

Commit

Merge #35261

35261: storage: Improve handling of lease index retries r=tbg,nvanbenschoten a=bdarnell

Pipelined writes wait for commands to be evaluated, then detach before
they apply. Retryable errors generated at evaluation time are handled
correctly (by DistSender or above) even for pipelined writes, but
pipelined writes have lost the ability to handle apply-time retry
conditions in the executeWriteBatch loop (there used to be more of
these, but illegal lease index is now the only condition retried at
this level now). To remedy this, this commit moves reproposals due to
illegal lease indexes below raft (but only on the proposing node)

In practice, this can happen to writes that race with a raft
leadership transfer. This is observable immediately after table
creation, since table creation performs a split, and then may perform
a lease transfer to balance load (and if the lease is transfer, raft
leadership is transferred afterward).

Specifically,
1. Lease is transferred from store s1 to s2. s1 is still raft leader.
2. Write w1 evaluates on store s2 and is assigned lease index i1. s2
forwards the proposal to s1.
3. s1 initiates raft leader transfer to s2. This puts it into a
temporarily leaderless state so it drops the forwarded proposal.
4. s2 is elected raft leader, completing the transfer.
5. A second write w2 evalutes on s2, is assigned lease index i2, and
goes right in the raft log since s2 is both leaseholder and leader.
6. s2 refreshes proposals as a side effect of becoming leader, and
writes w1 to the log with lease index i1.
7. s2 applies w2, then w1. w1 fails because of the out of order lease
index.

If w1 was pipelined, the client stops listening after step 2, and
won't learn of the failure until it tries to commit. At this point the
commit returns an ASYNC_WRITE_FAILURE retry to the client.

Note that in some cases, NotLeaseHolderError can be generated at apply
time. These errors cannot be retried (since the proposer has lost its
lease) so they will unavoidably result in an ASYNC_WRITE_FAILURE error
to the client. However, this is uncommon - most NotLeaseHolderErrors
are generated at evaluation time, which is compatible with pipelined
writes.
Fixes #28876

The fourth commit is the important part of this change. The fifth is optional; it's a simplification but it changes an under-tested error path. Assuming we like the fifth commit, I'd like to add a sixth that rips out proposalReevaluationReason and the executeWriteBatch/tryExecuteWriteBatch loop altogether. 

Co-authored-by: Ben Darnell <[email protected]>

Loading branch information

craig[bot] and bdarnell committed Mar 19, 2019

2 parents 01f9662 + f7dc847 commit cf449c6

pkg/storage/batcheval/cmd_lease_test.go

pkg/storage/replica.go

pkg/storage/replica_proposal.go

pkg/storage/replica_raft.go

pkg/storage/replica_read.go

pkg/storage/replica_test.go

pkg/storage/replica_write.go

pkg/storage/store_test.go

pkg/storage/track_raft_protos.go

0 comments on commit `cf449c6`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `cf449c6`

Commit

There are no files selected for viewing

0 comments on commit cf449c6

0 comments on commit `cf449c6`