kv: allow DeleteRangeRequests to be pipelined #119975

Merged 2 commits on Mar 26, 2024

Conversation

arulajmani
Collaborator

Previously, ranged requests could not be pipelined. However, there is no
good reason to disallow this -- we just have to take extra care to
correctly update in-flight write tracking on the response path. We do so
now.

As part of this patch, we introduce two new flags -- canPipeline and
canParallelCommit. We use these flags to determine whether batches can
be pipelined or committed using parallel commits. This is in contrast to
before, where we derived this information from other flags
(isIntentWrite, !isRange). This wasn't strictly necessary for this
change, but helps clean up the concepts.

As a consequence of this change, we now have a distinction between
requests that can be pipelined and requests that can be part of a batch
that can be committed in parallel. Notably, this applies to
DeleteRangeRequests -- they can be pipelined, but not committed in
parallel. That's because we need to have the entire write set upfront
when performing a parallel commit, lest we need to perform recovery --
we don't have this for DeleteRange requests.

In the future, we'll extend the concept of canPipeline
(and !canParallelCommit) to other locking ranged requests as well. In
particular, to (replicated) locking {,Reverse}ScanRequests that want to
pipeline their lock acquisitions.
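The canPipeline/canParallelCommit distinction described above can be sketched roughly as follows. This is an illustrative model only, not the actual kvpb flag machinery; the request names and flag constants here are assumptions:

```go
package main

import "fmt"

// flag models a request property bit, loosely mirroring kvpb's flag scheme.
type flag int

const (
	isWrite flag = 1 << iota
	isRange
	isIntentWrite
	canPipeline
	canParallelCommit
)

// Hypothetical per-request flags. DeleteRange is the interesting case: it
// can be pipelined, but cannot be part of a parallel commit because its
// write set is not known upfront.
var reqFlags = map[string]flag{
	"Put":         isWrite | isIntentWrite | canPipeline | canParallelCommit,
	"DeleteRange": isWrite | isRange | isIntentWrite | canPipeline,
	"Scan":        isRange,
}

func has(req string, f flag) bool { return reqFlags[req]&f != 0 }

func main() {
	for _, req := range []string{"Put", "DeleteRange", "Scan"} {
		fmt.Printf("%s: canPipeline=%t canParallelCommit=%t\n",
			req, has(req, canPipeline), has(req, canParallelCommit))
	}
}
```

Under this model, a batch containing only Puts qualifies for both pipelining and parallel commits, while a batch containing a DeleteRange can still be pipelined but must fall back to a non-parallel commit.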

Closes #64723
Informs #117978

Release note: None

@arulajmani arulajmani requested a review from a team as a code owner March 5, 2024 23:48
@cockroach-teamcity
Member

This change is Reviewable

Member

@nvanbenschoten nvanbenschoten left a comment


This is looking good! Have you tested it out end-to-end with SQL in an MR cluster to show the latency win? You can use the script from #64723.

Reviewed 1 of 1 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go line 139 at r2 (raw file):

	mu                 sync.Locker
	disable1PC         bool
	disableElideEndTxn bool

I don't understand why this part of the change was needed. Maintaining a separate flag instead of deriving this from the EndTxn request feels like a step backwards. Doing so means that we need to keep the flag in sync with the rest of the txn, lest we accidentally disable elision too much or too little.

For example, on first reading, it looks like there's a bug here where we should reset this flag to false in epochBumpedLocked. That's not actually a bug because we don't want to elide in these cases, but the flag being separate from lockFootprint (which documents this) raises these questions.


pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go line 370 at r2 (raw file):

		req := ru.GetInner()
		switch {
		case kvpb.IsIntentWrite(req):

Should we use CanParallelCommit here?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 462 at r2 (raw file):

		req := ru.GetInner()

		// The current request cannot be pipelined, so it prevents us from

This comment should either start with "Determine whether ..." (with a few other updates) or should be moved within the if block.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 485 at r2 (raw file):

		}

		switch req.Method() {

nit: we'll need to move this above the CanPipeline check if we listen to the comment I made in api.go.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 497 at r2 (raw file):

			// can disable DeleteRange pipelining entirely for requests that set this
			// field to false.
			deleteRangeReq.ReturnKeys = true

We'll need to do this for mixed version clusters, but longer term, should we try to just deprecate this flag and always treat it as set to true? It will be set to true most of the time going forward anyway.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 497 at r2 (raw file):

			// can disable DeleteRange pipelining entirely for requests that set this
			// field to false.
			deleteRangeReq.ReturnKeys = true

Should we add a test that demonstrates that this flag is set to true? Or add it to TestTxnPipelinerDeleteRangeRequests.


pkg/kv/kvpb/api.go line 87 at r2 (raw file):

	isLocking:         {isTxn},
	isIntentWrite:     {isWrite, isLocking},
	canParallelCommit: {canPipeline},

Does canPipeline depend on isIntentWrite?


pkg/kv/kvpb/api.go line 186 at r2 (raw file):

}

// CanPipeline returns true iff the BatchRequest can be pipelined.

s/BatchRequest/command/

Here and below.


pkg/kv/kvpb/api.go line 193 at r2 (raw file):

// CanParallelCommit returns true iff the BatchRequest can be part of a batch
// that is committed in parallel.
func CanParallelCommit(args Request) bool {

Can we use this in place of kvpb.IsIntentWrite(req) && !kvpb.IsRange(req) (twice) in maybeStripInFlightWrites?


pkg/kv/kvpb/api.go line 1669 at r2 (raw file):

	// recovery we need the entire in-flight write set to plop on the txn record,
	// and we don't have that on the request path if the batch contains a
	// DeleteRange request.

Should we only return canPipeline if drr.ReturnKeys?


pkg/kv/kvclient/kvcoord/txn_interceptor_committer_test.go line 75 at r1 (raw file):

			require.Len(t, ba.Requests, 2)
			require.IsType(t, &kvpb.GetRequest{}, ba.Requests[0].GetInner())
			require.IsType(t, &kvpb.PutRequest{}, ba.Requests[1].GetInner())

ScanRequest


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 617 at r2 (raw file):

	require.Equal(t, 3, tp.ifWrites.len())

	// Now, test RefreshRangeRequests, which cannot be pipelined.

RefreshRangeRequest will never go through the txnPipeliner, so I don't think this is worth testing. If you want to test a ranged request, maybe use a ScanRequest?


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 637 at r2 (raw file):

	require.Nil(t, pErr)
	require.NotNil(t, br)
	require.Equal(t, 0, tp.ifWrites.len())

How do we end up with 0 in-flight writes here? We didn't clear them or issue QueryIntent request for them.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 972 at r2 (raw file):

}

// TestTxnPipelinerDeleteRangeRequests ensures the txnPipelineer correctly

txnPipeliner


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 997 at r2 (raw file):

		br.Txn = ba.Txn
		resp := br.Responses[0].GetInner()
		resp.(*kvpb.DeleteRangeResponse).Keys = []roachpb.Key{keyB}

Since this is the only test where we set DeleteRangeResponse.Keys, let's add a few to demonstrate that all end up in ifWrites and that they all end up with the correct sequence number. See the qiReq1 := ... logic above for inspiration.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 1010 at r2 (raw file):

	ba = &kvpb.BatchRequest{}
	ba.Header = kvpb.Header{Txn: &txn}
	ba.Add(&kvpb.DeleteRangeRequest{RequestHeader: kvpb.RequestHeader{Key: keyD, EndKey: keyE}})

Let's have this overlap with some (but not all) of the in-flight intent writes from the first request. That will lead to an interesting order of QueryIntentRequests.

Previously, this test was constructing (and testing) an unrealistic
scenario. We should never be eliding EndTxn requests if there is a Put
in the batch; but, because we weren't correctly populating lock spans,
we ended up asserting that we were eliding the request. We now switch to
using a ScanRequest here instead.

Epic: none

Release note: None
@arulajmani arulajmani force-pushed the pipeline-delete-range-requests branch from 3d8ea25 to 63270dd Compare March 25, 2024 21:55
Collaborator Author

@arulajmani arulajmani left a comment


Still need to look into/fix TestTxnPipelinerRejectAboveBudget, but I addressed all your review comments, beefed up some testing, and I think I have all other tests passing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go line 139 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't understand why this part of the change was needed. Maintaining a separate flag instead of deriving this from the EndTxn request feels like a step backwards. Doing so means that we need to keep the flag in sync with the rest of the txn, lest we accidentally disable elision too much or too little.

For example, on first reading, it looks like there's a bug here where we should reset this flag to false in epochBumpedLocked. That's not actually a bug because we don't want to elide in these cases, but the flag being separate from lockFootprint (which documents this) raises these questions.

We discussed this offline. The main motivation here was that TestCommitMutatingTransaction was failing without this. But that turned out to be because I hadn't debugged the failure properly, so this change can be reverted.

I beefed up TestCommitMutatingTransaction to include cases where the EndTxnRequest is part of the same batch as the mutating request as well. The DeleteRange variant of that is interesting because it exercises the logic in attachLocksToEndTxn that you pointed me to earlier. I also added a new test for no-op DeleteRange requests, as those can elide EndTxn requests in some cases.


pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go line 370 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we use CanParallelCommit here?

We should be, done. Thanks for catching.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 462 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This comment should either start with "Determine whether ..." (with a few other updates) or should be moved within the if block.

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 497 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We'll need to do this for mixed version clusters, but longer term, should we try to just deprecate this flag and always treat it as set to true? It will be set to true most of the time going forward anyway.

That makes sense, I added a TODO.

TODO(arul): open an issue about deprecating this. Maybe we can even try and get it into this release.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go line 497 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we add a test that demonstrates that this flag is set to true? Or add it to TestTxnPipelinerDeleteRangeRequests.

Added it to TestTxnPipelinerDeleteRangeRequests.


pkg/kv/kvpb/api.go line 87 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does canPipeline depend on isIntentWrite?

You're right, it does as of this PR, though this will change when we start pipelining replicated lock acquisitions. For now, I've added it here -- later, we'll probably want something like:

var flagDependencies = map[flag][]flag{
...
	canPipeline:       {isLocking},
	canParallelCommit: {canPipeline, isIntentWrite},
...
}
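A dependency map in that shape could be enforced with a simple check at init time. A minimal sketch of the idea, with assumed flag names rather than the actual kvpb code:

```go
package main

import "fmt"

type flag int

const (
	isLocking flag = 1 << iota
	isIntentWrite
	canPipeline
	canParallelCommit
)

// Hypothetical dependency map in the shape proposed above: a flag may only
// be set on a request if all of its dependency flags are also set.
var flagDependencies = map[flag][]flag{
	canPipeline:       {isLocking},
	canParallelCommit: {canPipeline, isIntentWrite},
}

// checkDeps returns an error if any set flag is missing one of its
// declared dependencies.
func checkDeps(flags flag) error {
	for f, deps := range flagDependencies {
		if flags&f == 0 {
			continue
		}
		for _, dep := range deps {
			if flags&dep == 0 {
				return fmt.Errorf("flag %d set without dependency %d", f, dep)
			}
		}
	}
	return nil
}

func main() {
	// A DeleteRange-like flag set: locking intent write that can pipeline
	// but not parallel-commit. Passes the check.
	fmt.Println(checkDeps(isLocking|isIntentWrite|canPipeline) == nil)
	// canParallelCommit without canPipeline violates a dependency.
	fmt.Println(checkDeps(isIntentWrite|canParallelCommit) != nil)
}
```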

pkg/kv/kvpb/api.go line 193 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we use this in place of kvpb.IsIntentWrite(req) && !kvpb.IsRange(req) (twice) in maybeStripInFlightWrites?

Done. I'm 50-50 on what's more understandable, so let me know if we want to keep this change or not.


pkg/kv/kvpb/api.go line 1669 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we only return canPipeline if drr.ReturnKeys?

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_committer_test.go line 75 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ScanRequest

🤦


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 617 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

RefreshRangeRequest will never go through the txnPipeliner, so I don't think this is worth testing. If you want to test a ranged request, maybe use a ScanRequest?

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 637 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How do we end up with 0 in-flight writes here? We didn't clear them or issue QueryIntent request for them.

Meant to update this to 2; the test was failing earlier.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 997 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Since this is the only test where we set DeleteRangeResponse.Keys, let's add a few to demonstrate that all end up in ifWrites and that they all end up with the correct sequence number. See the qiReq1 := ... logic above for inspiration.

Done.


pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner_test.go line 1010 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's have this overlap with some (but not all) of the in-flight intent writes from the first request. That will lead to an interesting order of QueryIntentRequests.

Good call, done.

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm: once CI is green.

Reviewed 9 of 9 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go line 2001 at r4 (raw file):

		}

		if _, ok := ba.GetArg(kvpb.DeleteRange); ok {

nit: consider moving above if et, ok := ..., because DeleteRange will come earlier in the batch.


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go line 2112 at r4 (raw file):

			db := kv.NewDB(log.MakeTestingAmbientCtxWithNewTracer(), factory, clock, stopper)
			if err := db.Txn(ctx, test.f); err != nil {
				t.Fatalf("%d: unexpected error on commit: %s", i, err)

nit: require.NoError, while we're here.


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go line 2119 at r4 (raw file):

			}
			expectedCalls = append(expectedCalls, kvpb.EndTxn)
			if !reflect.DeepEqual(expectedCalls, calls) {

nit: require.Equal


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go line 2203 at r4 (raw file):

			}
			if !reflect.DeepEqual(expectedCalls, calls) {
				t.Fatalf("%d: expected %s, got %s", i, expectedCalls, calls)

Same point about using require.

@arulajmani arulajmani force-pushed the pipeline-delete-range-requests branch from 63270dd to ce93193 Compare March 25, 2024 23:29
@arulajmani arulajmani requested a review from a team as a code owner March 25, 2024 23:29
Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

Collaborator Author

@arulajmani arulajmani left a comment


Addressed comments from the last round and fixed TestTxnPipelinerRejectAboveBudget. Ended up adding both a ScanRequest variant and a DeleteRangeRequest variant, as what we track for ranged locking requests now differs between the two.

Separately, I applied a bandaid on TestIntentResolution, but I think we should just get rid of it. I'll send out a followup to get your thoughts on it in a bit.

TFTRs!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

@craig
Contributor

craig bot commented Mar 26, 2024

@craig craig bot merged commit 7db4486 into cockroachdb:master Mar 26, 2024
21 of 22 checks passed
craig bot pushed a commit that referenced this pull request Apr 3, 2024
121088: kv: pipeline replicated lock acquisition r=nvanbenschoten a=nvanbenschoten

Fixes #117978.

Builds upon the foundation laid in [#119975](#119975), [#119933](#119933), [#121065](#121065), and [#121086](#121086).

This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through `Get`, `Scan`, and `ReverseScan` requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding `QueryIntentRequest`.

See [benchmark with TPC-C results here](#121088 (comment)).

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>