
kv: clarify the Sender contract and pre-allocate commit batches #30485

Merged
merged 3 commits on Sep 27, 2018

Conversation

andreimatei
Contributor

@andreimatei andreimatei commented Sep 20, 2018

See individual commits.

Release note: None

@andreimatei andreimatei requested a review from a team September 20, 2018 23:41
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

Tobi, putting this up on the flag pole to see if the cat licks it up.

cc @nvanbenschoten

Member

@tbg tbg left a comment


I'm with you in spirit, and I'm glad the time has come where such reuse is actually important for performance, but I'm afraid that turning it off and going wild-west on the whole thing is going to cost us dearly in obscure bugs. Let's figure out what the contracts should be (to make the optimizations most powerful) and then let's think about the best way to enforce them (I hope that in the process we can get rid of this transport, which is quite expensive and slows down race a bunch).

The race transport doesn't only exist because we might be manipulating data on the server (in the local case). It also exists because grpc might be reading the data even after the call came back (or so we assume). In the spirit of there are no benign data races, it follows that unless you can prove that what you gave to gRPC is no longer going to be accessed by it, you don't want to mutate it. I haven't looked at what grpc does with the passed-in message exactly. Ideally it would encode it before it does anything async and then we're home free and can forget about the race transport for that purpose.

Then what's left is finding better ways to enforce the guarantees we want in our code.

For example,

Server side code is currently not allowed to mutate objects referenced by a Batch.

Can (mostly) be caught by hashing the batch on the way in and on the way out and comparing at the (Node).Batch endpoint (unless the request gets passed elsewhere out-of-band, but that doesn't happen often and we can audit it). But I agree that we're probably fairly good here, though that too may change as we move objects between the heap and the stack (and so what was a shallow copy ceases to be one).

Ideally I think we get to a place where the incoming request (incl. the Txn proto) is treated immutable at the node (i.e. below DistSender) and where we feel comfortable that it's not passed around across goroutine boundaries anywhere in DistSender or above (and so can be pooled and reused aggressively).
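The hashing idea above could be sketched roughly like this (a minimal Go sketch; the `batch` type, `fingerprint`, and `serveBatch` are hypothetical stand-ins for `roachpb.BatchRequest` and the `(Node).Batch` endpoint, and a real check would hash the marshaled proto bytes):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// batch is a hypothetical stand-in for a roachpb.BatchRequest.
type batch struct {
	Key   string
	Value string
}

// fingerprint hashes the batch's contents. A real implementation would
// hash the marshaled proto bytes instead of formatting the fields.
func fingerprint(b *batch) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s/%s", b.Key, b.Value)
	return h.Sum64()
}

// serveBatch wraps a handler and reports a violation if the handler
// mutated the incoming batch (the real code would log.Fatal here).
func serveBatch(b *batch, handler func(*batch)) error {
	before := fingerprint(b)
	handler(b)
	if fingerprint(b) != before {
		return fmt.Errorf("handler mutated the incoming batch")
	}
	return nil
}

func main() {
	b := &batch{Key: "a", Value: "1"}
	// A well-behaved handler only reads the batch.
	fmt.Println(serveBatch(b, func(b *batch) { _ = b.Key }))
	// A buggy handler mutates it and is caught.
	fmt.Println(serveBatch(b, func(b *batch) { b.Value = "2" }))
}
```

As noted, this only catches mutations at the audited endpoint; requests handed off out-of-band would escape the check.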

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@andreimatei andreimatei changed the title kv: allow shallow mutations of requests in a batch kv: clarify the Sender contract and relax the race transport Sep 21, 2018
Contributor Author

@andreimatei andreimatei left a comment


I looked into grpc... At first it looked pretty convincing that there's no funny business and that once the thing returns from an Invoke() call, there's no referencing of the request in the background or anything. But then I realized that I was looking at an older version. In the current version it's a bit harder to assert anything; it's not the most traceable code. But I didn't see any clues that there might be monkey business, and thinking about it I'm confident enough that it's all fine - like, in order for it to not be fine, the code would really have to try explicitly to screw you.

I've added better comments as per our convo to the Sender iface. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Contributor

@bdarnell bdarnell left a comment


I think it's reasonable to assume that grpc won't touch the input proto once the client method has returned. But I'm not sure about making that same assumption about DistSender. Couldn't tasks started by sendPartialBatchAsync still be running when DistSender.Send returns?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@andreimatei
Contributor Author

Ben, I think your concern is valid. I hadn't seen this on prior inspection but, upon closer inspection, I think the following early return from the DistSender puts a wrench in the plans:

Looks like we return without waiting for all the sub-batches when we fail to "combine" two responses. How legitimate these errors are, I don't know. I'll see if I can get rid of them, or otherwise make the DistSender wait for all the async operations. Unless you think that's morally wrong?

So I guess in the meantime I'll take the relevant commit out of #30448.

@bdarnell
Contributor

Errors from combine are rare to non-existent in practice, so it ought to be possible to eliminate that early return. I think this is probably a good direction to pursue, but the guarantees involved are pretty subtle. I'm worried that we could miss something here or re-introduce problems later.

@andreimatei andreimatei force-pushed the batch-race branch 2 times, most recently from 81e441b to e2c0dda Compare September 24, 2018 22:06
@andreimatei andreimatei changed the title kv: clarify the Sender contract and relax the race transport kv: clarify the Sender contract and pre-allocate commit batches Sep 24, 2018
@andreimatei
Contributor Author

I got rid of those combine errors.
I'd like to move forward with this change and the general push for batch reuse, but I won't backport.

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 1 of 5 files at r1, 1 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/internal/client/txn.go, line 41 at r3 (raw file):

	typ TxnType

	// alloc holds pre-allocated fields that can be used to avoid heap allocations

Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the Txn object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.


pkg/internal/client/txn.go, line 44 at r3 (raw file):

	// for batches with lifetimes tied to a Txn.
	alloc struct {
		requests [4]roachpb.RequestUnion

We're only ever using a single element out of this, so let's give this a length of 1.


pkg/roachpb/api.go, line 246 at r1 (raw file):

	}
	if rh.ResumeSpan != nil {
		log.Fatalf(ctx, "combining %+v with %+v", rh.ResumeSpan, otherRH.ResumeSpan)

I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in divideAndSendBatchToRange to treat the error like any other error? It can return it later without immediately breaking from the loop.

Contributor

@bdarnell bdarnell left a comment


Reviewed 4 of 5 files at r1, 3 of 3 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/roachpb/api.go, line 246 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in divideAndSendBatchToRange to treat the error like any other error? It can return it later without immediately breaking from the loop.

I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from combine() and just delay the return until everything is done.

Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/roachpb/api.go, line 246 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from combine() and just delay the return until everything is done.

Before I change everything back, I'd like to litigate this more.
Ben, the position you've expressed when we were talking before seems inconsistent to me: you said you're fine with assertions about the behavior of a single process, but not so much about what is received from a different node. And you said that this policy preference is not necessarily related to version mismatches between nodes. But generally it seems to me like the only sane thing to do is to have the same policy across nodes that we have within a single process. Like, we make a million assumptions about what another node will return; we can't accept arbitrary data, and trying to tolerate programming errors on another node that we don't tolerate locally... I don't see why we'd do that. If a remote node says this is the data at this key, we take it at face value. We don't assert that it returned data from the key we requested, and such.
What gives?

@bdarnell
Contributor

I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.

I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.
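The direction Ben is describing - keep the error return from combine() rather than crashing - might look roughly like this (a simplified sketch with hypothetical types; the real `combine` lives in `pkg/roachpb/api.go` and handles more cases):

```go
package main

import "fmt"

// response is a hypothetical stand-in for a roachpb response header.
type response struct {
	resumeSpan *string
}

// combine merges rhs into lhs, returning an error instead of crashing
// when the responses cannot legitimately be combined.
func combine(lhs, rhs *response) error {
	if lhs.resumeSpan != nil || rhs.resumeSpan != nil {
		return fmt.Errorf("cannot combine responses with ResumeSpan set")
	}
	// ... merge the combinable fields here ...
	return nil
}

func main() {
	span := "a-b"
	fmt.Println(combine(&response{}, &response{}))
	fmt.Println(combine(&response{resumeSpan: &span}, &response{}))
}
```

The caller (the DistSender loop) can then record the error and still wait for the remaining results before returning it.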

Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/roachpb/api.go, line 246 at r1 (raw file):
Pasting Ben's response here:

I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.

I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.

Thanks. I'd still be inclined to debate what exactly you're thinking when you say "crash that can be triggered by data received over the network is a bug" - like, if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata". But even then, do you think that a more precise description of what you want to express can be phrased, and still be general enough that I can tattoo it on my chest?

We don't need to have a general discussion if you don't want to; we can discuss this specific.

  • combining something with a ResumeSpan was a panic; now it's a fatal. I think this indicates a situation where the gateway split a batch that it shouldn't have.
  • combining a response that had an error was an error. Now it's a fatal. This indicates a bug in the DistSender's logic that's supposed to strip these errors before combining anything.
  • combining two "non-combinable" responses was an error. The patch makes it a fatal. "Non-combinable" means we were trying to combine, say, a ScanResponse with a PutResponse. Even before the patch, trying to combine a ScanResponse with a DelRangeResponse (where both are "combinable") would be a crash. This indicates a bug in how the gateway calculates the positions of the responses it combines or, I guess, a server-side bug where we rearrange the responses to the individual constituents of a batch.

So, given these specific changes, do you object to the two new fatals introduced?

@bdarnell
Contributor

if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata".

I do want to argue that. We should strive to return an error instead of panicking for things like this. Data-dependent crashes are nearly always bugs (sometimes, like at startup, it really doesn't make sense to continue if we can't read what's on the disk, but ideally even that would be handled by returning an error up the stack).

So, given these specific changes, do you object to the two new fatals introduced?

Not enough to keep arguing about it.

@nvanbenschoten
Member

+1 to @bdarnell's stance here. My general rule of thumb is that errors that can be scoped to a single request should be.

... before returning from DistSender.Send(). It had an obscure error
case when it wasn't waiting for all.
This waiting starts being mandated by the next commits.

Release note: None
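The wait-for-all behavior this commit describes can be sketched as follows (hypothetical `sendPartial`/`sendAll` names; the real code is DistSender's `sendPartialBatchAsync` machinery). The point is that an error from one sub-batch no longer causes an early return while sibling goroutines may still be touching the request:

```go
package main

import (
	"fmt"
	"sync"
)

// sendPartial is a hypothetical stand-in for sending one sub-batch.
func sendPartial(i int) error {
	if i == 2 {
		return fmt.Errorf("sub-batch %d failed", i)
	}
	return nil
}

// sendAll fans out the sub-batches but always waits for every goroutine
// before returning, so the caller may safely reuse the request afterwards.
// The first error is kept; later results are still drained.
func sendAll(n int) error {
	var wg sync.WaitGroup
	errs := make(chan error, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs <- sendPartial(i)
		}(i)
	}
	wg.Wait() // no early return: every sender has finished by here
	close(errs)
	var first error
	for err := range errs {
		if err != nil && first == nil {
			first = err
		}
	}
	return first
}

func main() {
	fmt.Println(sendAll(4))
}
```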
This patch clarifies the transport's contract wrt ownership and lifetime
of the request. In particular, it makes it clear that the client is
allowed to mutate the request after the Send() call completes.
The race transport is then relaxed a bit to allow some specific
mutations.

Release note: None
When committing, the client.Txn used to allocate a slice of requests
(inside a BatchRequest), a RequestUnion, and an EndTransactionRequest.
This shows up on profiles of a single node read-heavy workload.
This patch moves to pre-allocating these inside a Txn. This works since
there's no concurrent commits on a single txn.

Release note: None
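The pre-allocation this commit message describes can be sketched like so (simplified, hypothetical types standing in for the `roachpb` ones; safe because a single txn never commits concurrently):

```go
package main

import "fmt"

// Hypothetical stand-ins for the roachpb types.
type RequestUnion struct{ name string }
type EndTransactionRequest struct{ Commit bool }

// Txn pre-allocates the pieces needed to build a commit batch, so
// committing doesn't allocate them on the heap each time.
type Txn struct {
	alloc struct {
		requests      [1]RequestUnion
		endTxnRequest EndTransactionRequest
	}
}

// commitBatch builds the commit batch out of the pre-allocated fields
// instead of allocating fresh ones.
func (txn *Txn) commitBatch() []RequestUnion {
	txn.alloc.endTxnRequest = EndTransactionRequest{Commit: true}
	txn.alloc.requests[0] = RequestUnion{name: "EndTransaction"}
	return txn.alloc.requests[:]
}

func main() {
	var txn Txn
	reqs := txn.commitBatch()
	fmt.Println(len(reqs), txn.alloc.endTxnRequest.Commit)
}
```

The trade-off, as the later revert shows, is that reusing these fields requires being certain nothing else still references the previous batch.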
Contributor Author

@andreimatei andreimatei left a comment


I've reintroduced the combine errors (and converted the panic we had to an error too).
PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/internal/client/txn.go, line 41 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the Txn object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.

Done.


pkg/internal/client/txn.go, line 44 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're only ever using a single element out of this, so let's give this a length of 1.

Done.


pkg/roachpb/api.go, line 246 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Pasting Ben's response here:

I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.

I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.

Thanks. I'd still be inclined to debate what exactly you're thinking when you say "crash that can be triggered by data received over the network is a bug" - like, if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata". But even then, do you think that a more precise description of what you want to express can be phrased, and still be general enough that I can tattoo it on my chest?

We don't need to have a general discussion if you don't want to; we can discuss this specific.

  • combining something with a ResumeSpan was a panic; now it's a fatal. I think this indicates a situation where the gateway split a batch that it shouldn't have.
  • combining a response that had an error was an error. Now it's a fatal. This indicates a bug in the DistSender's logic that's supposed to strip these errors before combining anything.
  • combining two "non-combinable" responses was an error. The patch makes it a fatal. "Non-combinable" means we were trying to combine, say, a ScanResponse with a PutResponse. Even before the patch, trying to combine a ScanResponse with a DelRangeResponse (where both are "combinable") would be a crash. This indicates a bug in how the gateway calculates the positions of the responses it combines or, I guess, a server-side bug where we rearrange the responses to the individual constituents of a batch.

So, given these specific changes, do you object to the two new fatals introduced?

This conversation had moved to the top level PR comments.

Contributor

@bdarnell bdarnell left a comment


Reviewed 2 of 8 files at r4, 3 of 3 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm_strong:

Reviewed 8 of 8 files at r4, 2 of 3 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Contributor Author

@andreimatei andreimatei left a comment


TFTRs

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@craig
Contributor

craig bot commented Sep 27, 2018

Timed out

@andreimatei
Contributor Author

andreimatei commented Sep 27, 2018 via email

craig bot pushed a commit that referenced this pull request Sep 27, 2018
30485: kv: clarify the Sender contract and pre-allocate commit batches r=andreimatei a=andreimatei

See individual commits.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
@craig
Contributor

craig bot commented Sep 27, 2018

Build succeeded

@craig craig bot merged commit 06c1adf into cockroachdb:master Sep 27, 2018
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Sep 28, 2018
This reverts commit 06c1adf.

Revert the last commit from cockroachdb#30485 - the one that added pre-allocation and
possible reuse of EndTransaction batches. It's causing races.
I'll figure it out and re-send the original patch.
Touches cockroachdb#30769

Release note: None
craig bot pushed a commit that referenced this pull request Sep 28, 2018
30773: Revert "client: don't allocate some commit batches" r=andreimatei a=andreimatei

This reverts commit 06c1adf.

Revert the last commit from #30485 - the one that added pre-allocation and
possible reuse of EndTransaction batches. It's causing races.
I'll figure it out and re-send the original patch.
Touches #30769

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
@andreimatei andreimatei deleted the batch-race branch October 2, 2018 18:36