kv: clarify the Sender contract and pre-allocate commit batches #30485
Conversation
Tobi, putting this up on the flagpole to see if the cat licks it up.
I'm with you in spirit, and I'm glad the time has come when such reuse is actually important for performance, but I'm afraid that turning it off and going wild-west on the whole thing is going to cost us dearly in obscure bugs. Let's figure out what the contracts should be (to make the optimizations most powerful) and then let's think about the best way to enforce them (I hope that in the process we can get rid of this transport, which is quite expensive and slows down race builds a bunch).
The race transport doesn't only exist because we might be manipulating data on the server (in the local case). It also exists because gRPC might be reading the data even after the call comes back (or so we assume). In the spirit of "there are no benign data races", it follows that unless you can prove that what you gave to gRPC is no longer going to be accessed by it, you don't want to mutate it. I haven't looked at what gRPC does with the passed-in message exactly. Ideally it would encode it before it does anything async, and then we're home free and can forget about the race transport for that purpose.
Then what's left is finding better ways to enforce the guarantees we want in our code.
For example,
Server-side code is currently not allowed to mutate objects referenced by a Batch.
This can (mostly) be caught by hashing the batch on the way in and on the way out and comparing at the (Node).Batch endpoint (unless the request gets passed elsewhere out-of-band, but that doesn't happen often and we can audit it). But I agree that we're probably fairly good here, though that too may change as we move objects between the heap and the stack (and so what was a shallow copy ceases to be one).
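A minimal sketch of that hashing idea (all names here are illustrative stand-ins, not the real Node.Batch types): fingerprint the marshaled request on the way in and on the way out, and complain if the handler changed it.

```go
package kvcheck

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// checkNoMutation is a hypothetical helper: request stands in for a marshaled
// BatchRequest and handler for the (Node).Batch endpoint. It hashes the bytes
// before and after handling and reports any server-side mutation.
func checkNoMutation(request []byte, handler func([]byte) []byte) ([]byte, error) {
	before := sha256.Sum256(request)
	resp := handler(request)
	after := sha256.Sum256(request)
	if !bytes.Equal(before[:], after[:]) {
		return nil, fmt.Errorf("server mutated the incoming batch")
	}
	return resp, nil
}
```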
Ideally I think we get to a place where the incoming request (incl. the Txn proto) is treated as immutable at the node (i.e. below DistSender) and where we feel comfortable that it's not passed around across goroutine boundaries anywhere in DistSender or above (and so can be pooled and reused aggressively).
Force-pushed from 1eb4e63 to c876f8e.
I looked into grpc... At first it looked pretty convincing that there was no funny business: once the thing returns from an Invoke() call, there's no referencing of the request in the background or anything. But then I realized that I was looking at an older version. In the current version it's a bit harder to assert anything; it's not the most traceable code. But I didn't see any clues that there might be monkey business, and thinking about it I'm confident enough that it's all fine - like, in order for it to not be fine, the code would really have to try explicitly to screw you.
I've added better comments as per our convo to the Sender iface. PTAL.
I think it's reasonable to assume that grpc won't touch the input proto once the client method has returned. But I'm not sure about making that same assumption about DistSender. Couldn't tasks started by sendPartialBatchAsync still be running when DistSender.Send returns?
Ben, I think your concern is valid. I hadn't seen this on prior inspection but, upon more looking, I think the culprit is the following early return from cockroach/pkg/kv/dist_sender.go (line 811 at 8886bf1): it looks like we return without waiting for all the sub-batches when we get an error trying to "combine" two responses. How legitimate these errors are, I don't know. I'll see if I can get rid of them, or otherwise make the DistSender wait for all the sub-batches regardless. So I guess in the meantime I'll take the relevant commit out of #30448.
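To make the hazard concrete, here is an illustrative sketch (invented names, not the actual DistSender code): if Send returns early, the sub-batch goroutines it spawned may still be reading the request that the caller now believes it owns again.

```go
package kvsketch

import "sync"

// batch stands in for a BatchRequest; send stands in for DistSender.Send.
type batch struct{ spans []string }

func send(b *batch, parts [][]string) {
	var wg sync.WaitGroup
	for _, part := range parts {
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			sendPartial(b, part) // the sub-batch goroutine keeps reading b
		}(part)
	}
	// Returning before this point (e.g. on a "combine" error) would let the
	// caller mutate b while the goroutines above still use it; waiting for
	// all of them is what restores the ownership contract.
	wg.Wait()
}

func sendPartial(b *batch, spans []string) { _ = len(b.spans) + len(spans) }
```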
Force-pushed from 81e441b to e2c0dda.
I got rid of those combine errors.
Force-pushed from 6c6c382 to d5b3b2e.
Reviewed 1 of 5 files at r1, 1 of 3 files at r2.
pkg/internal/client/txn.go, line 41 at r3 (raw file):
typ TxnType
// alloc holds pre-allocated fields that can be used to avoid heap allocations
Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the Txn object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.
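A hypothetical sketch of that pooling TODO (the Txn type and helpers here are stand-ins, not the real client.Txn):

```go
package client

import "sync"

// Txn is a placeholder for client.Txn, which is a large struct in reality.
type Txn struct{ /* fields elided */ }

// txnPool sketches the TODO: reuse Txn objects across transactions so that
// starting one doesn't always allocate. Callers must not keep references to
// a Txn after releasing it, which is the lifetime cleanup mentioned above.
var txnPool = sync.Pool{New: func() interface{} { return &Txn{} }}

func acquireTxn() *Txn { return txnPool.Get().(*Txn) }

func releaseTxn(txn *Txn) {
	*txn = Txn{} // zero the object before it can be handed out again
	txnPool.Put(txn)
}
```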
pkg/internal/client/txn.go, line 44 at r3 (raw file):
// for batches with lifetimes tied to a Txn.
alloc struct {
	requests [4]roachpb.RequestUnion
We're only ever using a single element out of this, so let's give this a length of 1.
pkg/roachpb/api.go, line 246 at r1 (raw file):
}
if rh.ResumeSpan != nil {
	log.Fatalf(ctx, "combining %+v with %+v", rh.ResumeSpan, otherRH.ResumeSpan)
I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in divideAndSendBatchToRange to treat the error like any other error? It can return it later without immediately breaking from the loop.
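A sketch of that shape (illustrative only, not the real divideAndSendBatchToRange): remember the first error but keep draining every pending result, so nothing is left running when the function returns.

```go
package kvsketch

type result struct {
	err error
	// combined response fields elided
}

// gather drains all n sub-batch results. An error is recorded but does not
// short-circuit the loop; it is only returned once every result has arrived.
func gather(results <-chan result, n int) error {
	var firstErr error
	for i := 0; i < n; i++ {
		r := <-results
		if r.err != nil && firstErr == nil {
			firstErr = r.err // remember it, but keep waiting
		}
	}
	return firstErr
}
```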
Reviewed 4 of 5 files at r1, 3 of 3 files at r2, 1 of 1 files at r3.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in divideAndSendBatchToRange to treat the error like any other error? It can return it later without immediately breaking from the loop.
I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from combine() and just delay the return until everything is done.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from combine() and just delay the return until everything is done.
Before I change everything back, I'd like to litigate this more.
Ben, the position you've expressed when we were talking before seems inconsistent to me: you said you're fine with assertions about the behavior of a single process, but not so much about what is received from a different node. And you said that this policy preference is not necessarily related to version mismatches between nodes. But generally it seems to me like the only sane thing to do is to have the same policy across nodes that we have within a single process. We make a million assumptions about what another node will return; we can't accept arbitrary data, and trying to tolerate programming errors on another node that we don't tolerate locally... I don't see why we'd do that. If a remote node says this is the data at this key, we take it at face value. We don't assert that it returned data from the key we requested, and so on.
What gives?
I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.
I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Pasting Ben's response here:
I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.
I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.
Thanks. I'd still be inclined to debate what exactly you're thinking when you say "crash that can be triggered by data received over the network is a bug" - like, if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata". But even then, do you think that a more precise description of what you want to express can be phrased, and still be general enough that I can tattoo it on my chest?
We don't need to have a general discussion if you don't want to; we can discuss this specific case.
- Combining something with a ResumeSpan was a panic; now it's a fatal. I think this indicates a situation where the gateway split a batch that it shouldn't have.
- Combining a response that had an error was an error; now it's a fatal. This indicates a bug in the DistSender's logic that's supposed to strip these errors before combining anything.
- Combining two "non-combinable" responses was an error; the patch makes it a fatal. "Non-combinable" means we were trying to combine, say, a ScanResponse with a PutResponse. Even before the patch, trying to combine a ScanResponse with a DelRangeResponse (where they're both "combinable") would be a crash. This indicates a bug in how the gateway calculates the positions of the responses it combines or, I guess, a server-side bug where we rearrange the responses to individual constituents of a batch.
So, given these specific changes, do you object to the two new fatals introduced?
I do want to argue that. We should strive to return an error instead of panicking for things like this. Data-dependent crashes are nearly always bugs (sometimes, like at startup, it really doesn't make sense to continue if we can't read what's on the disk, but ideally even that would be handled by returning an error up the stack).
Not enough to keep arguing about it.
+1 to @bdarnell's stance here. My general rule of thumb is that errors that can be scoped to a single request should be.
... before returning from DistSender.Send(). It had an obscure error case in which it wasn't waiting for all of them. This waiting starts being mandated by the next commits. Release note: None
This patch clarifies the transport's contract wrt ownership and lifetime of the request. In particular, it makes it clear that the client is allowed to mutate the request after the Send() call completed. The race transport is then relaxed a bit to allow some specific mutations. Release note: None
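Paraphrasing the kind of contract the commit spells out on the Sender interface (signature and types are simplified placeholders here; the authoritative wording is in the diff itself):

```go
package sendercontract

import "context"

// BatchRequest and BatchResponse are placeholders for the roachpb types.
type (
	BatchRequest  struct{}
	BatchResponse struct{}
)

// Sender paraphrases the ownership contract: the caller keeps owning the
// request, and once Send returns, neither the implementation nor anything it
// handed the request to (e.g. gRPC) may still reference it, so the caller is
// free to mutate or reuse the request afterwards.
type Sender interface {
	Send(ctx context.Context, ba BatchRequest) (*BatchResponse, error)
}
```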
Force-pushed from d5b3b2e to 755d2be.
When committing, the client.Txn used to allocate a slice of requests (inside a BatchRequest), a RequestUnion, and an EndTransactionRequest. This shows up on profiles of a single-node read-heavy workload. This patch moves to pre-allocating these inside the Txn. This works since there are no concurrent commits on a single txn. Release note: None
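An illustrative sketch of that pre-allocation (type and method names are stand-ins for the real client.Txn and roachpb types): the commit batch is assembled from storage embedded in the Txn, so committing no longer allocates these pieces.

```go
package txnsketch

// Stand-ins for the roachpb request types involved in a commit.
type (
	RequestUnion          struct{ req interface{} }
	EndTransactionRequest struct{ Commit bool }
	BatchRequest          struct{ Requests []RequestUnion }
)

type Txn struct {
	// alloc holds pre-allocated fields used to avoid heap allocations for
	// batches whose lifetime is tied to the Txn. This is safe because a
	// single txn never runs two commits concurrently.
	alloc struct {
		requests [1]RequestUnion
		et       EndTransactionRequest
	}
}

// commitBatch builds the EndTransaction batch out of the Txn-owned storage
// instead of allocating a fresh slice, union, and request on every commit.
func (txn *Txn) commitBatch(commit bool) *BatchRequest {
	txn.alloc.et = EndTransactionRequest{Commit: commit}
	txn.alloc.requests[0] = RequestUnion{req: &txn.alloc.et}
	return &BatchRequest{Requests: txn.alloc.requests[:]}
}
```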
Force-pushed from 755d2be to 06c1adf.
I've reintroduced the combine errors (and converted the panic we had to an error too).
PTAL.
pkg/internal/client/txn.go, line 41 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the Txn object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.
Done.
pkg/internal/client/txn.go, line 44 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're only ever using a single element out of this, so let's give this a length of 1.
Done.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Pasting Ben's response here:
I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.
I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.
Thanks. I'd still be inclined to debate what exactly you're thinking when you say "crash that can be triggered by data received over the network is a bug" - like, if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata". But even then, do you think that a more precise description of what you want to express can be phrased, and still be general enough that I can tattoo it on my chest?
We don't need to have a general discussion if you don't want to; we can discuss this specific case.
- Combining something with a ResumeSpan was a panic; now it's a fatal. I think this indicates a situation where the gateway split a batch that it shouldn't have.
- Combining a response that had an error was an error; now it's a fatal. This indicates a bug in the DistSender's logic that's supposed to strip these errors before combining anything.
- Combining two "non-combinable" responses was an error; the patch makes it a fatal. "Non-combinable" means we were trying to combine, say, a ScanResponse with a PutResponse. Even before the patch, trying to combine a ScanResponse with a DelRangeResponse (where they're both "combinable") would be a crash. This indicates a bug in how the gateway calculates the positions of the responses it combines or, I guess, a server-side bug where we rearrange the responses to individual constituents of a batch.

So, given these specific changes, do you object to the two new fatals introduced?
This conversation has moved to the top-level PR comments.
Reviewed 2 of 8 files at r4, 3 of 3 files at r5, 1 of 1 files at r6.
Reviewed 8 of 8 files at r4, 2 of 3 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained
Timed out
bors r+
30485: kv: clarify the Sender contract and pre-allocate commit batches r=andreimatei a=andreimatei

See individual commits.

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Build succeeded
This reverts commit 06c1adf.

Revert the last commit from cockroachdb#30485 - the one that introduced pre-allocation and possible reuse of EndTransaction batches. It's causing races. I'll figure it out and re-send the original patch.

Touches cockroachdb#30769

Release note: None
30773: Revert "client: don't allocate some commit batches" r=andreimatei a=andreimatei

This reverts commit 06c1adf. Revert the last commit from #30485 - the one that introduced pre-allocation and possible reuse of EndTransaction batches. It's causing races. I'll figure it out and re-send the original patch.

Touches #30769

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
See individual commits.
Release note: None