kv, client: don't send non-txn requests through the TxnCoordSender anymore #26741
Everything but the last commit is #26496.
LGTM
When running TPC-C 10k on a 30-node cluster without partitioning, range 1 was receiving thousands of QPS while all other ranges were receiving no more than low hundreds of QPS (more details in cockroachdb#26608). Part of it was context cancellations causing range descriptors to be evicted from the range cache (cockroachdb#26764), but an even bigger part of it was HeartbeatTxns being sent for transactions with no anchor key, accounting for thousands of QPS even after cockroachdb#26764 was fixed.

This change produces the same outcome as the old code, minus the load: without this change we'd just send the request and get back a REASON_TXN_NOT_FOUND error, which would cause the function to return true.

It's possible that we should instead avoid the heartbeat loop entirely for transactions without a key, or that we should put in more effort to prevent such requests from even counting as transactions (a la cockroachdb#26741, which perhaps makes this change unnecessary?). Advice would be great.

Release note: None
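The equivalence claimed in this commit message can be illustrated with a small, self-contained sketch. It is not the CockroachDB code: `txn`, `countingSender`, `heartbeat`, and the `skipKeyless` flag are hypothetical names invented here, and the fake sender simply assumes that a heartbeat for a keyless transaction fails with a not-found error, as the message describes. What the boolean return value means to the caller is not stated in the thread, so the sketch only shows that both paths return the same value while the new path sends nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// errTxnNotFound is a hypothetical stand-in for the REASON_TXN_NOT_FOUND error.
var errTxnNotFound = errors.New("txn record not found")

// txn is a hypothetical transaction; an empty anchorKey means "no anchor key".
type txn struct{ anchorKey []byte }

// countingSender counts the heartbeat requests that reach the KV layer.
type countingSender struct{ sent int }

// sendHeartbeat simulates the server: a keyless txn has no record to heartbeat.
func (s *countingSender) sendHeartbeat(t txn) error {
	s.sent++
	if len(t.anchorKey) == 0 {
		return errTxnNotFound
	}
	return nil
}

// heartbeat returns the same value for a keyless txn on both paths;
// with skipKeyless set, it does so without sending any request.
func heartbeat(s *countingSender, t txn, skipKeyless bool) bool {
	if skipKeyless && len(t.anchorKey) == 0 {
		return true // assumed new behavior: skip the request entirely
	}
	err := s.sendHeartbeat(t)
	if errors.Is(err, errTxnNotFound) {
		return true // old behavior: request sent, not-found error swallowed
	}
	return err == nil
}

func main() {
	keyless := txn{}
	oldPath, newPath := &countingSender{}, &countingSender{}
	fmt.Println(heartbeat(oldPath, keyless, false), oldPath.sent)
	fmt.Println(heartbeat(newPath, keyless, true), newPath.sent)
}
```

Both calls return `true`; only the first one costs a request.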
fad9e0c
to
6067f93
Compare
6067f93
to
001c693
Compare
Reviewed 22 of 22 files at r1. pkg/internal/client/sender.go, line 91 at r1 (raw file):
I think there's still value in keeping this name as Comments from Reviewable |
bors r+ Review status: complete! 1 of 0 LGTMs obtained pkg/internal/client/sender.go, line 91 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
well but then would I also rename Comments from Reviewable |
Build failed |
001c693
to
84f43c2
Compare
The bors failure is an acceptance timeout. I ran the reported test manually and ran the whole PR on TC a few times, and couldn't reproduce it.
bors r+
Which acceptance test?
-- Tobias
84f43c2
to
115985c
Compare
Canceled |
bors r+ |
Merge conflict (retrying...) |
…y more We were sending them through the TCS because the TCS was in charge of wrapping them in a Txn and retrying if the batch spanned ranges (because batches need to be atomic and you can only get that cross-range in txns). But that's nasty. The TCS is littered with checks about whether a request is transactional or not, and the code to do the wrapped retry did not belong there anyway. This patch moves the wrapping/retry into a new Sender under the client.DB. Now non-txn requests go through that and then straight to the DistSender. Release note: None
115985c
to
12eec9a
Compare
bors r+ Review status: complete! 0 of 0 LGTMs obtained (and 1 stale) Comments from Reviewable |
bors r+ Review status: complete! 0 of 0 LGTMs obtained (and 1 stale) Comments from Reviewable |
Build failed (retrying...) |
26741: kv, client: don't send non-txn requests through the TxnCoordSender anymore r=andreimatei a=andreimatei

We were sending them through the TCS because the TCS was in charge of wrapping them in a Txn and retrying if the batch spanned ranges (because batches need to be atomic and you can only get that cross-range in txns). But that's nasty. The TCS is littered with checks about whether a request is transactional or not, and the code to do the wrapped retry did not belong there anyway. This patch moves the wrapping/retry into a new Sender under the client.DB. Now non-txn requests go through that and then straight to the DistSender.

Release note: None

26856: distsql: change default disk monitor increment to 1MiB r=asubiotto a=asubiotto

The previous increment was 64MiB. This was unnecessarily large and provided too coarse a granularity for stat reporting.

Closes #26793

Release note: None

Co-authored-by: Andrei Matei
Co-authored-by: Alfonso Subiotto Marqués
Build succeeded |
We were sending them through the TCS because the TCS was in charge of
wrapping them in a Txn and retrying if the batch spanned ranges (because
batches need to be atomic and you can only get that cross-range in
txns).
But that's nasty. The TCS is littered with checks about whether a
request is transactional or not, and the code to do the wrapped retry
did not belong there anyway.
This patch moves the wrapping/retry into a new Sender under the client.DB.
Now non-txn requests go through that and then straight to the
DistSender.
Release note: None
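
The shape of the new layering can be sketched as below. This is a minimal illustration under stated assumptions, not the code in this PR: `Batch`, `Sender`, `fakeDistSender`, `txnWrapperSender`, and `errRequiresTxn` are hypothetical names, ranges are simulated by the first byte of each (non-empty) key, and "wrapping in a Txn" is reduced to setting a flag rather than running and committing a real transaction.

```go
package main

import (
	"errors"
	"fmt"
)

// Batch is a hypothetical stand-in for a KV batch request.
type Batch struct {
	Keys  []string // keys must be non-empty in this sketch
	InTxn bool     // true when the batch runs inside a transaction
}

// Sender is a hypothetical stand-in for the client's Sender interface.
type Sender interface {
	Send(b Batch) (string, error)
}

// errRequiresTxn is a hypothetical error meaning "this batch spans ranges
// and can only be executed atomically inside a transaction".
var errRequiresTxn = errors.New("batch spans ranges and requires a txn")

// fakeDistSender simulates the lower layer: the first byte of a key picks
// its range, and non-transactional cross-range batches are rejected.
type fakeDistSender struct{ calls int }

func (d *fakeDistSender) Send(b Batch) (string, error) {
	d.calls++
	ranges := map[byte]struct{}{}
	for _, k := range b.Keys {
		ranges[k[0]] = struct{}{}
	}
	if len(ranges) > 1 && !b.InTxn {
		return "", errRequiresTxn
	}
	return fmt.Sprintf("ok ranges=%d txn=%t", len(ranges), b.InTxn), nil
}

// txnWrapperSender sits directly above the lower-level sender. It forwards
// non-transactional batches as-is and retries them once, wrapped in a
// (simulated) transaction, only when the lower layer demands one.
type txnWrapperSender struct{ wrapped Sender }

func (s txnWrapperSender) Send(b Batch) (string, error) {
	if b.InTxn {
		return "", errors.New("transactional batch sent on the non-txn path")
	}
	resp, err := s.wrapped.Send(b)
	if !errors.Is(err, errRequiresTxn) {
		return resp, err
	}
	b.InTxn = true // simulated wrapping; real code would run and commit a txn
	return s.wrapped.Send(b)
}

func main() {
	ds := &fakeDistSender{}
	s := txnWrapperSender{wrapped: ds}

	resp, err := s.Send(Batch{Keys: []string{"a1", "a2"}}) // single range
	fmt.Println(resp, err)

	resp, err = s.Send(Batch{Keys: []string{"a1", "b1"}}) // cross-range
	fmt.Println(resp, err)
	fmt.Println("lower-level sends:", ds.calls)
}
```

In this arrangement the transactional sender never sees non-transactional traffic, so it needs no "is this request transactional?" checks, which is the cleanup the description argues for.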