kv, client: don't send non-txn requests through the TxnCoordSender anymore #26741

andreimatei · 2018-06-15T00:15:44Z

We were sending them through the TCS because the TCS was in charge of
wrapping them in a Txn and retrying if the batch spanned ranges (because
batches need to be atomic, and you can only get that across ranges in
txns).
But that's nasty. The TCS is littered with checks about whether a
request is transactional or not, and the code to do the wrapped retry
did not belong there anyway.
This patch moves the wrapping/retry into a new Sender under the client.DB.
Now non-txn requests go through that and then straight to the
DistSender.

Release note: None
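
For readers unfamiliar with the layering, here is a minimal sketch of the wrap-and-retry idea described above. It is not the code in this patch: `Batch`, `Sender`, `fakeDistSender`, `txnWrapperSender` and `errRequiresTxn` are all hypothetical stand-ins, and "wrapping in a txn" is reduced to setting a flag. It only illustrates the control flow: send the non-txn batch as-is, and only if the lower layer reports that it spans ranges, retry it as a transaction.

```go
package main

import (
	"errors"
	"fmt"
)

// Batch is a hypothetical stand-in for a batch request.
type Batch struct {
	Keys  []string
	InTxn bool // true once the batch has been wrapped in a transaction
}

// Sender is a hypothetical stand-in for the client's Sender interface.
type Sender interface {
	Send(b Batch) error
}

// errRequiresTxn models (as an assumption) the error a lower layer returns
// when a non-transactional batch turns out to span more than one range.
var errRequiresTxn = errors.New("batch spans ranges and requires a txn")

// fakeDistSender pretends that every key lives on its own range.
type fakeDistSender struct{ sent []Batch }

func (d *fakeDistSender) Send(b Batch) error {
	d.sent = append(d.sent, b)
	if !b.InTxn && len(b.Keys) > 1 {
		return errRequiresTxn
	}
	return nil
}

// txnWrapperSender sits above the lower-level sender. It forwards non-txn
// batches untouched and retries them inside a txn only when required.
type txnWrapperSender struct{ wrapped Sender }

func (s *txnWrapperSender) Send(b Batch) error {
	err := s.wrapped.Send(b)
	if !errors.Is(err, errRequiresTxn) {
		return err // success, or an unrelated error
	}
	// The batch spans ranges: wrap it in a txn and send it again.
	b.InTxn = true
	return s.wrapped.Send(b)
}

func main() {
	ds := &fakeDistSender{}
	s := &txnWrapperSender{wrapped: ds}
	fmt.Println(s.Send(Batch{Keys: []string{"a"}}), len(ds.sent))
	fmt.Println(s.Send(Batch{Keys: []string{"a", "z"}}), len(ds.sent))
}
```

In this sketch the single-key batch reaches the lower sender once, while the two-key batch is sent twice: once bare, once wrapped.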

cockroach-teamcity · 2018-06-15T00:15:53Z

This change is

andreimatei · 2018-06-15T00:15:57Z

Everything but the last commit is #26496

bdarnell · 2018-06-15T19:52:54Z

LGTM

Review status: complete! 0 of 0 LGTMs obtained

Comments from Reviewable

When running TPC-C 10k on a 30-node cluster without partitioning, range 1 was receiving thousands of qps while all other ranges were receiving no more than low hundreds of qps (more details in cockroachdb#26608). Part of it was context cancellations causing range descriptors to be evicted from the range cache (cockroachdb#26764), but an even bigger part of it was HeartbeatTxns being sent for transactions with no anchor key, accounting for thousands of qps even after cockroachdb#26764 was fixed.

This change produces the same outcome as the old code, minus the load: without it we'd just send the request and get back a REASON_TXN_NOT_FOUND error, which would cause the function to return true.

It's possible that we should instead avoid the heartbeat loop entirely for transactions without a key, or that we should put in more effort to prevent such requests from even counting as transactions (a la cockroachdb#26741, which perhaps makes this change unnecessary?). Advice would be great.

Release note: None

nvanbenschoten · 2018-06-19T15:42:15Z

Reviewed 22 of 22 files at r1.
Review status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/internal/client/sender.go, line 91 at r1 (raw file):

	// DistSQL flow.
	// txn is the transaction whose requests this sender will carry.
	TransactionalSender(typ TxnType, txn *roachpb.Transaction) TxnSender

I think there's still value in keeping this name as NewTransactionalSender to indicate that this is constructing an object instead of just returning a singleton.


andreimatei · 2018-06-19T17:19:27Z

bors r+

Review status: complete! 1 of 0 LGTMs obtained

pkg/internal/client/sender.go, line 91 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think there's still value in keeping this name as NewTransactionalSender to indicate that this is constructing an object instead of just returning a singleton.

Well, but then would I also rename NonTransactionalSender()? Maybe sometimes the implementation of that returns a new instance of something too...
I'll just leave it. The comment on the TxnSenderFactory interface (plus the "factory" name) suggests clearly enough that new objects may be created, I'd say.
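
To make the naming question concrete, here is a small sketch of a factory in which one method builds a fresh object per call and the other hands back a shared one. The types are hypothetical and simplified (the `typ TxnType` parameter from the quoted signature is dropped, and `Txn` stands in for `roachpb.Transaction`); whether the real implementations behave this way is an assumption.

```go
package main

import "fmt"

// Txn is a hypothetical stand-in for roachpb.Transaction.
type Txn struct{ Name string }

// Sender is a hypothetical, minimal sender interface.
type Sender interface{ Describe() string }

type nonTxnSender struct{}

func (nonTxnSender) Describe() string { return "non-txn" }

type txnSender struct{ txn *Txn }

func (t *txnSender) Describe() string { return "txn " + t.txn.Name }

// SenderFactory loosely mirrors the shape under discussion: neither method
// name carries a "New" prefix, so the names alone don't say which calls
// allocate.
type SenderFactory interface {
	TransactionalSender(txn *Txn) Sender
	NonTransactionalSender() Sender
}

type factory struct{ shared Sender }

// TransactionalSender constructs a new sender bound to txn on every call.
func (f *factory) TransactionalSender(txn *Txn) Sender {
	return &txnSender{txn: txn}
}

// NonTransactionalSender returns the same shared sender on every call.
func (f *factory) NonTransactionalSender() Sender { return f.shared }

func main() {
	var f SenderFactory = &factory{shared: nonTxnSender{}}
	txn := &Txn{Name: "t1"}
	fmt.Println(f.TransactionalSender(txn) == f.TransactionalSender(txn))
	fmt.Println(f.NonTransactionalSender() == f.NonTransactionalSender())
}
```

Here the two transactional senders are distinct objects even for the same txn, while the non-transactional sender is shared, which is the distinction a `New` prefix would have signaled.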


craig · 2018-06-19T18:08:50Z

Build failed

GitHub CI (Cockroach)

andreimatei · 2018-06-19T20:10:05Z

The bors failure is an acceptance timeout. I've run the reported test manually, and the whole PR on TC, a few times and couldn't repro...

bors r+

Review status: complete! 0 of 0 LGTMs obtained (and 1 stale)


tbg · 2018-06-19T20:18:03Z

Which acceptance test?


andreimatei · 2018-06-19T20:21:06Z

TestRapidRestarts was the one running at the 30-minute mark when the timeout fired.
The failure was here: https://teamcity.cockroachdb.com/viewLog.html?buildId=727850&buildTypeId=Cockroach_UnitTests

craig · 2018-06-19T20:31:30Z

Canceled

andreimatei · 2018-06-19T20:33:04Z

bors r+

craig · 2018-06-19T20:49:22Z

Merge conflict (retrying...)


andreimatei · 2018-06-20T15:01:29Z

bors r+

Review status: complete! 0 of 0 LGTMs obtained (and 1 stale)


andreimatei · 2018-06-20T15:40:20Z

bors r+

Review status: complete! 0 of 0 LGTMs obtained (and 1 stale)


craig · 2018-06-20T15:53:52Z

Build failed (retrying...)

GitHub CI (Cockroach)

26741: kv, client: don't send non-txn requests through the TxnCoordSender anymore r=andreimatei a=andreimatei

We were sending them through the TCS because the TCS was in charge of wrapping them in a Txn and retrying if the batch spanned ranges (because batches need to be atomic, and you can only get that across ranges in txns).
But that's nasty. The TCS is littered with checks about whether a request is transactional or not, and the code to do the wrapped retry did not belong there anyway.
This patch moves the wrapping/retry into a new Sender under the client.DB. Now non-txn requests go through that and then straight to the DistSender.

Release note: None

26856: distsql: change default disk monitor increment to 1MiB r=asubiotto a=asubiotto

The previous increment was 64MiB. This was unnecessarily large and provided too high a granularity for stat reporting.

Closes #26793

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Alfonso Subiotto Marqués <[email protected]>

craig · 2018-06-20T16:25:49Z

Build succeeded

GitHub CI (Cockroach)

andreimatei assigned tbg and nvanbenschoten Jun 15, 2018

andreimatei requested review from a team June 15, 2018 00:15

a-robinson mentioned this pull request Jun 16, 2018

kv: Don't heartbeat transactions that are lacking an anchor key #26765

Closed

nvanbenschoten mentioned this pull request Jun 18, 2018

kv: decompose TxnCoordSender into a stack of txnReqInterceptors #26496

Merged

andreimatei force-pushed the txn-wrapping branch 2 times, most recently from fad9e0c to 6067f93 Compare June 18, 2018 20:36

andreimatei mentioned this pull request Jun 18, 2018

kv: update the TCS's txn on requests way out #26811

Merged

andreimatei force-pushed the txn-wrapping branch from 6067f93 to 001c693 Compare June 18, 2018 21:18

andreimatei force-pushed the txn-wrapping branch from 001c693 to 84f43c2 Compare June 19, 2018 19:48

andreimatei force-pushed the txn-wrapping branch from 84f43c2 to 115985c Compare June 19, 2018 20:31

andreimatei force-pushed the txn-wrapping branch from 115985c to 12eec9a Compare June 20, 2018 15:01

craig bot merged commit 12eec9a into cockroachdb:master Jun 20, 2018

andreimatei deleted the txn-wrapping branch June 20, 2018 19:09
