Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv, client: don't send non-txn requests through the TxnCoordSender anymore #26741

Merged
merged 1 commit into from
Jun 20, 2018

Conversation

andreimatei
Copy link
Contributor

We were sending them through the TCS because the TCS was in charge of
wrapping them in a Txn and retrying if the batch spanned requests (cause
batches need to be atomic and you can only get that cross-range in
txns).
But that's nasty. The TCS is littered with checks about whether a
request is transactional or not, and the code to do the wrapped retry
did not belong there anyway.
This patch moves the wrapping/retry in a new Sender under the client.DB.
Now non-txn requests go through that and then straight to the
DistSender.

Release note: None

@andreimatei andreimatei requested review from a team June 15, 2018 00:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei
Copy link
Contributor Author

Everything but the last commit is #26496

@bdarnell
Copy link
Contributor

LGTM


Review status: :shipit: complete! 0 of 0 LGTMs obtained


Comments from Reviewable

a-robinson added a commit to a-robinson/cockroach that referenced this pull request Jun 16, 2018
When running TPC-C 10k on a 30 node cluster without partitioning, range
1 was receiving thousands of qps while all other ranges were receiving
no more than low hundreds of qps (more details in cockroachdb#26608. Part of it was
context cancellations causing range descriptors to be evicted from the
range cache (cockroachdb#26764), but an even bigger part of it was HeartbeatTxns
being sent for transactions with no anchor key, accounting for thousands
of QPS even after cockroachdb#26764 was fixed.

This causes the same outcome as the old code without the load, because
without this change we'd just send the request and get back a
REASON_TXN_NOT_FOUND error, which would cause the function to return
true.

It's possible that we should instead avoid the heartbeat loop at all for
transactions without a key, or that we should put in more effort to
prevent such requests from even counting as transactions (a la cockroachdb#26741,
which perhaps makes this change unnecessary?). Advice would be great.

Release note: None
@andreimatei andreimatei force-pushed the txn-wrapping branch 2 times, most recently from fad9e0c to 6067f93 Compare June 18, 2018 20:36
@nvanbenschoten
Copy link
Member

:lgtm:


Reviewed 22 of 22 files at r1.
Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/internal/client/sender.go, line 91 at r1 (raw file):

	// DistSQL flow.
	// txn is the transaction whose requests this sender will carry.
	TransactionalSender(typ TxnType, txn *roachpb.Transaction) TxnSender

I think there's still value in keeping this name as NewTransactionalSender to indicate that this is constructing an object instead of just returning a singleton.


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

bors r+


Review status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/internal/client/sender.go, line 91 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think there's still value in keeping this name as NewTransactionalSender to indicate that this is constructing an object instead of just returning a singleton.

well but then would I also rename NonTransactionalSender()? Maybe sometimes the implementation of that returns a new instance of something too...
I'll just leave it. The comment on the TxnSenderFactory (plus the "factory" name) interface suggests enough that new objects may be created, I'd say.


Comments from Reviewable

@craig
Copy link
Contributor

craig bot commented Jun 19, 2018

Build failed

@andreimatei
Copy link
Contributor Author

the bors failure is an acceptance timeout. I've tested the reported test manually and the whole PR on TC a few times and didn't repro...

bors r+


Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


Comments from Reviewable

@tbg
Copy link
Member

tbg commented Jun 19, 2018 via email

@andreimatei
Copy link
Contributor Author

TestRapidRestarts was the one running at the 30 minute mark when the timeout fired.
Failure was here https://teamcity.cockroachdb.com/viewLog.html?buildId=727850&buildTypeId=Cockroach_UnitTests

@craig
Copy link
Contributor

craig bot commented Jun 19, 2018

Canceled

@andreimatei
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 19, 2018

Merge conflict (retrying...)

…y more

We were sending them through the TCS because the TCS was in charge of
wrapping them in a Txn and retrying if the batch spanned requests (cause
batches need to be atomic and you can only get that cross-range in
txns).
But that's nasty. The TCS is littered with checks about whether a
request is transactional or not, and the code to do the wrapped retry
did not belong there anyway.
This patch moves the wrapping/retry in a new Sender under the client.DB.
Now non-txn requests go through that and then straight to the
DistSender.

Release note: None
@andreimatei
Copy link
Contributor Author

bors r+


Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

bors r+


Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


Comments from Reviewable

@craig
Copy link
Contributor

craig bot commented Jun 20, 2018

Build failed (retrying...)

craig bot pushed a commit that referenced this pull request Jun 20, 2018
26741: kv, client: don't send non-txn requests through the TxnCoordSender anymore r=andreimatei a=andreimatei

We were sending them through the TCS because the TCS was in charge of
wrapping them in a Txn and retrying if the batch spanned requests (cause
batches need to be atomic and you can only get that cross-range in
txns).
But that's nasty. The TCS is littered with checks about whether a
request is transactional or not, and the code to do the wrapped retry
did not belong there anyway.
This patch moves the wrapping/retry in a new Sender under the client.DB.
Now non-txn requests go through that and then straight to the
DistSender.

Release note: None

26856: distsql: change default disk monitor increment to 1MiB r=asubiotto a=asubiotto

The previous increment was 64MiB. This was unnecessarily large and
provided too high a granularity for stat reporting.

Closes #26793

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Alfonso Subiotto Marqués <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jun 20, 2018

Build succeeded

@craig craig bot merged commit 12eec9a into cockroachdb:master Jun 20, 2018
@andreimatei andreimatei deleted the txn-wrapping branch June 20, 2018 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants