kv: dist sender sends too many RPCs #34999

jordanlewis · 2019-02-15T16:46:23Z

The dist sender sends a batch to many ranges at once by chunking the batch up into one per range, and making a potentially asynchronous RPC (or local call, if the range lives on the local node) to each range.

Consider a case where there are 10 ranges per node, and a batch contains requests for all 10 ranges on a single remote node. Then, the dist sender will make 10 asynchronous network RPCs, all to the same node.

This seems inefficient. If the dist sender instead chunked up a batch by node, instead of by range, and sent a single RPC per node, then we'd have way fewer in-flight RPCs. The receiving node could then be in charge of further concurrency if it chooses. At the moment, the dist sender is the arbiter of concurrency - even for remote nodes, whose load it knows nothing about!
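
To make the proposal concrete, here is a minimal sketch of the grouping step, not the actual dist sender code. The types `NodeID`, `RangeID`, and `partialBatch` and the function `groupByNode` are hypothetical names invented for illustration, and the sketch assumes the sender already knows which node each range's request should go to.

```go
package main

import "fmt"

// NodeID and RangeID are hypothetical stand-ins for the real identifier types.
type NodeID int
type RangeID int

// partialBatch is a hypothetical per-range piece of a larger batch.
type partialBatch struct {
	Range RangeID
	Node  NodeID // assumption: the destination node is already known
	Keys  []string
}

// groupByNode buckets per-range partial batches by destination node, so a
// sender could issue one RPC per node rather than one RPC per range.
func groupByNode(parts []partialBatch) map[NodeID][]partialBatch {
	byNode := make(map[NodeID][]partialBatch)
	for _, p := range parts {
		byNode[p.Node] = append(byNode[p.Node], p)
	}
	return byNode
}

func main() {
	// The scenario from the issue: 10 ranges, all on the same remote node.
	var parts []partialBatch
	for r := 1; r <= 10; r++ {
		parts = append(parts, partialBatch{Range: RangeID(r), Node: 2})
	}
	byNode := groupByNode(parts)
	fmt.Printf("per-range RPCs: %d, per-node RPCs: %d\n", len(parts), len(byNode))
}
```

In this sketch the receiving node would get all 10 partial batches in one request and could decide for itself how much of that work to run concurrently.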

I'm assigning @nvanbenschoten for initial thoughts.

jordanlewis · 2019-02-15T16:47:31Z

@ajwerner might also be interested in this, as it has to do with admission control. We shouldn't make a ton of concurrent RPCs to a remote node when we don't know how loaded that node is. The remote node should have much more control of what it does with a set of incoming requests.

ajwerner · 2019-02-15T16:59:00Z

This seems like a valuable and straightforward change. The distsender already has all the information it might need to split up the batch like this. I'd be happy to take a stab at this. Do you have good workloads in mind that hit this limit?

jordanlewis · 2019-02-15T17:40:41Z

All workloads that use index or lookup joins on tables with reasonable numbers of ranges will hit this limit.

ajwerner · 2019-02-15T19:30:00Z

After some reflection, it's not clear that the concurrency has a huge overhead. The big issue is that hitting the concurrency limit is extremely expensive from a latency perspective (especially in a geo-distributed setting 😱). Sure, we're trusting gRPC to be efficient at dealing with lots of requests, and we're spawning lots of goroutines on the sender, but it's not clear what that costs in practice. Have you tried bumping defaultSenderConcurrency for the workloads you've been running and seeing its impact?
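
A rough sketch of why hitting the limit hurts latency, under a stated assumption: once every async slot is taken, the sender falls back to sending the remaining partial batches inline, one after another. The `limiter` type and `dispatch` function are hypothetical and are not taken from the dist sender code; the sketch only counts which path each request would take.

```go
package main

import "fmt"

// limiter is a hypothetical non-blocking semaphore built on a buffered channel.
type limiter chan struct{}

// tryAcquire takes a slot if one is free and reports whether it succeeded.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a previously acquired slot.
func (l limiter) release() { <-l }

// dispatch counts how n partial batches would be sent while every acquired
// slot stays busy: async ones overlap, inline ones are assumed to run serially.
func dispatch(l limiter, n int) (async, inline int) {
	for i := 0; i < n; i++ {
		if l.tryAcquire() {
			async++ // would be sent from its own goroutine
		} else {
			inline++ // assumption: caller blocks for a full round trip
		}
	}
	return async, inline
}

func main() {
	l := make(limiter, 4) // hypothetical concurrency limit of 4
	async, inline := dispatch(l, 10)
	fmt.Printf("async: %d, inline: %d\n", async, inline)
}
```

Under that assumption, each inline send costs a full round trip that cannot overlap with the others, which is where a geo-distributed setting would feel it most.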

ajwerner · 2019-02-15T19:30:40Z

I'm still going to try to type up the experiment, but I worry it's going to have some unintended consequences and further muddle some already pretty gnarly and complex code.

github-actions · 2021-06-05T02:14:37Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

jordanlewis added the C-performance (Perf of queries or internals. Solution not expected to change functional behavior.) and A-admission-control labels Feb 15, 2019

jordanlewis assigned nvanbenschoten Feb 15, 2019

jordanlewis assigned ajwerner and unassigned nvanbenschoten Feb 15, 2019

github-actions bot added the no-issue-activity label Jun 5, 2021

github-actions bot added the X-stale label Jun 16, 2021

github-actions bot closed this as completed Jun 16, 2021

ajwerner mentioned this issue Jul 14, 2021

kv: rangefeeds use too many goroutines #67600

Closed

ajwerner mentioned this issue Jul 8, 2022

kvserver,server,sql: provide efficient mechanism to retrieve data size information for span #84105

Closed
