
sql: reconsider joinReaderOrderingStrategy batch size #72857

Open · nvanbenschoten opened this issue Nov 17, 2021 · 7 comments
Labels
A-sql-execution: Relating to SQL execution. C-performance: Perf of queries or internals; solution not expected to change functional behavior. T-sql-queries: SQL Queries Team.

Comments

@nvanbenschoten (Member) commented Nov 17, 2021

joinReaderStrategy implementations each contain a getLookupRowsBatchSizeHint method, which dictates the size in bytes of the batch of lookup rows. There are currently three strategies. joinReaderIndexJoinStrategy returns a batch size of 4 MiB, joinReaderNoOrderingStrategy returns a batch size of 2 MiB, and joinReaderOrderingStrategy returns a batch size of 10 KiB.

I'd like to understand the reason for the comparatively tiny batch size used with the joinReaderOrderingStrategy. The implementation includes the following comment:

func (s *joinReaderOrderingStrategy) getLookupRowsBatchSizeHint() int64 {
// TODO(asubiotto): Eventually we might want to adjust this batch size
// dynamically based on whether the result row container spilled or not.
return 10 << 10 /* 10 KiB */
}
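
To make the effect of this hint concrete, here is a small self-contained sketch (illustrative only, not the actual joinReader code; the row sizes and helper names are hypothetical): the join buffers input rows until their accumulated byte size reaches the hint and then flushes one batch of KV lookups, so a small hint splits what could be a single lookup batch into several.

package main

import "fmt"

// planLookupBatches counts how many lookup (KV) batches are needed when input
// rows are buffered up to batchSizeHint bytes before each flush.
func planLookupBatches(rowSizes []int64, batchSizeHint int64) int {
	batches := 0
	var buffered int64
	for _, sz := range rowSizes {
		if buffered > 0 && buffered+sz > batchSizeHint {
			batches++ // flush the buffered rows as one batch of lookups
			buffered = 0
		}
		buffered += sz
	}
	if buffered > 0 {
		batches++
	}
	return batches
}

func main() {
	// 20 input rows of ~1 KiB each: a 10 KiB hint needs two lookup batches,
	// while a 100 KiB hint needs only one.
	rows := make([]int64, 20)
	for i := range rows {
		rows[i] = 1 << 10
	}
	fmt.Println(planLookupBatches(rows, 10<<10))  // 2
	fmt.Println(planLookupBatches(rows, 100<<10)) // 1
}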

This comes up in the context of TPC-E, where this low batch size causes some queries to perform more KV operations than necessary, resulting in wasted work and increased latency. Take the following query as an example:

SELECT
  t_id,
  t_bid_price::FLOAT8,
  t_exec_name,
  t_is_cash,
  t_trade_price::FLOAT8,
  se_amt::FLOAT8,
  se_cash_due_date,
  se_cash_type,
  ct_amt::FLOAT8,
  ct_dts,
  ct_name
FROM
  trade
  LEFT LOOKUP JOIN settlement ON se_t_id = t_id
  LEFT LOOKUP JOIN cash_transaction ON ct_t_id = t_id
WHERE
  t_ca_id = $1 AND t_dts >= $2 AND t_dts <= $3
ORDER BY
  t_dts ASC
LIMIT
  20

The query should be serviceable using 3 KV requests. First, it should issue a Scan request to the trade table. Next, it should issue a batch of 20 Get requests to the settlement table. Finally, it should issue another batch of 20 Get requests to the cash_transaction table.

Instead, a trace of the query looks like the following (stmt-bundle-711251192270782466.zip):

[Screenshot: trace of the query showing 5 KV requests]

Notice the 5 KV requests. Each of the two batches of Get requests is split because the join pipeline is chunked up due to the 10 KiB batch limit associated with the joinReaderOrderingStrategy. If I bump this limit to 100 KiB, the query executes as expected (stmt-bundle-711258586890502146.zip):

[Screenshot: trace of the query showing the expected 3 KV requests]

At the risk of bumping up against ✨ @jordanlewis's Law of Sticky Magic Constants ✨, I'd like to pick a fight with this constant batch size. Is it well-reasoned? Should it be a cluster setting instead of a constant? Would we be concerned about just increasing it?

Jira issue: CRDB-11321

@nvanbenschoten added the C-performance and A-sql-execution labels Nov 17, 2021
@blathers-crl bot added the T-sql-queries label Nov 17, 2021
@yuzefovich (Member)

In #48058, we made the following change:
> The lookup join with ordering variant has a batch size of 10KiB. The goal is to keep it as close to the previous 100-row batch size as possible, since increasing the batch size could cause lookup joins that were not previously spilling to disk to start doing so, because the number of results grows with the number of lookup rows and this strategy needs to buffer all result rows for a lookup batch.

I think Alfonso ran some benchmarks using TPC-H queries and picked this number as a result.

Making this batch size hint a cluster setting sounds like a good idea to me, but simply increasing it could cause performance regressions on some workloads, especially given that we have added more memory accounting to the joinReader during the 21.2 cycle and the disk-backed container used for buffered rows shares the same memory pool as the joinReader.

@petermattis (Collaborator)

> Making this batch size hint a cluster setting sounds like a good idea to me, but simply increasing it could cause performance regressions on some workloads, especially given that we have added more memory accounting to the joinReader during the 21.2 cycle and the disk-backed container used for buffered rows shares the same memory pool as the joinReader.

As you note, cluster-level settings like this are problematic in that they affect every query. Could we instead add a query hint, similar to the way we can hint a particular join type or index? If that is a step too far, a session variable would be useful to limit the scope.

@yuzefovich (Member)

> Could we instead add a query hint, similar to the way we can hint a particular join type or index? If that is a step too far, a session variable would be useful to limit the scope.

A query hint sounds a bit too heavyweight for this (and probably wouldn't be pretty), but using a session variable is definitely a good point. Added that in #72921.
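
For a rough picture of what a session-scoped knob of this shape looks like, here is a hedged sketch (hypothetical field and type names; this is not the diff from #72921): the hint falls back to the existing 10 KiB default unless a per-session value, propagated with the flow, overrides it.

package main

import "fmt"

const defaultOrderingStrategyBatchSize = 10 << 10 // the existing 10 KiB default

// sessionData stands in for per-session configuration that would be
// propagated to remote flows; a zero value means "use the default".
type sessionData struct {
	JoinReaderOrderingStrategyBatchSize int64
}

type joinReaderOrderingStrategy struct {
	sd *sessionData
}

func (s *joinReaderOrderingStrategy) getLookupRowsBatchSizeHint() int64 {
	if s.sd != nil && s.sd.JoinReaderOrderingStrategyBatchSize > 0 {
		return s.sd.JoinReaderOrderingStrategyBatchSize
	}
	return defaultOrderingStrategyBatchSize
}

func main() {
	s := &joinReaderOrderingStrategy{sd: &sessionData{}}
	fmt.Println(s.getLookupRowsBatchSizeHint()) // 10240 (default)
	s.sd.JoinReaderOrderingStrategyBatchSize = 100 << 10
	fmt.Println(s.getLookupRowsBatchSizeHint()) // 102400 (session override)
}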

craig bot pushed a commit that referenced this issue Nov 18, 2021
72921: rowexec: make batch size of join reader ordering strategy a setting r=yuzefovich a=yuzefovich

The join reader's three strategies use different batch sizes (which
determine how many input rows are buffered to perform a lookup), with
the ordering strategy having the smallest size of 10KiB. This commit
introduces a private cluster setting (as well as a corresponding
session variable that will be propagated to the remote flows) that
determines that batch size but keeps 10KiB as the default value.

Touches: #72857.

Release note: None

72929: sql: remove unnecessary parens from cluster_contended_indexes r=matthewtodd a=matthewtodd

The modified query introduced in #72780 had unnecessary parentheses in
its where clause. This change removes them.

Release note: None

72933: protectedts: improve a test r=andreimatei a=andreimatei

This test was starting a Reconciler by hand, running against an existing
Server. The Reconciler was using the cluster's stopper (as opposed to
the server's stopper). In doing so, we ended up with two different Tracers
contributing to the same traces. This is pretty broken and will become
disallowed. The patch fixes it by configuring the alien Reconciler with
the same stopper (and tracer) as the server it's running against.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
@nvanbenschoten (Member, Author)

@yuzefovich thanks for landing #72921. I'll defer to you on whether you'd like to keep this issue open to track doing something more dynamic with this batch size (like the TODO mentions) or whether you'd like to close it.

@yuzefovich (Member)

I think I'll keep this issue as a placeholder and inspiration for introducing some dynamic sizing that I might look into after the Streamer work is done.
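
One possible shape for such dynamic sizing, sketched very loosely here (hypothetical names; not a CockroachDB implementation): grow the batch size while the buffered result container stays in memory, and back off once it spills to disk, as the TODO in getLookupRowsBatchSizeHint suggests.

package main

import "fmt"

// dynamicBatchSizer adjusts the lookup batch size hint between lookup batches
// based on whether the previous batch's result container spilled to disk.
type dynamicBatchSizer struct {
	cur, min, max int64
}

func (d *dynamicBatchSizer) next(prevSpilled bool) int64 {
	if prevSpilled {
		// Back off: halve the hint, but never go below the floor.
		d.cur /= 2
		if d.cur < d.min {
			d.cur = d.min
		}
	} else {
		// Grow cautiously while results fit in memory.
		d.cur *= 2
		if d.cur > d.max {
			d.cur = d.max
		}
	}
	return d.cur
}

func main() {
	d := &dynamicBatchSizer{cur: 10 << 10, min: 10 << 10, max: 4 << 20}
	fmt.Println(d.next(false)) // 20480: no spill last time, grow
	fmt.Println(d.next(false)) // 40960
	fmt.Println(d.next(true))  // 20480: spilled, back off
}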

@mgartner (Collaborator)

@yuzefovich is this issue still relevant now that we've made progress on the Streamer?

@yuzefovich (Member)

Yes, this issue is still relevant: right now this batch size still controls how much data we buffer from the input before creating requests to pass to the streamer. That said, once the streamer supports pipelining of Enqueue and GetResults calls (#82163), this issue will become irrelevant.
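
As a purely conceptual illustration of why pipelining helps (toy channels only; this is not the Streamer API): once enqueueing the next chunk of lookups can overlap with retrieving results for earlier chunks, the size of each buffered input chunk stops dictating how often the join has to stop and wait on KV.

package main

import "fmt"

func main() {
	requests := make(chan []int, 4) // lookup chunks allowed to be in flight
	results := make(chan int)

	// Producer: keep enqueueing small lookup chunks without waiting for the
	// results of the previous chunk.
	go func() {
		defer close(requests)
		for chunk := 0; chunk < 5; chunk++ {
			requests <- []int{chunk * 2, chunk*2 + 1}
		}
	}()

	// Toy "streamer": resolves chunks as they arrive, concurrently with the
	// producer enqueueing more work.
	go func() {
		defer close(results)
		for keys := range requests {
			for _, k := range keys {
				results <- k * 10 // pretend lookup result
			}
		}
	}()

	for r := range results {
		fmt.Println(r)
	}
}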

@mgartner mgartner moved this to New Backlog in SQL Queries Jul 24, 2023