
sql: reconsider joinReaderOrderingStrategy batch size #72857

Open · nvanbenschoten opened this issue Nov 17, 2021 · 7 comments
Labels
A-sql-execution: Relating to SQL execution. C-performance: Perf of queries or internals; solution not expected to change functional behavior. T-sql-queries: SQL Queries Team.

Comments

@nvanbenschoten (Member) commented Nov 17, 2021

joinReaderStrategy implementations each contain a getLookupRowsBatchSizeHint method, which dictates the size in bytes of the batch of lookup rows. There are currently three strategies. joinReaderIndexJoinStrategy returns a batch size of 4 MiB, joinReaderNoOrderingStrategy returns a batch size of 2 MiB, and joinReaderOrderingStrategy returns a batch size of 10 KiB.

I'd like to understand the reason for the comparatively tiny batch size used with the joinReaderOrderingStrategy. The implementation includes the following comment:

func (s *joinReaderOrderingStrategy) getLookupRowsBatchSizeHint() int64 {
// TODO(asubiotto): Eventually we might want to adjust this batch size
// dynamically based on whether the result row container spilled or not.
return 10 << 10 /* 10 KiB */
}
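
To make the effect of this hint concrete, here is a small self-contained sketch (illustrative only, not the actual joinReader code; the row sizes and helper names are hypothetical): the join buffers input rows until their accumulated byte size reaches the hint and then flushes one batch of KV lookups, so a small hint splits what could be a single lookup batch into several.

package main

import "fmt"

// planLookupBatches counts how many lookup (KV) batches are needed when input
// rows are buffered up to batchSizeHint bytes before each flush.
func planLookupBatches(rowSizes []int64, batchSizeHint int64) int {
	batches := 0
	var buffered int64
	for _, sz := range rowSizes {
		if buffered > 0 && buffered+sz > batchSizeHint {
			batches++ // flush the buffered rows as one batch of lookups
			buffered = 0
		}
		buffered += sz
	}
	if buffered > 0 {
		batches++
	}
	return batches
}

func main() {
	// 20 input rows of ~1 KiB each: a 10 KiB hint needs two lookup batches,
	// while a 100 KiB hint needs only one.
	rows := make([]int64, 20)
	for i := range rows {
		rows[i] = 1 << 10
	}
	fmt.Println(planLookupBatches(rows, 10<<10))  // 2
	fmt.Println(planLookupBatches(rows, 100<<10)) // 1
}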

This comes up in the context of TPC-E, where this low batch size causes some queries to perform more KV operations than necessary, resulting in wasted work and increased latency. Take the following query as an example:

SELECT
  t_id,
  t_bid_price::FLOAT8,
  t_exec_name,
  t_is_cash,
  t_trade_price::FLOAT8,
  se_amt::FLOAT8,
  se_cash_due_date,
  se_cash_type,
  ct_amt::FLOAT8,
  ct_dts,
  ct_name
FROM
  trade
  LEFT LOOKUP JOIN settlement ON se_t_id = t_id
  LEFT LOOKUP JOIN cash_transaction ON ct_t_id = t_id
WHERE
  t_ca_id = $1 AND t_dts >= $2 AND t_dts <= $3
ORDER BY
  t_dts ASC
LIMIT
  20

The query should be serviceable using 3 KV requests. First, it should issue a Scan request to the trade table. Next, it should issue a batch of 20 Get requests to the settlement table. Finally, it should issue another batch of 20 Get requests to the cash_transaction table.

Instead, a trace of the query looks like the following (stmt-bundle-711251192270782466.zip):

[Screenshot: trace of the query showing 5 KV requests]

Notice the 5 KV requests. Each of the two batches of Get requests is split because the join pipeline is chunked up due to the 10 KiB batch limit associated with the joinReaderOrderingStrategy. If I bump this limit to 100 KiB, the query executes as expected (stmt-bundle-711258586890502146.zip):

[Screenshot: trace of the query showing the expected 3 KV requests]

At the risk of bumping up against ✨ @jordanlewis's Law of Sticky Magic Constants ✨, I'd like to pick a fight with this constant batch size. Is it well-reasoned? Should it be a cluster setting instead of a constant? Would we be concerned about just increasing it?

Jira issue: CRDB-11321

@nvanbenschoten added the C-performance and A-sql-execution labels Nov 17, 2021
@blathers-crl bot added the T-sql-queries label Nov 17, 2021
@yuzefovich (Member)

In #48058, we made the following change:
> The lookup join with ordering variant has a batch size of 10KiB. The goal is to keep it as close to the previous 100-row batch size as possible, since increasing the batch size could cause lookup joins that were not previously spilling to disk to start doing so, because the number of results grows with the number of lookup rows and this strategy needs to buffer all result rows for a lookup batch.

I think Alfonso ran some benchmarks using TPC-H queries and picked this number as a result.

Making this batch size hint a cluster setting sounds like a good idea to me, but simply increasing it could cause performance regressions on some workloads, especially given that we have added more memory accounting to the joinReader during the 21.2 cycle and the disk-backed container used for buffered rows shares the same memory pool as the joinReader.

@petermattis (Collaborator)

> Making this batch size hint a cluster setting sounds like a good idea to me, but simply increasing it could cause performance regressions on some workloads, especially given that we have added more memory accounting to the joinReader during the 21.2 cycle and the disk-backed container used for buffered rows shares the same memory pool as the joinReader.

As you note, cluster-level settings like this are problematic in that they affect every query. Could we instead add a query hint, similar to the way we can hint a particular join type or index? If that is a step too far, a session variable would be useful to limit the scope.

@yuzefovich (Member)

> Could we instead add a query hint, similar to the way we can hint a particular join type or index? If that is a step too far, a session variable would be useful to limit the scope.

A query hint sounds a bit too heavyweight for this (and probably wouldn't be pretty), but using a session variable is definitely a good point. Added that in #72921.
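
For a rough picture of what a session-scoped knob of this shape looks like, here is a hedged sketch (hypothetical field and type names; this is not the diff from #72921): the hint falls back to the existing 10 KiB default unless a per-session value, propagated with the flow, overrides it.

package main

import "fmt"

const defaultOrderingStrategyBatchSize = 10 << 10 // the existing 10 KiB default

// sessionData stands in for per-session configuration that would be
// propagated to remote flows; a zero value means "use the default".
type sessionData struct {
	JoinReaderOrderingStrategyBatchSize int64
}

type joinReaderOrderingStrategy struct {
	sd *sessionData
}

func (s *joinReaderOrderingStrategy) getLookupRowsBatchSizeHint() int64 {
	if s.sd != nil && s.sd.JoinReaderOrderingStrategyBatchSize > 0 {
		return s.sd.JoinReaderOrderingStrategyBatchSize
	}
	return defaultOrderingStrategyBatchSize
}

func main() {
	s := &joinReaderOrderingStrategy{sd: &sessionData{}}
	fmt.Println(s.getLookupRowsBatchSizeHint()) // 10240 (default)
	s.sd.JoinReaderOrderingStrategyBatchSize = 100 << 10
	fmt.Println(s.getLookupRowsBatchSizeHint()) // 102400 (session override)
}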

craig bot pushed a commit that referenced this issue Nov 18, 2021
72921: rowexec: make batch size of join reader ordering strategy a setting r=yuzefovich a=yuzefovich

The join reader's three strategies use different batch sizes (which
determine how many input rows are buffered to perform a lookup), with
the ordering strategy having the smallest size of 10KiB. This commit
introduces a private cluster setting (as well as a corresponding
session variable that will be propagated to the remote flows) that
determines that batch size but keeps 10KiB as the default value.

Touches: #72857.

Release note: None

72929: sql: remove unnecessary parens from cluster_contended_indexes r=matthewtodd a=matthewtodd

The modified query introduced in #72780 had unnecessary parentheses in
its where clause. This change removes them.

Release note: None

72933: protectedts: improve a test r=andreimatei a=andreimatei

This test was starting a Reconciler by hand, running against an existing
Server. The Reconciler was using the cluster's stopper (as opposed to
the server's stopper). In doing so, we ended up with two different Tracers
contributing to the same traces. This is pretty broken and will become
disallowed. The patch fixes it by configuring the alien Reconciler with
the same stopper (and tracer) as the server it's running against.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
@nvanbenschoten (Member, Author)

@yuzefovich thanks for landing #72921. I'll defer to you on whether you'd like to keep this issue open to track doing something more dynamic with this batch size (like the TODO mentions) or whether you'd like to close it.

@yuzefovich (Member)

I think I'll keep this issue as a placeholder and inspiration for introducing some dynamic sizing that I might look into after the Streamer work is done.
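
One possible shape for such dynamic sizing, sketched very loosely here (hypothetical names; not a CockroachDB implementation): grow the batch size while the buffered result container stays in memory, and back off once it spills to disk, as the TODO in getLookupRowsBatchSizeHint suggests.

package main

import "fmt"

// dynamicBatchSizer adjusts the lookup batch size hint between lookup batches
// based on whether the previous batch's result container spilled to disk.
type dynamicBatchSizer struct {
	cur, min, max int64
}

func (d *dynamicBatchSizer) next(prevSpilled bool) int64 {
	if prevSpilled {
		// Back off: halve the hint, but never go below the floor.
		d.cur /= 2
		if d.cur < d.min {
			d.cur = d.min
		}
	} else {
		// Grow cautiously while results fit in memory.
		d.cur *= 2
		if d.cur > d.max {
			d.cur = d.max
		}
	}
	return d.cur
}

func main() {
	d := &dynamicBatchSizer{cur: 10 << 10, min: 10 << 10, max: 4 << 20}
	fmt.Println(d.next(false)) // 20480: no spill last time, grow
	fmt.Println(d.next(false)) // 40960
	fmt.Println(d.next(true))  // 20480: spilled, back off
}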

@mgartner (Collaborator)

@yuzefovich is this issue still relevant now that we've made progress on the Streamer?

@yuzefovich (Member)

Yes, this issue is still relevant: right now this batch size still controls how much data we buffer from the input before creating requests to pass to the streamer. That said, once the streamer supports pipelining of Enqueue and GetResults calls (#82163), this issue will become irrelevant.
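
As a purely conceptual illustration of why pipelining helps (toy channels only; this is not the Streamer API): once enqueueing the next chunk of lookups can overlap with retrieving results for earlier chunks, the size of each buffered input chunk stops dictating how often the join has to stop and wait on KV.

package main

import "fmt"

func main() {
	requests := make(chan []int, 4) // lookup chunks allowed to be in flight
	results := make(chan int)

	// Producer: keep enqueueing small lookup chunks without waiting for the
	// results of the previous chunk.
	go func() {
		defer close(requests)
		for chunk := 0; chunk < 5; chunk++ {
			requests <- []int{chunk * 2, chunk*2 + 1}
		}
	}()

	// Toy "streamer": resolves chunks as they arrive, concurrently with the
	// producer enqueueing more work.
	go func() {
		defer close(results)
		for keys := range requests {
			for _, k := range keys {
				results <- k * 10 // pretend lookup result
			}
		}
	}()

	for r := range results {
		fmt.Println(r)
	}
}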

@mgartner mgartner moved this to New Backlog in SQL Queries Jul 24, 2023