sql: reconsider joinReaderOrderingStrategy batch size #72857
Comments
In #48058, we made the following change: I think Alfonso ran some benchmarks using TPC-H queries and picked this number as a result. Making this batch size hint a cluster setting sounds like a good idea to me, but simply increasing it could have performance regressions on some workloads, especially given that we have added more memory accounting into the …

As you note, cluster-level settings like this are problematic in that they affect every query. Could we instead add a query hint, similar to the way we can hint a particular join type or index? If that is a step too far, a session variable would be useful to limit the scope.

A query hint sounds a bit too heavy-weight for this (and probably wouldn't be pretty), but using a session variable is definitely a good point. Added that in #72921.
72921: rowexec: make batch size of join reader ordering strategy a setting r=yuzefovich a=yuzefovich

The three different strategies of the join reader have different batch sizes (which determine how many input rows are buffered to perform a lookup), with the ordering strategy having the smallest size of 10 KiB. This commit introduces a private cluster setting (as well as the corresponding session variable that will be propagated to the remote flows) that determines that batch size but keeps 10 KiB as the default value. Touches: #72857. Release note: None

72929: sql: remove unnecessary parens from cluster_contended_indexes r=matthewtodd a=matthewtodd

The modified query introduced in #72780 had unnecessary parentheses in its WHERE clause. This change removes them. Release note: None

72933: protectedts: improve a test r=andreimatei a=andreimatei

This test was starting a Reconciler by hand, running against an existing Server. The Reconciler was using the cluster's stopper (as opposed to the server's stopper). In doing so, we ended up with two different Tracers contributing to the same traces. This is pretty broken and will become disallowed. The patch fixes it by configuring the alien Reconciler with the same stopper (and tracer) as the server it's running against. Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
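For context, here is a minimal, self-contained Go sketch of the fallback pattern that PR describes: a session-level value, when set, overrides a cluster-wide default for the ordering strategy's batch size hint. The names and types below are hypothetical and are not CockroachDB's actual settings API.

```go
package main

import "fmt"

// Hypothetical names; this is not CockroachDB's settings package, just the
// fallback pattern: a session variable, when set, overrides the cluster-wide
// value, which in turn overrides the hard-coded 10 KiB default.
const defaultOrderingStrategyBatchSize = 10 << 10 // 10 KiB

type sessionData struct {
	// joinReaderOrderingStrategyBatchSize is 0 when the session variable is unset.
	joinReaderOrderingStrategyBatchSize int64
}

func lookupRowsBatchSizeHint(sd *sessionData, clusterSetting int64) int64 {
	if sd != nil && sd.joinReaderOrderingStrategyBatchSize > 0 {
		return sd.joinReaderOrderingStrategyBatchSize
	}
	if clusterSetting > 0 {
		return clusterSetting
	}
	return defaultOrderingStrategyBatchSize
}

func main() {
	fmt.Println(lookupRowsBatchSizeHint(&sessionData{}, 0)) // 10240
	fmt.Println(lookupRowsBatchSizeHint(
		&sessionData{joinReaderOrderingStrategyBatchSize: 100 << 10}, 0)) // 102400
}
```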
@yuzefovich thanks for landing #72921. I'll defer to you on whether you'd like to keep this issue open to track doing something more dynamic with this batch size (like the TODO mentions) or whether you'd like to close it.

I think I'll keep this issue as a placeholder and inspiration for introducing some dynamic sizing that I might look into after the …
@yuzefovich is this issue still relevant now that we've made progress on the Streamer? |
Yes, this issue is still relevant since right now this batch size still controls how much data we buffer from the input before creating requests to pass them to the streamer. That said, once we support pipelining of …
The joinReaderStrategy implementations each contain a getLookupRowsBatchSizeHint method, which dictates the size in bytes of the batch of lookup rows. There are currently three strategies: joinReaderIndexJoinStrategy returns a batch size of 4 MiB, joinReaderNoOrderingStrategy returns a batch size of 2 MiB, and joinReaderOrderingStrategy returns a batch size of 10 KiB.

I'd like to understand the reason for the comparatively tiny batch size used with the joinReaderOrderingStrategy. The implementation includes the following comment:

cockroach/pkg/sql/rowexec/joinreader_strategies.go
Lines 505 to 509 in b96940c
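For a concrete picture of the shape described above, here is a rough, self-contained Go sketch of the per-strategy batch size hints, using the sizes quoted in this issue; the real interface in pkg/sql/rowexec/joinreader_strategies.go is larger and differs in detail.

```go
package main

import "fmt"

// Rough sketch only: the actual joinReaderStrategy interface has many more
// methods; this captures just the batch-size-hint shape described above.
type joinReaderStrategy interface {
	getLookupRowsBatchSizeHint() int64
}

type joinReaderIndexJoinStrategy struct{}
type joinReaderNoOrderingStrategy struct{}
type joinReaderOrderingStrategy struct{}

func (joinReaderIndexJoinStrategy) getLookupRowsBatchSizeHint() int64  { return 4 << 20 }  // 4 MiB
func (joinReaderNoOrderingStrategy) getLookupRowsBatchSizeHint() int64 { return 2 << 20 }  // 2 MiB
func (joinReaderOrderingStrategy) getLookupRowsBatchSizeHint() int64   { return 10 << 10 } // 10 KiB

func main() {
	for _, s := range []joinReaderStrategy{
		joinReaderIndexJoinStrategy{},
		joinReaderNoOrderingStrategy{},
		joinReaderOrderingStrategy{},
	} {
		fmt.Printf("%T: %d bytes\n", s, s.getLookupRowsBatchSizeHint())
	}
}
```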
This comes up in the context of TPC-E, where this low batch size leads to some queries performing more KV operations than necessary, resulting in wasted work and increased latency. Take the following query as an example:

The query should be serviceable using 3 KV requests. First, it should issue a Scan request to the trade table. Next, it should issue a batch of 20 Get requests to the settlement table. Finally, it should issue another batch of 20 Get requests to the cash_transaction table.

Instead, a trace of the query looks like the following (stmt-bundle-711251192270782466.zip):
Notice the 5 KV requests. Each of the two batches of Get requests is split because the join pipeline is chunked up due to the 10 KiB batch limit associated with the joinReaderOrderingStrategy. If I bump this limit to 100 KiB, the query executes as expected (stmt-bundle-711258586890502146.zip):

At the risk of bumping up against ✨ @jordanlewis's Law of Sticky Magic Constants ✨, I'd like to pick a fight with this constant batch size. Is it well-reasoned? Should it be a cluster setting instead of a constant? Would we be concerned with just increasing it?
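To illustrate the chunking described above, here is a self-contained Go sketch (not the actual joinReader code) of how a byte-size batch limit splits buffered input rows into multiple lookup batches, each of which becomes its own KV batch of Get requests. The ~1 KiB per-row size is an assumption chosen so that 20 rows overflow a 10 KiB limit, roughly mimicking the TPC-E example.

```go
package main

import "fmt"

// chunkInputRows is an illustrative stand-in for how the join reader buffers
// input rows until their accumulated size reaches the batch size hint, then
// flushes a lookup batch (which turns into one KV batch of lookup requests).
func chunkInputRows(rowSizes []int, batchSizeHint int) (batches [][]int) {
	var cur []int
	var curBytes int
	for _, size := range rowSizes {
		cur = append(cur, size)
		curBytes += size
		if curBytes >= batchSizeHint {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// 20 buffered input rows of an assumed ~1 KiB each.
	rows := make([]int, 20)
	for i := range rows {
		rows[i] = 1 << 10
	}
	fmt.Printf("batches with 10 KiB hint:  %d\n", len(chunkInputRows(rows, 10<<10)))  // 2
	fmt.Printf("batches with 100 KiB hint: %d\n", len(chunkInputRows(rows, 100<<10))) // 1
}
```

Under these assumptions, a 10 KiB hint splits the 20 lookups into two batches while a 100 KiB hint keeps them in one, mirroring the difference between the two traces referenced above.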
Jira issue: CRDB-11321