sql/distsqlrun: add benchmarks for aggregator and mergeJoiner #20759
Added a 3rd commit which significantly speeds up
1f2ec5c to 8cbd609
Updated the 3rd commit to fix another place where
@RaduBerinde I'm assuming there isn't a correctness issue with using an
8cbd609 to a7d0956
b.SetBytes(int64(8 * numRows * numCols))
b.ResetTimer()
for i := 0; i < b.N; i++ {
	h, err := newMergeJoiner(flowCtx, spec, leftInput, rightInput, post, &RowDisposer{})
So I think the initialization of the processor here is pessimistically skewing the benchmark numbers. Perhaps try pulling it out of the loop? I don't think any other state in mergeJoiner is mutated horribly after each successive Run.
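The trade-off under discussion can be sketched in isolation. This is a minimal illustration, not the PR's benchmark: processor, newProcessor, and run are hypothetical stand-ins for mergeJoiner and its construction. setupInLoop constructs a processor inside the timed loop, as the benchmark under review does, while setupHoisted constructs it once and calls b.ResetTimer() so only run is timed. The hoisted form is only valid because run resets its own state before each use.

```go
package main

import (
	"fmt"
	"testing"
)

// processor is a hypothetical stand-in for a DistSQL processor such as
// mergeJoiner. It owns a reusable buffer.
type processor struct {
	buf []int
}

// newProcessor models processor construction.
func newProcessor() *processor {
	return &processor{buf: make([]int, 0, 1024)}
}

// run processes numRows synthetic rows. It truncates buf first, so a
// processor can be reused across iterations without leaking state.
func (p *processor) run(numRows int) int {
	p.buf = p.buf[:0]
	sum := 0
	for i := 0; i < numRows; i++ {
		p.buf = append(p.buf, i)
		sum += i
	}
	return sum
}

// setupInLoop times construction plus run on every iteration.
func setupInLoop(numRows int) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			newProcessor().run(numRows)
		}
	}
}

// setupHoisted constructs once, excludes that from timing via ResetTimer,
// and times only run.
func setupHoisted(numRows int) func(*testing.B) {
	return func(b *testing.B) {
		p := newProcessor()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			p.run(numRows)
		}
	}
}

func main() {
	// Registers testing flags; needed when calling testing.Benchmark
	// outside of `go test`.
	testing.Init()
	for _, rows := range []int{0, 16, 4096} {
		inLoop := testing.Benchmark(setupInLoop(rows))
		hoisted := testing.Benchmark(setupHoisted(rows))
		fmt.Printf("rows=%d in-loop=%dns/op hoisted=%dns/op\n",
			rows, inLoop.NsPerOp(), hoisted.NsPerOp())
	}
}
```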
The initialization of the processor is only a factor at small numRows, and it is somewhat interesting to capture that initialization overhead. The hash joiner benchmark has a TODO about addressing this overhead, but I'm not convinced we want to change the benchmark.
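One rough way to quantify "only a factor at small numRows" is to treat the rows=0 time as the fixed setup cost and divide it by the time at each size. That proxy is an assumption, not something the PR measures; the inputs are the MergeJoiner time/op values reported in the PR before the EvalContext change.

```go
package main

import "fmt"

// measurement pairs a row count with its reported time per op, in µs.
type measurement struct {
	rows    int
	usPerOp float64
}

// mergeJoinerBefore holds the MergeJoiner time/op values reported in the
// PR, before the EvalContext change. The first entry must be rows=0.
var mergeJoinerBefore = []measurement{
	{0, 3.15}, {4, 6.50}, {16, 14.9}, {256, 170}, {4096, 2640}, {65536, 44400},
}

// overheadShare estimates the fraction of each op spent on fixed setup,
// assuming the rows=0 time (ms[0]) approximates that fixed cost.
func overheadShare(ms []measurement) map[int]float64 {
	fixed := ms[0].usPerOp
	shares := make(map[int]float64, len(ms))
	for _, m := range ms {
		shares[m.rows] = fixed / m.usPerOp
	}
	return shares
}

func main() {
	shares := overheadShare(mergeJoinerBefore)
	for _, m := range mergeJoinerBefore {
		fmt.Printf("rows=%-6d fixed-cost share ≈ %.3f%%\n", m.rows, 100*shares[m.rows])
	}
}
```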
Fair enough 👍 The hash joiner's state with its memory containers also looks non-trivial to address in the benchmark, so it might be best to keep this uniform across the benchmarks.
Reviewed 2 of 2 files at r1, 2 of 2 files at r2, 4 of 4 files at r3.

pkg/sql/distsqlrun/aggregator_test.go, line 411 at r1 (raw file):
Consider extracting all of this minus

pkg/sql/distsqlrun/aggregator_test.go, line 427 at r1 (raw file):
nit: This could probably be put into the call to

pkg/sql/distsqlrun/aggregator_test.go, line 440 at r1 (raw file):
Is this call to

pkg/sql/distsqlrun/hashjoiner_test.go, line 878 at r2 (raw file):
Although we don't have a standard for the benchmark key capitalization (we didn't decide in #16830), all the benchmarks in
Also, since you're switching

pkg/sql/distsqlrun/mergejoiner_test.go, line 590 at r2 (raw file):
nit:

Comments from Reviewable
Interesting that grouping is slightly faster than distinct. They are doing the same work in these benchmarks. The STDDEV and VARIANCE aggregations are slow due to decimal computations.

    name                    time/op
    Aggregation/IDENT-8     49.6µs ± 2%
    Aggregation/AVG-8       70.4µs ± 1%
    Aggregation/COUNT-8     49.6µs ± 2%
    Aggregation/MAX-8       56.7µs ± 2%
    Aggregation/MIN-8       55.6µs ± 2%
    Aggregation/STDDEV-8    1.50ms ± 1%
    Aggregation/SUM-8       64.8µs ± 2%
    Aggregation/SUM_INT-8   65.4µs ± 4%
    Aggregation/VARIANCE-8  1.48ms ± 3%
    Aggregation/XOR_AGG-8   50.5µs ± 2%
    Grouping-8               260µs ± 1%
    Distinct-8               291µs ± 1%

    name                    speed
    Aggregation/IDENT-8      161MB/s ± 2%
    Aggregation/AVG-8        114MB/s ± 1%
    Aggregation/COUNT-8      161MB/s ± 2%
    Aggregation/MAX-8        141MB/s ± 2%
    Aggregation/MIN-8        144MB/s ± 2%
    Aggregation/STDDEV-8    5.35MB/s ± 1%
    Aggregation/SUM-8        124MB/s ± 2%
    Aggregation/SUM_INT-8    122MB/s ± 4%
    Aggregation/VARIANCE-8  5.42MB/s ± 3%
    Aggregation/XOR_AGG-8    158MB/s ± 2%
    Grouping-8              30.8MB/s ± 1%
    Distinct-8              27.5MB/s ± 1%

Release note: None
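The observation that STDDEV and VARIANCE are slow because of decimal arithmetic can be illustrated with a small sketch. It uses Go's math/big.Float as a stand-in for the decimal type the real aggregates use (an assumption; the aggregate implementations are not part of this PR) and compares a two-pass sample variance in float64 against the same algorithm in arbitrary precision, which allocates a fresh big.Float per row.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// prec is a hypothetical working precision, in bits, for the stand-in.
const prec = 128

// varianceFloat64 computes the sample variance with a two-pass algorithm.
// It returns 0 for fewer than two values; a SQL engine would typically
// return NULL instead.
func varianceFloat64(xs []int64) float64 {
	if len(xs) < 2 {
		return 0
	}
	n := float64(len(xs))
	var sum float64
	for _, x := range xs {
		sum += float64(x)
	}
	mean := sum / n
	var sq float64
	for _, x := range xs {
		d := float64(x) - mean
		sq += d * d
	}
	return sq / (n - 1)
}

// varianceBig is the same algorithm using math/big.Float in place of a
// decimal type. It allocates a new big.Float for every row in both passes.
func varianceBig(xs []int64) *big.Float {
	if len(xs) < 2 {
		return new(big.Float)
	}
	n := new(big.Float).SetPrec(prec).SetInt64(int64(len(xs)))
	sum := new(big.Float).SetPrec(prec)
	for _, x := range xs {
		sum.Add(sum, new(big.Float).SetInt64(x))
	}
	mean := new(big.Float).SetPrec(prec).Quo(sum, n)
	sq := new(big.Float).SetPrec(prec)
	for _, x := range xs {
		d := new(big.Float).SetPrec(prec).SetInt64(x)
		d.Sub(d, mean)
		sq.Add(sq, d.Mul(d, d))
	}
	nMinus1 := new(big.Float).SetPrec(prec).SetInt64(int64(len(xs) - 1))
	return sq.Quo(sq, nMinus1)
}

func main() {
	// Registers testing flags so testing.Benchmark works outside `go test`.
	testing.Init()
	xs := make([]int64, 1000)
	for i := range xs {
		xs[i] = int64(i % 97)
	}
	f := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			varianceFloat64(xs)
		}
	})
	g := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			varianceBig(xs)
		}
	})
	fmt.Println("float64:  ", f, f.MemString())
	fmt.Println("big.Float:", g, g.MemString())
}
```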
Adjust BenchmarkHashJoiner to remove the projection and only use a single column so that the benchmark focuses on "speed of light" of the processor itself.

    name                      time/op
    HashJoiner/rows=0-8       2.90µs ± 1%
    HashJoiner/rows=4-8       7.50µs ± 1%
    HashJoiner/rows=16-8      18.0µs ± 1%
    HashJoiner/rows=256-8      217µs ± 1%
    HashJoiner/rows=4096-8    3.39ms ± 1%
    HashJoiner/rows=65536-8   64.4ms ± 4%
    MergeJoiner/rows=0-8      3.15µs ± 0%
    MergeJoiner/rows=4-8      6.50µs ± 1%
    MergeJoiner/rows=16-8     14.9µs ± 0%
    MergeJoiner/rows=256-8     170µs ± 1%
    MergeJoiner/rows=4096-8   2.64ms ± 0%
    MergeJoiner/rows=65536-8  44.4ms ± 1%

    name                      speed
    HashJoiner/rows=0-8
    HashJoiner/rows=4-8       4.27MB/s ± 1%
    HashJoiner/rows=16-8      7.09MB/s ± 1%
    HashJoiner/rows=256-8     9.43MB/s ± 1%
    HashJoiner/rows=4096-8    9.66MB/s ± 1%
    HashJoiner/rows=65536-8   8.14MB/s ± 4%
    MergeJoiner/rows=0-8
    MergeJoiner/rows=4-8      4.93MB/s ± 0%
    MergeJoiner/rows=16-8     8.61MB/s ± 0%
    MergeJoiner/rows=256-8    12.0MB/s ± 1%
    MergeJoiner/rows=4096-8   12.4MB/s ± 0%
    MergeJoiner/rows=65536-8  11.8MB/s ± 1%

Release note: None
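The speed column comes from the b.SetBytes(int64(8 * numRows * numCols)) call in the benchmark: Go's testing package divides the bytes per op by the time per op, counting 1 MB as 10^6 bytes, and omits the column when the byte count is zero, which is why the rows=0 entries have no speed. The sketch below redoes that arithmetic for the single-column HashJoiner results; bytesPerOp and mbPerSec are illustrative helpers, not functions from the PR.

```go
package main

import "fmt"

// bytesPerOp mirrors b.SetBytes(int64(8 * numRows * numCols)):
// 8 bytes per datum.
func bytesPerOp(numRows, numCols int) int64 {
	return int64(8 * numRows * numCols)
}

// mbPerSec converts bytes per op and nanoseconds per op into MB/s
// (1 MB = 1e6 bytes). bytes/ns * 1e9 ns/s / 1e6 B/MB = bytes/ns * 1e3.
func mbPerSec(bytes int64, nsPerOp float64) float64 {
	if bytes <= 0 || nsPerOp <= 0 {
		return 0 // testing omits the MB/s column in this case
	}
	return float64(bytes) / nsPerOp * 1e3
}

func main() {
	// HashJoiner time/op values from the PR, converted to nanoseconds.
	results := []struct {
		rows    int
		nsPerOp float64
	}{
		{0, 2900}, {4, 7500}, {16, 18000}, {256, 217000}, {4096, 3390000}, {65536, 64400000},
	}
	for _, r := range results {
		fmt.Printf("rows=%-6d %.2f MB/s\n", r.rows, mbPerSec(bytesPerOp(r.rows, 1), r.nsPerOp))
	}
}
```

Small differences from the reported column (for example about 7.11 vs 7.09 MB/s at rows=16) are expected, since both reported columns are rounded summaries over several runs.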
Plumb EvalContext into streamMerger and streamGroupAccumulator in order to avoid allocating it on every call to advanceGroup.

    name                      old time/op  new time/op  delta
    MergeJoiner/rows=0-8      3.15µs ± 0%  3.29µs ± 1%   +4.54%  (p=0.000 n=10+9)
    MergeJoiner/rows=4-8      6.50µs ± 1%  6.01µs ± 1%   -7.62%  (p=0.000 n=9+8)
    MergeJoiner/rows=16-8     14.9µs ± 0%   9.2µs ± 1%  -38.01%  (p=0.000 n=8+9)
    MergeJoiner/rows=256-8     170µs ± 1%    71µs ± 2%  -58.19%  (p=0.000 n=10+10)
    MergeJoiner/rows=4096-8   2.64ms ± 0%  1.05ms ± 0%  -60.42%  (p=0.000 n=8+9)
    MergeJoiner/rows=65536-8  44.4ms ± 1%  17.2ms ± 1%  -61.22%  (p=0.000 n=9+10)

    name                      old speed      new speed       delta
    MergeJoiner/rows=4-8      4.93MB/s ± 0%   5.33MB/s ± 1%    +8.15%  (p=0.000 n=8+8)
    MergeJoiner/rows=16-8     8.61MB/s ± 0%  13.89MB/s ± 1%   +61.32%  (p=0.000 n=8+9)
    MergeJoiner/rows=256-8    12.0MB/s ± 1%   28.8MB/s ± 2%  +139.20%  (p=0.000 n=10+10)
    MergeJoiner/rows=4096-8   12.4MB/s ± 0%   31.3MB/s ± 0%  +152.68%  (p=0.000 n=8+9)
    MergeJoiner/rows=65536-8  11.8MB/s ± 1%   30.4MB/s ± 1%  +157.84%  (p=0.000 n=9+10)

Release note (performance change): Speed up except and merge joins by avoiding an unnecessary allocation.
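The pattern in this commit, creating the evaluation context once and handing a pointer to the long-lived struct instead of allocating one on every advanceGroup call, can be sketched with hypothetical types. evalContext, datum, intDatum, and groupAccumulator below are illustrative stand-ins, not the real EvalContext or streamGroupAccumulator. Because comparisons go through an interface method, escape analysis typically cannot keep a per-call context on the stack, and testing.AllocsPerRun makes the difference visible.

```go
package main

import (
	"fmt"
	"testing"
)

// evalContext is a hypothetical stand-in for an evaluation context. The
// scratch field gives it a non-zero size so allocating it is not free.
type evalContext struct {
	scratch [64]byte
}

// datum is a hypothetical value type compared through an interface;
// passing ctx to an interface method makes ctx escape to the heap.
type datum interface {
	compare(ctx *evalContext, other datum) int
}

type intDatum int64

func (d intDatum) compare(_ *evalContext, other datum) int {
	o := other.(intDatum)
	switch {
	case d < o:
		return -1
	case d > o:
		return 1
	}
	return 0
}

// groupAccumulator returns runs of equal datums from sorted input.
type groupAccumulator struct {
	evalCtx *evalContext // plumbed in once at construction
	rows    []datum
	pos     int
}

func newGroupAccumulator(vals []int64) *groupAccumulator {
	rows := make([]datum, len(vals))
	for i, v := range vals {
		rows[i] = intDatum(v)
	}
	return &groupAccumulator{evalCtx: &evalContext{}, rows: rows}
}

// nextGroup returns the run of rows equal to rows[pos] and advances pos.
func (g *groupAccumulator) nextGroup(ctx *evalContext) []datum {
	start := g.pos
	for g.pos < len(g.rows) && g.rows[start].compare(ctx, g.rows[g.pos]) == 0 {
		g.pos++
	}
	return g.rows[start:g.pos]
}

// advanceGroupAlloc allocates a fresh context on every call.
func (g *groupAccumulator) advanceGroupAlloc() []datum {
	return g.nextGroup(&evalContext{})
}

// advanceGroupPlumbed reuses the context supplied at construction.
func (g *groupAccumulator) advanceGroupPlumbed() []datum {
	return g.nextGroup(g.evalCtx)
}

// allocsPerScan reports the average allocations for one full pass.
func allocsPerScan(g *groupAccumulator, plumbed bool) float64 {
	return testing.AllocsPerRun(100, func() {
		g.pos = 0
		for {
			var grp []datum
			if plumbed {
				grp = g.advanceGroupPlumbed()
			} else {
				grp = g.advanceGroupAlloc()
			}
			if len(grp) == 0 {
				return
			}
		}
	})
}

func main() {
	g := newGroupAccumulator([]int64{1, 1, 2, 3, 3, 3})
	fmt.Println("allocs per scan, per-call context:", allocsPerScan(g, false))
	fmt.Println("allocs per scan, plumbed context: ", allocsPerScan(g, true))
}
```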
a7d0956 to bda143f
Review status: 1 of 8 files reviewed at latest revision, 6 unresolved discussions.

pkg/sql/distsqlrun/aggregator_test.go, line 411 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Extracting into what? Out of the loop? That doesn't make much if any difference.

pkg/sql/distsqlrun/aggregator_test.go, line 427 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I copy&pasted this from somewhere else. Avoiding the allocation was not a consideration. I'll move the creation of

pkg/sql/distsqlrun/aggregator_test.go, line 440 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I usually do this out of habit. We sometimes put stuff in

pkg/sql/distsqlrun/hashjoiner_test.go, line 878 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I've reverted this to

pkg/sql/distsqlrun/mergejoiner_test.go, line 590 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Done.
If anything, the old code had issues. There may be some corner cases which we were screwing up before (maybe
We all desire approval, from our partners, from our parents, from our friends. This PR desires it too.
Ha, LGTM