-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: perf improvements for large queries #85100
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 This looks great! Can you share some of the benchmark results?
Reviewed 2 of 2 files at r1, 4 of 4 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)
-- commits
line 16 at r2:
By "query size", do you mean number of joins in the query?
pkg/sql/opt/xform/coster.go
line 1103 at r3 (raw file):
// // shouldSkipEq is a function that is used to skip equality filters on some // joins in the case when they do not cost anything.
nit: explain why they don't cost anything, like "... when they do not cost anything because they exist in filters even though the join operation performs the filtering".
pkg/sql/opt/xform/join_funcs.go
line 122 at r3 (raw file):
addCol := func(col opt.ColumnID, descending bool) { eqIdx := getEqIdx(col)
nit: We only really need to equality columns here, not the index, so instead of returning the index you could make the function getEqCols := func(col opt.ColumnID) (left, right opt.ColumnID) { ... }
pkg/sql/opt/bench/bench_test.go
line 803 at r1 (raw file):
// 1. Table with small number of columns. // 2. Table with no indexes. // 3. Very simple query that returns single row based on key filter.
nit: these comments don't look correct
pkg/sql/opt/bench/bench_test.go
line 816 at r1 (raw file):
ORDER BY tab_455284.col1_2 DESC, tab_455284.col1_1 DESC LIMIT 1:::INT8)), (NULL, 6736:::INT8)) AS tab_455285 (col_1103773, col_1103774)),
nit: can you make the indentation less aggressive here?
pkg/sql/opt/bench/bench_test.go
line 852 at r1 (raw file):
(SELECT * FROM ( VALUES ((-0.19099748134613037):::FLOAT8), (0.9743397235870361:::FLOAT8), ((-1.6944892406463623):::FLOAT8)) AS tab_297691 (col_692430))
nit: indentation
pkg/sql/opt/props/equiv_set_test.go
line 27 at r2 (raw file):
const numIterations = 8 const chanceUseExisting = 0.5 const removeChance = 0.05
nit: removeChance
is unused
pkg/sql/opt/props/equiv_set_test.go
line 56 at r2 (raw file):
rightGroup := testOracle[right] rightContains := rightGroup.Contains(left) _ = rightContains
nit: remove all the lines related to rightGroup
- these are unnecessary, correct? You could simplify this all to return testOracle[left].Contains(right)
.
pkg/sql/opt/props/equiv_set_test.go
line 87 at r2 (raw file):
var colsUsed opt.ColSet var fds FuncDepSet t.Run(fmt.Sprintf("iteration %d", i), func(t *testing.T) {
nit: it might be good to test with different numbers of columns like:
const maxCols = 256
// ...
for i := 2; i <= maxCols; i = i << 1 {
for j := 0; i < numIterations; j++ {
t.Run(fmt.Sprintf("cols=%d;iteration=%d", i, j), func(t *testing.T) {
// ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share some of the benchmark results?
Oops, forgot about that. Here are the two queries vs master:
name old time/op new time/op delta
SlowQueries/slow-query-1-10 34.1ms ± 0% 22.1ms ± 0% -35.25% (p=0.000 n=10+9)
SlowQueries/slow-query-2-10 342ms ± 0% 303ms ± 1% -11.46% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
SlowQueries/slow-query-1-10 26.6MB ± 0% 12.4MB ± 0% -53.24% (p=0.000 n=10+10)
SlowQueries/slow-query-2-10 228MB ± 0% 180MB ± 0% -21.15% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SlowQueries/slow-query-1-10 220k ± 0% 126k ± 0% -42.53% (p=0.000 n=10+9)
SlowQueries/slow-query-2-10 1.74M ± 0% 1.60M ± 0% -8.34% (p=0.000 n=10+10)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
Previously, mgartner (Marcus Gartner) wrote…
By "query size", do you mean number of joins in the query?
Yes, changed the comment. Done.
pkg/sql/opt/xform/coster.go
line 1103 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: explain why they don't cost anything, like "... when they do not cost anything because they exist in filters even though the join operation performs the filtering".
Done.
pkg/sql/opt/xform/join_funcs.go
line 122 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: We only really need to equality columns here, not the index, so instead of returning the index you could make the function
getEqCols := func(col opt.ColumnID) (left, right opt.ColumnID) { ... }
Done.
pkg/sql/opt/bench/bench_test.go
line 803 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: these comments don't look correct
Oops, copy-paste error. Done.
pkg/sql/opt/bench/bench_test.go
line 816 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: can you make the indentation less aggressive here?
Done.
pkg/sql/opt/bench/bench_test.go
line 852 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: indentation
Done.
pkg/sql/opt/props/equiv_set_test.go
line 27 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit:
removeChance
is unused
Done.
pkg/sql/opt/props/equiv_set_test.go
line 56 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: remove all the lines related to
rightGroup
- these are unnecessary, correct? You could simplify this all toreturn testOracle[left].Contains(right)
.
Done.
pkg/sql/opt/props/equiv_set_test.go
line 87 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: it might be good to test with different numbers of columns like:
const maxCols = 256 // ... for i := 2; i <= maxCols; i = i << 1 { for j := 0; i < numIterations; j++ { t.Run(fmt.Sprintf("cols=%d;iteration=%d", i, j), func(t *testing.T) { // ...
Good idea. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, 7 of 7 files at r4, 4 of 4 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
-- commits
line 22 at r6:
nit: add a release note
-- commits
line 41 at r6:
nit: add a release note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r4, 4 of 4 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/props/equiv_set_test.go
line 83 at r5 (raw file):
var colsUsed opt.ColSet var fds FuncDepSet t.Run(fmt.Sprintf("cols=%d;iteration=%d", numCols, i), func(t *testing.T) {
nit: use /
instead of ;
- sorry I mislead you there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Fixed the lint failure, but I think the test failures are flakes.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
Previously, rytaft (Rebecca Taft) wrote…
nit: add a release note
Done.
Previously, rytaft (Rebecca Taft) wrote…
nit: add a release note
Done.
pkg/sql/opt/props/equiv_set_test.go
line 83 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: use
/
instead of;
- sorry I mislead you there
Oh, didn't notice that. Done.
Turned out those tests weren't flakes - ended up having to only set the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you squash those last two commits into the original three
Reviewed 6 of 6 files at r7, 4 of 4 files at r8, 2 of 2 files at r9, 1 of 1 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r7, 4 of 4 files at r8, 2 of 2 files at r9, 1 of 1 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
This commit adds two slow-planning queries pulled from cockroachdb#64793 to be used in benchmarking the optimizer. In addition, the `ReorderJoinsLimit` has been set to the default 8 for benchmarking tests. Release note: None
Previously, the `JoinOrderBuilder` would construct a `FuncDepSet` from scratch on each call to `addJoins` in order to eliminate redundant join filters. This led to unnecessary large allocations because `addJoins` is called an exponential number of times in the number of joins in the query. This commit adds a struct `EquivSet` that efficiently stores equivalence relations as `ColSets` in a slice. Rather than being constructed on each call to `addJoins`, a `Reset` method is called that maintains slice memory. Release note (performance improvement): The optimizer is now less likely to take an excessive amount of time to plan queries with many joins.
Previously, `computeHashJoinCost` would use a `FastIntMap` to represent join equality filters to pass to `computeFiltersCost`. In addition, `GenerateMergeJoins` used a `FastIntMap` to look up columns among its join equality columns. This lead to unnecessary allocations since column IDs are often large enough to exceed the small field of `FastIntMap`. This commit modifies `computeFiltersCost` to take an anonymous function that is used to decide whether to skip an equality condition, removing the need for a mapping between columns. This commit also refactors `GenerateMergeJoins` to simply perform a linear scan of its equality columns; this avoids the allocation issue, and should be fast in practice because the number of equalities will not generally be large. Release note (performance improvement): The optimizer is now less likely to take an excessive amount of time to plan queries with many joins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you squash those last two commits into the original three
Whoops. Done.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @rytaft)
I think the test failures are actually flakes this time - the |
bors r+ |
Build succeeded: |
opt: add bench test for slow queries
This commit adds two slow-planning queries pulled from #64793 to be used
in benchmarking the optimizer. In addition, the
ReorderJoinsLimit
has beenset to the default 8 for benchmarking tests.
opt: add struct for tracking column equivalence sets
Previously, the
JoinOrderBuilder
would construct aFuncDepSet
fromscratch on each call to
addJoins
in order to eliminate redundant joinfilters. This led to unnecessary large allocations because
addJoins
iscalled an exponential number of times in query size.
This commit adds a struct
EquivSet
that efficiently stores equivalencerelations as
ColSets
in a slice. Rather than being constructed on eachcall to
addJoins
, aReset
method is called that maintains slice memory.In the future,
EquivSet
can be used to handle equivalencies withinFuncDepSet
structs as well. This well avoid a significant number of allocations in cases with
many equivalent columns, as outlined in #83963.
opt: avoid usage of FastIntMap in optimizer hot paths
Previously,
computeHashJoinCost
would use aFastIntMap
to represent joinequality filters to pass to
computeFiltersCost
. In addition,GenerateMergeJoins
used aFastIntMap
to look up columns among its joinequality columns. This lead to unnecessary allocations since column IDs are
often large enough to exceed the small field of
FastIntMap
.This commit modifies
computeFiltersCost
to take an anonymous functionthat is used to decide whether to skip an equality condition, removing the
need for a mapping between columns.
This commit also refactors
GenerateMergeJoins
to simply perform a linearscan of its equality columns; this avoids the allocation issue, and should be
fast in practice because the number of equalities will not generally be large.
Release note: None