-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: hoist uncorrelated equality subqueries #100881
opt: hoist uncorrelated equality subqueries #100881
Conversation
In the backports for this I'll default this session setting to Here's some analysis of the performance improvements for a simple query that benefits from this change: https://gist.github.com/mgartner/9aa764ecbdb6d879d71863eb6b32b4c2 |
6f4073d
to
c93a876
Compare
Hold off on reviewing for now - there's some interesting test failures this has caused. |
c93a876
to
a82051e
Compare
This is currently blocked on #100915. |
a82051e
to
abab506
Compare
|
abab506
to
209a9e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! Do you have benchmark results for tpch Q15?
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 19 of 19 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have benchmark results for tpch Q15?
+1
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 19 of 19 files at r4, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
209a9e0
to
0d0ea8c
Compare
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
0d0ea8c
to
41fc214
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have benchmark results for tpch Q15?
How do I run the TPC-H benchmark?
I ran q15 manually in a demo on my GCE worker, and its ~5% slower than before this commit. Before it took ~6.3s and not it takes ~6.6s. It's not super obvious from the statement bundles why it's slower - it seems like it should be faster based on the query plan. And most of the time in both cases is spent on index-joins - so q15 would be best optimized with a STORING
index to eliminate them. That'd have a significantly greater impact than this 5% regression. So I think this is fine. What do you all think?
Master stmt-bundle: https://gist.github.com/mgartner/3ce71bb0d8861011820e05b11bf5391e
Hoist subquery stmt-bundle: https://gist.github.com/mgartner/f9170879a8bc1a24a82f44979c3be4cd
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running queries manually is how I normally check for TPC-H regressions. How many times did you run the query? 5% seems small enough that it could just be noise. In any case, I'm not too concerned.
Regarding a STORING
index -- I'm not sure what's allowed according to the spec, but if it's legal, I'm all for it (in a separate PR, obviously).
Reviewed 8 of 8 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
I don't think it was noise because it was consistently ~5% slower when run multiple times in multiple demos. I'm not too concerned about the 5% either. |
I'm going to go ahead and merge. TFTRs! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 41fc214 to blathers/backport-release-22.2-100881: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 41fc214 to blathers/backport-release-23.1.0-100881: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.0 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Since cockroachdb#100881, the optimizer has hoisted uncorrelated subqueries used in an equality expression (only when the `optimizer_hoist_uncorrelated_equality_subqueries` session setting is enabled). This can cause problems when the hoisted subquery has been duplicated in the expression tree, e.g., when pushing a filter into both sides of a join. A subquery is a scalar expression, so the columns of its child expression are never emitted from the subquery. This makes it safe duplicate a subquery in an expression tree. However, when a subquery is hoisted, it is transformed into a join which can produce the columns of the child expression. Hoisting the same subquery multiple times can produce query plans with duplicate column IDs in two logically different expressions. This can lead to incorrect query plans (see the comment for `opt.Metadata`), as well as produce expressions with children that have intersecting column IDs (after additional normalization rules fire). To avoid these dangers, this commit ensures that each unique subquery is hoisted at most once. This will prevent bad plans, but it may not inhibit the optimizer from finding optimal plans. In the future, it may be possible to lift this restriction by generating new column IDs for uncorrelated subqueries each time they are hoisted. Fixes cockroachdb#114703 There is no release note because the session setting enabling this bug is disabled by default, and because the possible correctness bug is theoretical - we have not found a reproduction of a correctness bug, but it could exist in theory. Release note: None
115142: opt: hoist uncorrelated subqueries at most once r=mgartner a=mgartner Since #100881, the optimizer has hoisted uncorrelated subqueries used in an equality expression (only when the `optimizer_hoist_uncorrelated_equality_subqueries` session setting is enabled). This can cause problems when the hoisted subquery has been duplicated in the expression tree, e.g., when pushing a filter into both sides of a join. A subquery is a scalar expression, so the columns of its child expression are never emitted from the subquery. This makes it safe duplicate a subquery in an expression tree. However, when a subquery is hoisted, it is transformed into a join which can produce the columns of the child expression. Hoisting the same subquery multiple times can produce query plans with duplicate column IDs in two logically different expressions. This can lead to incorrect query plans (see the comment for `opt.Metadata`), as well as produce expressions with children that have intersecting column IDs (after additional normalization rules fire). To avoid these dangers, this commit ensures that each unique subquery is hoisted at most once. This will prevent bad plans, but it may not inhibit the optimizer from finding optimal plans. In the future, it may be possible to lift this restriction by generating new column IDs for uncorrelated subqueries each time they are hoisted. Fixes #114703 There is no release note because the session setting enabling this bug is disabled by default, and because the possible correctness bug is theoretical - we have not found a reproduction of a correctness bug, but it could exist in theory. Release note: None Co-authored-by: Marcus Gartner <[email protected]>
Since #100881, the optimizer has hoisted uncorrelated subqueries used in an equality expression (only when the `optimizer_hoist_uncorrelated_equality_subqueries` session setting is enabled). This can cause problems when the hoisted subquery has been duplicated in the expression tree, e.g., when pushing a filter into both sides of a join. A subquery is a scalar expression, so the columns of its child expression are never emitted from the subquery. This makes it safe duplicate a subquery in an expression tree. However, when a subquery is hoisted, it is transformed into a join which can produce the columns of the child expression. Hoisting the same subquery multiple times can produce query plans with duplicate column IDs in two logically different expressions. This can lead to incorrect query plans (see the comment for `opt.Metadata`), as well as produce expressions with children that have intersecting column IDs (after additional normalization rules fire). To avoid these dangers, this commit ensures that each unique subquery is hoisted at most once. This will prevent bad plans, but it may not inhibit the optimizer from finding optimal plans. In the future, it may be possible to lift this restriction by generating new column IDs for uncorrelated subqueries each time they are hoisted. Fixes #114703 There is no release note because the session setting enabling this bug is disabled by default, and because the possible correctness bug is theoretical - we have not found a reproduction of a correctness bug, but it could exist in theory. Release note: None
Since #100881, the optimizer has hoisted uncorrelated subqueries used in an equality expression (only when the `optimizer_hoist_uncorrelated_equality_subqueries` session setting is enabled). This can cause problems when the hoisted subquery has been duplicated in the expression tree, e.g., when pushing a filter into both sides of a join. A subquery is a scalar expression, so the columns of its child expression are never emitted from the subquery. This makes it safe duplicate a subquery in an expression tree. However, when a subquery is hoisted, it is transformed into a join which can produce the columns of the child expression. Hoisting the same subquery multiple times can produce query plans with duplicate column IDs in two logically different expressions. This can lead to incorrect query plans (see the comment for `opt.Metadata`), as well as produce expressions with children that have intersecting column IDs (after additional normalization rules fire). To avoid these dangers, this commit ensures that each unique subquery is hoisted at most once. This will prevent bad plans, but it may not inhibit the optimizer from finding optimal plans. In the future, it may be possible to lift this restriction by generating new column IDs for uncorrelated subqueries each time they are hoisted. Fixes #114703 There is no release note because the session setting enabling this bug is disabled by default, and because the possible correctness bug is theoretical - we have not found a reproduction of a correctness bug, but it could exist in theory. Release note: None
Since #100881, the optimizer has hoisted uncorrelated subqueries used in an equality expression (only when the `optimizer_hoist_uncorrelated_equality_subqueries` session setting is enabled). This can cause problems when the hoisted subquery has been duplicated in the expression tree, e.g., when pushing a filter into both sides of a join. A subquery is a scalar expression, so the columns of its child expression are never emitted from the subquery. This makes it safe duplicate a subquery in an expression tree. However, when a subquery is hoisted, it is transformed into a join which can produce the columns of the child expression. Hoisting the same subquery multiple times can produce query plans with duplicate column IDs in two logically different expressions. This can lead to incorrect query plans (see the comment for `opt.Metadata`), as well as produce expressions with children that have intersecting column IDs (after additional normalization rules fire). To avoid these dangers, this commit ensures that each unique subquery is hoisted at most once. This will prevent bad plans, but it may not inhibit the optimizer from finding optimal plans. In the future, it may be possible to lift this restriction by generating new column IDs for uncorrelated subqueries each time they are hoisted. Fixes #114703 There is no release note because the session setting enabling this bug is disabled by default, and because the possible correctness bug is theoretical - we have not found a reproduction of a correctness bug, but it could exist in theory. Release note: None
opt: hoist uncorrelated equality subqueries
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.
For example, consider the table and query:
Prior to this commit, the query plan for this query required a full
table scan:
By hoisting the subquery, the full table scan is replaced with a lookup
join:
This hoisting is enabled by default, but can be disabled by setting the
optimizer_hoist_uncorrelated_equality_subqueries
session setting tofalse
.Fixes #83392
Informs #51820
Informs #93829
Informs #100855
Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.