Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: hoist uncorrelated subqueries at most once #115142

Merged
merged 1 commit into from
Nov 29, 2023

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Nov 27, 2023

Since #100881, the optimizer has hoisted uncorrelated subqueries used in
an equality expression (only when the
optimizer_hoist_uncorrelated_equality_subqueries session setting is
enabled). This can cause problems when the hoisted subquery has been
duplicated in the expression tree, e.g., when pushing a filter into both
sides of a join.

A subquery is a scalar expression, so the columns of its child
expression are never emitted from the subquery. This makes it safe
duplicate a subquery in an expression tree. However, when a subquery is
hoisted, it is transformed into a join which can produce the columns of
the child expression. Hoisting the same subquery multiple times can
produce query plans with duplicate column IDs in two logically different
expressions. This can lead to incorrect query plans (see the comment for
opt.Metadata), as well as produce expressions with children that have
intersecting column IDs (after additional normalization rules fire).

To avoid these dangers, this commit ensures that each unique subquery is
hoisted at most once. This will prevent bad plans, but it may not
inhibit the optimizer from finding optimal plans. In the future, it may
be possible to lift this restriction by generating new column IDs for
uncorrelated subqueries each time they are hoisted.

Fixes #114703

There is no release note because the session setting enabling this bug
is disabled by default, and because the possible correctness bug is
theoretical - we have not found a reproduction of a correctness bug, but
it could exist in theory.

Release note: None

@mgartner mgartner added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Nov 27, 2023
@mgartner mgartner requested a review from a team November 27, 2023 21:41
@mgartner mgartner requested a review from a team as a code owner November 27, 2023 21:41
@mgartner mgartner requested review from DrewKimball and removed request for a team November 27, 2023 21:41
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice fix! Is this also a problem for UDFs? Also, what's stopping it from being a problem for correlated subqueries as well?

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


-- commits line 24 at r1:
[nit] ideas -> ids

@mgartner
Copy link
Collaborator Author

Is this also a problem for UDFs?

It is indeed, and this PR doesn't fix it. I'll probably modify this PR to address it.

Also, what's stopping it from being a problem for correlated subqueries as well?

There's nothing fundamental preventing this for correlated subqueries. In the test cases I've added, PushFilterIntoJoinLeftAndRight is the rule that pushes a filter with a subquery into both sides of a join. That particular type of subquery duplication is prevented by this check in CanMapJoinOpFilter:

if scalarProps.HasCorrelatedSubquery {
return false
}

So it's possible that we duplicate a correlated subquery in some other way, and arrive at the same problem. However, we've not witnessed an occurrence of this error with a correlated subquery (to my knowledge), so I'm hesitant to add the same limitation.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also a problem for UDFs?

It is indeed, and this PR doesn't fix it. I'll probably modify this PR to address it.

Nevermind, I was confused. I think a query with UDFs could run into a similar issue when the UDF is inlined as a subquery, but this PR should prevent that. Is there a specific case you were thinking of that I should test?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

@mgartner mgartner force-pushed the 114703-fix-hoist-subquery branch from bf39fbf to a221080 Compare November 28, 2023 15:34
@mgartner mgartner requested a review from DrewKimball November 28, 2023 15:34
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


-- commits line 24 at r1:

Previously, DrewKimball (Drew Kimball) wrote…

[nit] ideas -> ids

Done.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I was confused. I think a query with UDFs could run into a similar issue when the UDF is inlined as a subquery, but this PR should prevent that. Is there a specific case you were thinking of that I should test?

Good point, I forgot that inlining a UDF just creates a subquery. It might be nice to have a simple test with a no-arg UDF that gets inlined.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

Since cockroachdb#100881, the optimizer has hoisted uncorrelated subqueries used in
an equality expression (only when the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting is
enabled). This can cause problems when the hoisted subquery has been
duplicated in the expression tree, e.g., when pushing a filter into both
sides of a join.

A subquery is a scalar expression, so the columns of its child
expression are never emitted from the subquery. This makes it safe
duplicate a subquery in an expression tree. However, when a subquery is
hoisted, it is transformed into a join which can produce the columns of
the child expression. Hoisting the same subquery multiple times can
produce query plans with duplicate column IDs in two logically different
expressions. This can lead to incorrect query plans (see the comment for
`opt.Metadata`), as well as produce expressions with children that have
intersecting column IDs (after additional normalization rules fire).

To avoid these dangers, this commit ensures that each unique subquery is
hoisted at most once. This will prevent bad plans, but it may not
inhibit the optimizer from finding optimal plans. In the future, it may
be possible to lift this restriction by generating new column IDs for
uncorrelated subqueries each time they are hoisted.

Fixes cockroachdb#114703

There is no release note because the session setting enabling this bug
is disabled by default, and because the possible correctness bug is
theoretical - we have not found a reproduction of a correctness bug, but
it could exist in theory.

Release note: None
@mgartner mgartner force-pushed the 114703-fix-hoist-subquery branch from a221080 to 4c4e5a4 Compare November 28, 2023 22:04
@mgartner
Copy link
Collaborator Author

It might be nice to have a simple test with a no-arg UDF that gets inlined.

Done.

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 29, 2023

Build succeeded:

@craig craig bot merged commit b819e14 into cockroachdb:master Nov 29, 2023
8 checks passed
@mgartner mgartner deleted the 114703-fix-hoist-subquery branch November 29, 2023 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/tests: TestRandomSyntaxSQLSmith failed [inner-join RelExpr children have intersecting columns]
3 participants