Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Hoist Exists operator and try to decorrelate #24969

Merged
merged 2 commits into from
Apr 24, 2018

Conversation

andy-kimball
Copy link
Contributor

This PR has two commits for easier review. The first adds support for OuterCols as a relational property. The second uses that to implement transforming the Exists operator into a SemiJoinApply operator, and then attempting to decorrelate that.

@andy-kimball andy-kimball requested a review from a team as a code owner April 22, 2018 01:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator

Review status: 0 of 25 files reviewed at latest revision, all discussions resolved.


pkg/sql/opt/norm/rules/join.opt, line 114 at r2 (raw file):

# operator.
#
# Citations: [3]

I believe the transformation in [3] is apply(R, select(E)) -> select(apply(R, E)). You're doing something slightly different here by tacking the selection filters onto the join condition. This is what I did in opttoy and it seemed to work ok, though it might be worthwhile to call out this difference in the comment here.


pkg/sql/opt/norm/rules/select.opt, line 234 at r2 (raw file):

    $input:*
    $filter:(Filters
        $list:[ ... $exists:(Not (Exists $subquery:* & (IsCorrelatedSubquery $subquery))) ... ]

Nice! These transformations are quite pleasant to read.


pkg/sql/opt/norm/testdata/join, line 304 at r2 (raw file):

SELECT * FROM a WHERE EXISTS(SELECT * FROM b WHERE x=k)
----
semi-join

🎉


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 23, 2018

Reviewed 14 of 14 files at r1.
Review status: 14 of 25 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/opt/memo/logical_props_factory.go, line 130 at r1 (raw file):

	if !filterProps.OuterCols.SubsetOf(inputProps.OutputCols) {
		props.Relational.OuterCols = filterProps.OuterCols.Difference(inputProps.OutputCols)
		props.Relational.OuterCols.UnionWith(inputProps.OuterCols)

Should this line be outside of the if statement? (Also, the if may not be needed at all -- Difference should handle it.)


pkg/sql/opt/memo/logical_props_factory.go, line 155 at r1 (raw file):

	// columns are outer columns from the Project expression, in addition to any
	// outer columns inherited from the input expression.
	if !projectionProps.OuterCols.SubsetOf(inputProps.OutputCols) {

I don't think this check is necessary -- Difference will handle it.


pkg/sql/opt/memo/logical_props_factory.go, line 209 at r1 (raw file):

	// inherited from the input expressions.
	inputCols := leftProps.OutputCols.Union(rightProps.OutputCols)
	if !onProps.OuterCols.SubsetOf(inputCols) {

I don't think this check is necessary -- Difference will handle it.


pkg/sql/opt/memo/logical_props_factory.go, line 252 at r1 (raw file):

	// Any outer columns from aggregation expressions that are not bound by the
	// input columns are outer columns.
	props.Relational.OuterCols = aggProps.OuterCols.Copy()

[nit] to make this match the logic above, you could just do

props.Relational.OuterCols = aggProps.OuterCols.Difference(inputProps.OuterCols)


pkg/sql/opt/memo/logical_props_factory.go, line 343 at r1 (raw file):

	// Inherit outer columns from limit expression.
	if !limitProps.OuterCols.Empty() {
		props.Relational.OuterCols = limitProps.OuterCols.Union(inputProps.OuterCols)

you miss the inputProps outer cols if there are no outer cols in the limit.


pkg/sql/opt/memo/logical_props_factory.go, line 362 at r1 (raw file):

	// Inherit outer columns from offset expression.
	if !offsetProps.OuterCols.Empty() {
		props.Relational.OuterCols = offsetProps.OuterCols.Union(inputProps.OuterCols)

ditto


Comments from Reviewable

@RaduBerinde
Copy link
Member

Great stuff!


Review status: 14 of 25 files reviewed at latest revision, 9 unresolved discussions.


pkg/sql/opt/memo/logical_props_factory.go, line 130 at r1 (raw file):

Previously, rytaft wrote…

Should this line be outside of the if statement? (Also, the if may not be needed at all -- Difference should handle it.)

Note that OuterCols is inputProps.OuterCols already here (which is not ok to modify)


pkg/sql/opt/norm/factory.go, line 187 at r2 (raw file):

		}

		newList[i] = item

[nit] add a comment saying that if the list doesn't contain the element, we panic here


pkg/sql/opt/norm/testdata/select, line 821 at r2 (raw file):

           └── variable: a.k [type=int, outer=(1)]

# Ensure that EXISTS is hoisted even when one of several conjuncts.

[nit] it is one of


pkg/sql/opt/norm/testdata/select, line 827 at r2 (raw file):

semi-join
 ├── columns: k:1(int!null) i:2(int) f:3(float) s:4(string) j:5(jsonb)
 ├── scan a

What is missing for s=foo and i > 1 to be pushed down? I guess PushFilterIntoJoinLeft should be enabled on semi-join?


pkg/sql/opt/norm/testdata/select, line 918 at r2 (raw file):

 │    ├── columns: b.x:6(int!null) b.y:7(int)
 │    └── keys: (6)
 └── filters [type=bool, outer=(1,2,4,6), constraints=(/2: [/2 - ]; /4: [/'foo' - /'foo'])]

I don't think this is correct. This will for example output all rows where s <> 'foo' (it's similar to the behavior of ON with outer joins).

The filters should stay inside the left input. We could do that by adjusting the rule:

(AntiJoinApply
    (Select $input (Filters (RemoveListItem $list $exists)))
    $subquery
)

pkg/sql/opt/ops/scalar.opt, line 66 at r2 (raw file):

#     (IsNull (Variable 1))
#   )
#

[nit] maybe mention that this is equivalent to (IsNull (Subquery (Select .. )))


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 23, 2018

This is exciting to see!


Reviewed 11 of 11 files at r2.
Review status: all files reviewed at latest revision, 14 unresolved discussions.


pkg/sql/opt/norm/rules/select.opt, line 178 at r2 (raw file):

# PushSelectIntoGroupBy pushes a Select condition below a GroupBy in the case
# where it does not reference any of the aggregation columns. This only works
# if this is an instance of the "scalar" GroupBy, which returns only one row,

I think you meant -- this only works if this is not an instance of the scalar GroupBy


pkg/sql/opt/norm/rules/select.opt, line 228 at r2 (raw file):


# HoistSelectNotExists extracts non-existential subqueries from Select filters,
# turning them into semi-joins. This eliminates the subquery, which is often

semi-joins -> anti-joins


pkg/sql/opt/norm/testdata/join, line 367 at r2 (raw file):

 ├── values
 │    ├── columns: column1:6(int)
 │    ├── outer: (1)

why does this only show outer=(1) instead of outer=(1, 2)?


pkg/sql/opt/norm/testdata/scalar, line 450 at r2 (raw file):

# --------------------------------------------------
opt
SELECT * FROM a WHERE EXISTS(SELECT MAX(s) FROM a GROUP BY i)

maybe also add a test for scalar group by to show that it doesn't get eliminated?


Comments from Reviewable

Now that we have support for subqueries, column references to outer
relations are possible. This commit calculates the OuterCols logical
property for the various relational operators.

Release note: None
@andy-kimball
Copy link
Contributor Author

TFTR, I've made quite a few changes in response to comments, and fixed a few additional bugs that I found while making all the changes as well.


Review status: 12 of 25 files reviewed at latest revision, 18 unresolved discussions.


pkg/sql/opt/memo/logical_props_factory.go, line 130 at r1 (raw file):

Previously, rytaft wrote…

Should this line be outside of the if statement? (Also, the if may not be needed at all -- Difference should handle it.)

This is a small optimization to avoid an allocation in the >64 cols case. We inherit the OuterCols of the input. The only time we have to modify them is if the filter is adding new outer cols (which is very unusual). Difference will allocate a map if there are >64 cols, so it seemed good to avoid that in the common case.


pkg/sql/opt/memo/logical_props_factory.go, line 155 at r1 (raw file):

Previously, rytaft wrote…

I don't think this check is necessary -- Difference will handle it.

See above comment.


pkg/sql/opt/memo/logical_props_factory.go, line 209 at r1 (raw file):

Previously, rytaft wrote…

I don't think this check is necessary -- Difference will handle it.

See above comment.


pkg/sql/opt/memo/logical_props_factory.go, line 252 at r1 (raw file):

Previously, rytaft wrote…

[nit] to make this match the logic above, you could just do

props.Relational.OuterCols = aggProps.OuterCols.Difference(inputProps.OuterCols)

Done.


pkg/sql/opt/memo/logical_props_factory.go, line 343 at r1 (raw file):

Previously, rytaft wrote…

you miss the inputProps outer cols if there are no outer cols in the limit.

They're copied from the input in that case. No need for an extra allocation in that 99% case.


pkg/sql/opt/memo/logical_props_factory.go, line 362 at r1 (raw file):

Previously, rytaft wrote…

ditto

See above comment.


pkg/sql/opt/norm/factory.go, line 187 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] add a comment saying that if the list doesn't contain the element, we panic here

Done.


pkg/sql/opt/norm/rules/join.opt, line 114 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I believe the transformation in [3] is apply(R, select(E)) -> select(apply(R, E)). You're doing something slightly different here by tacking the selection filters onto the join condition. This is what I did in opttoy and it seemed to work ok, though it might be worthwhile to call out this difference in the comment here.

Done.


pkg/sql/opt/norm/rules/select.opt, line 178 at r2 (raw file):

Previously, rytaft wrote…

I think you meant -- this only works if this is not an instance of the scalar GroupBy

Done.


pkg/sql/opt/norm/rules/select.opt, line 228 at r2 (raw file):

Previously, rytaft wrote…

semi-joins -> anti-joins

Done.


pkg/sql/opt/norm/testdata/join, line 367 at r2 (raw file):

Previously, rytaft wrote…

why does this only show outer=(1) instead of outer=(1, 2)?

This is a bug in LogicalPropsFactory.constructValuesProps, which I just fixed. Good catch!


pkg/sql/opt/norm/testdata/scalar, line 450 at r2 (raw file):

Previously, rytaft wrote…

maybe also add a test for scalar group by to show that it doesn't get eliminated?

Done.


pkg/sql/opt/norm/testdata/select, line 821 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] it is one of

Done.


pkg/sql/opt/norm/testdata/select, line 827 at r2 (raw file):

Previously, RaduBerinde wrote…

What is missing for s=foo and i > 1 to be pushed down? I guess PushFilterIntoJoinLeft should be enabled on semi-join?

I added this case to this rule as well as the JoinRight variant. I also discovered a possible cycle that I need to detect, so there are a number of changes I made overall.


pkg/sql/opt/norm/testdata/select, line 918 at r2 (raw file):

Previously, RaduBerinde wrote…

I don't think this is correct. This will for example output all rows where s <> 'foo' (it's similar to the behavior of ON with outer joins).

The filters should stay inside the left input. We could do that by adjusting the rule:

(AntiJoinApply
    (Select $input (Filters (RemoveListItem $list $exists)))
    $subquery
)

You're right. I just pulled this rule from opttoy (where it wasn't tested) and didn't think about it enough. I modified this rule according to your suggestion and added some more tests as well.


pkg/sql/opt/ops/scalar.opt, line 66 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe mention that this is equivalent to (IsNull (Subquery (Select .. )))

Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

:lgtm:


Review status: 12 of 25 files reviewed at latest revision, 12 unresolved discussions.


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 24, 2018

:lgtm:


Reviewed 1 of 13 files at r3, 12 of 12 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/opt/memo/logical_props_factory.go, line 130 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This is a small optimization to avoid an allocation in the >64 cols case. We inherit the OuterCols of the input. The only time we have to modify them is if the filter is adding new outer cols (which is very unusual). Difference will allocate a map if there are >64 cols, so it seemed good to avoid that in the common case.

Oops missed the assignment above. Thanks for the explanation!


pkg/sql/opt/norm/rules/join.opt, line 54 at r4 (raw file):

# Citations: [1]
[PushFilterIntoJoinLeft, Normalize]
(InnerJoin | InnerJoinApply | RightJoin | RightJoinApply | SemiJoin | SemiJoinApply

Can you update the comment to explain why semi-joins are eligible but anti-joins are not?


Comments from Reviewable

Transform EXISTS clause in WHERE clauses into a SemiJoinApply or
AntiSemiJoinApply operator. Additional rules will attempt to decorrelate
the right operand of the Apply by pushing the Apply down through any Select
operator. Example:

  SELECT * FROM a WHERE EXISTS(SELECT * FROM b WHERE a.x=b.x)
  =>
  SELECT * FROM a SEMI JOIN APPLY (SELECT * FROM b WHERE a.x=b.x)
  =>
  SELECT * FROM a SEMI JOIN b WHERE a.x=b.x

Release note: None
@andy-kimball
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Apr 24, 2018
24969: opt: Hoist Exists operator and try to decorrelate r=andy-kimball a=andy-kimball

This PR has two commits for easier review. The first adds support for OuterCols as a relational property. The second uses that to implement transforming the Exists operator into a SemiJoinApply operator, and then attempting to decorrelate that.

25034: storage: bump LastHeartbeat timestamp when writing txn record r=nvanbenschoten a=nvanbenschoten

Fixes #23945.
See #20448.

This change addresses a case where delayed BeginTxn requests can result
in txn records looking inactive immediately upon being written. We now
bump the txn record's LastHeartbeat timestamp when writing the record.

Release note: None

Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 24, 2018

Build succeeded

@craig craig bot merged commit d3c6e99 into cockroachdb:master Apr 24, 2018
@andy-kimball andy-kimball deleted the exists branch June 8, 2018 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants