Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: do not plan unnecessary paired semi- and anti- lookup joins #88491

Merged
merged 1 commit into from
Sep 27, 2022

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Sep 22, 2022

This commit fixes an issue where the optimizer would plan a paired semi
or anti lookup join in cases when a single lookup join would suffice.
This only occurred in rare cases when the join filter contained a
tautology or contradiction that could not be normalized to true or false
in the canonical query plan, but could be eliminated from the filters
when building a lookup join. If the tautology or contradiction
referenced a column not covered by the lookup index, the optimizer
mistakenly assumed that the index was not covering and planned a paired
join. Now the optimizer can recognize that the index is actually
covering, because the referenced column is not needed to evaluate the
filters, and a single lookup join is planned.

Fixes #87306

Release note (performance improvement): The optimizer now explores plans
with a single lookup join expressions in rare cases where it previously
planned two lookup join expressions.

@mgartner mgartner requested review from rytaft and michae2 September 22, 2022 18:23
@mgartner mgartner requested a review from a team as a code owner September 22, 2022 18:23
@cockroach-teamcity
Copy link
Member

This change is Reviewable

// anti-joins because they never include columns from the right side
// in their output columns. Other joins include columns from the
// right side in their output columns, so even if the ON filters no
// longer reference an un-covered column, they must be fetched (case
// 2, see function comment).
filterColsFromRight := rightCols.Intersection(onFilters.OuterCols())
remainingFilterCols := rightKeyCols.Union(lookupJoin.On.OuterCols())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I might need to include outer columns of the lookup expression here too...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right... only the ON condition should need to see the right-side rows, while the lookup condition is only used to generate spans.

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea makes sense, but your comment below is undermining my confidence a bit haha 🙂

Reviewed 2 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/xform/join_funcs.go line 544 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Hmm, I think I might need to include outer columns of the lookup expression here too...

I'm having trouble understanding the various columnsets in play here, and this comment is making me a little scared 🙂

  • Is rightKeyCols the columns from the join conditions that are part of the lookup index key?
  • What are the outer columns of the lookup expression? Are those different from lookupJoin.On.OuterCols()?
  • What is onFilters.OuterCols() and how is it different from lookupJoin.On.OuterCols()?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @michae2)


pkg/sql/opt/xform/join_funcs.go line 538 at r1 (raw file):

			// the secondary index not covering y.
			//
			// Note that these are special cases that only works for semi- and

nit: only works -> only work


pkg/sql/opt/xform/join_funcs.go line 544 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think this is right... only the ON condition should need to see the right-side rows, while the lookup condition is only used to generate spans.

You have this comment above RightSideCols:

	// RightSideCols is an ordered list of prefix index columns that are
	// constrained by this constraint. It corresponds 1:1 with the columns in
	// KeyCols if KeyCols is non-nil. Otherwise, it includes the prefix of index
	// columns constrained by LookupExpr.

Based on the last sentence there, I think this code is correct as-is (what @DrewKimball said).

This commit fixes an issue where the optimizer would plan a paired semi
or anti lookup join in cases when a single lookup join would suffice.
This only occurred in rare cases when the join filter contained a
tautology or contradiction that could not be normalized to true or false
in the canonical query plan, but could be eliminated from the filters
when building a lookup join. If the tautology or contradiction
referenced a column not covered by the lookup index, the optimizer
mistakenly assumed that the index was not covering and planned a paired
join. Now the optimizer can recognize that the index is actually
covering, because the referenced column is not needed to evaluate the
filters, and a single lookup join is planned.

Fixes cockroachdb#87306

Release note (performance improvement): The optimizer now explores plans
with a single lookup join expressions in rare cases where it previously
planned two lookup join expressions.
@mgartner mgartner force-pushed the 87306-paired-semi-join branch from 27302db to 7af0002 Compare September 27, 2022 17:29
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @rytaft)


pkg/sql/opt/xform/join_funcs.go line 538 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: only works -> only work

Done.


pkg/sql/opt/xform/join_funcs.go line 544 at r1 (raw file):
Ok good, I agree that this looks fine as-is.

Is rightKeyCols the columns from the join conditions that are part of the lookup index key?

rightKeyCols contains the columns in the lookup index that are constrained by either the key columns or the lookup expression.

What are the outer columns of the lookup expression? Are those different from lookupJoin.On.OuterCols()?

Outer columns of the lookup expression all come from the left side of the join. As Drew pointed out, the column values of each input row are substituted in the lookup expression to generate lookup spans.

What is onFilters.OuterCols() and how is it different from lookupJoin.On.OuterCols()?

This one is tricky. There's 3 versions of the on filters, each one slightly different:

  1. on - These are the original ON filters passed as an argument to the function.
  2. onFilters - These are the same as on unless the lookup index is a partial index. In that case, onFilters may be a subset of on which has been reduced because they match the partial index predicate. For example, if the original ON condition is ON t1.id = t2.id AND t2.deleted = false and there exists INDEX t2(id) WHERE deleted = false, then on would be t1.id = t2.id AND t2.deleted = false and onFilters would be t1.id = t2.id. The t2.deleted = false no longer needs to be applied explicitly because a lookup into the partial index implicitly applies that filter.
  3. lookupJoin.On - These are the remaining filters after the lookup constraint has been built. They are the same as lookupConstraint.RemainingFilters:
	// RemainingFilters contains explicit ON filters that are not represented by
	// KeyCols or LookupExpr. These filters must be included as ON filters in
	// the lookup join.

In order words, these filters could not be part of the lookup spans themselves, so they must be applied after the lookup is performed.

@mgartner
Copy link
Collaborator Author

It might be good to backport this into v22.2.1 so I've added the label.

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 27, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: internal error: lookup join with no lookup columns
5 participants