Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: allow lookup joins to order on index columns #84689

Merged
merged 1 commit into from
Jul 23, 2022

Conversation

DrewKimball
Copy link
Collaborator

It is possible for lookup joins to return the results of each lookup
in the order of the lookup index. In the case when the input is ordered
on a key, preserving the input ordering and then returning looked-up rows
in index order is equivalent to performing a sort on the input ordering
with the index columns appended.

This patch teaches the optimizer that lookup joins can preserve the index
ordering. This allows the optimizer to avoid sorting in some cases, which
can significantly improve performance because sorts have to buffer all input
rows.

Fixes #84685

Release note: None

@DrewKimball DrewKimball requested a review from a team as a code owner July 20, 2022 01:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DrewKimball DrewKimball requested review from rytaft and yuzefovich July 20, 2022 01:20
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I didn't review the optimizer changes too carefully, but the execution changes LGTM modulo one comment.

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)


pkg/sql/opt/ordering/lookup_join.go line 52 at r1 (raw file):

	}
	if !canProjectCols {
		var remainingRequired props.OrderingChoice

nit: might be good to add a quick comment about the case we're handling inside of this if.


pkg/sql/opt/ordering/lookup_join.go line 261 at r1 (raw file):

	var choiceIdx, providedIdx int
	for choiceIdx < len(required.Columns) {
		if providedIdx >= len(provided) {

nit: would it be cleaner to push up this condition into the for condition?


pkg/sql/rowexec/joinreader.go line 321 at r1 (raw file):

		// Additionally, we need to disable parallelism for the traditional fetcher
		// in order to ensure the lookups are ordered, so set shouldLimitBatches.
		spec.MaintainOrdering, shouldLimitBatches = true, true

I think we also need to set jr.streamerInfo.maintainOrdering to true (as well as update the comment on it) so that the streamer uses the InOrder mode.

I would imagine that the logic tests you added in lookup_join should fail in fakedist config (because we currently would use the OutOfOrder mode), do they? Maybe if we stress them long enough they would?

Also I think it would be worthwhile to add local and fakedist logic test configs where the streamer is disabled, and then include those configs for all index join / lookup join tests (I imagine that we wouldn't include those new configs in the set of default-configs since the non-streamer code-path is still exercised enough by use cases other than index/lookup joins, so we'd have to explicitly spell the configs out in the relevant test files). That could be a part of a separate PR or a separate commit in this PR, up to you.


pkg/sql/logictest/testdata/logic_test/lookup_join line 822 at r1 (raw file):

1  1  NULL  NULL  NULL

# Test lookup join that supplies an ordering on input and lookup columns.

nit: something is off, maybe do s/lookup join that supplies/that lookup join supplies/?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that the execution is broken if the lookup columns have DESC ordering in the lookup index, at least with the streamer code path, but maybe in both. I think in order to preserve the ordering with DESC index we need to use the ReverseScanRequests as well as Descending scan iteration in Streamer.Enqueue, and we do neither of those things. Maybe the join reader never actually constructs ReverseScan requests - then the non-streamer code path is broken too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)

@yuzefovich
Copy link
Member

I just realized that the execution is broken if the lookup columns have DESC ordering in the lookup index, at least with the streamer code path, but maybe in both. I think in order to preserve the ordering with DESC index we need to use the ReverseScanRequests as well as Descending scan iteration in Streamer.Enqueue, and we do neither of those things. Maybe the join reader never actually constructs ReverseScan requests - then the non-streamer code path is broken too.

Actually, maybe I take that back - we can only maintain lookup ordering IFF lookup columns form a key, right? If so, then we're guaranteed that each input row results in at most one looked-up row, so it'll be trivially sorted (and the index direction doesn't matter).

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, then we're guaranteed that each input row results in at most one looked-up row

We require that the ordering is a key on the input, but a given input row could have multiple lookup rows associated with it, say if the key columns aren't unique on the index or if an inequality is used.

I think in order to preserve the ordering with DESC index we need to use the ReverseScanRequests as well as Descending scan iteration in Streamer.Enqueue, and we do neither of those things.

So unless we do this, ordering can only be maintained if the index has only ASC columns? That would actually be ok for the current use case, although we'd probably want to remove the limitation at some point.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So unless we do this, ordering can only be maintained if the index has only ASC columns?

Yes. The changes would have to be made both in the streamer and in the join reader itself (because we currently don't ever issue ReverseScan requests for lookup joins). The changes seem relatively minor, but if the current use case can be served only by ASC columns, then I say we defer these changes until after 22.2.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! I just did an initial pass and left a few comments for now.

Reviewed 3 of 12 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @rytaft, and @yuzefovich)


pkg/sql/opt/ordering/lookup_join.go line 19 at r1 (raw file):

)

func lookupOrIndexJoinCanProvideOrdering(expr memo.RelExpr, required *props.OrderingChoice) bool {

nit: break this into two functions now that its grown quite a bit: lookupJoinCanProvideOrdering and indexJoinCanProvideOrdering.


pkg/sql/opt/ordering/lookup_join.go line 52 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: might be good to add a quick comment about the case we're handling inside of this if.

+1 some examples here would be helpful


pkg/sql/opt/ordering/lookup_join.go line 257 at r1 (raw file):

// safe to mutate.
func trySatisfyRequired(
	required *props.OrderingChoice, provided opt.Ordering,

Can we convert provided to an ordering choice and just use methods (Intersects, Intersection, or maybe CommonPrefix) that `OrderingChoice provides? It doesn't seem like this case shold require bespoke logic, but maybe I am missing something?


pkg/sql/opt/ordering/lookup_join.go line 268 at r1 (raw file):

		choice, providedCol := required.Columns[choiceIdx], provided[providedIdx]
		if !choice.Group.Contains(providedCol.ID()) {
			if required.Optional.Contains(providedCol.ID()) {

nit: move this if outside/above the if !choice.Group.Contains(providedCol.ID()) {}. optional column should never appear in Columns:

// ordering. Columns in Optional must not appear in the Columns sequence.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/opt/ordering/lookup_join.go line 53 at r1 (raw file):

	if !canProjectCols {
		var remainingRequired props.OrderingChoice
		inputOrdCols := required.Optional.Copy()

Feels like this variable could use a better name. Something like canProjectPrefix perhaps?


pkg/sql/opt/xform/testdata/physprops/ordering line 2785 at r1 (raw file):

      └── filters (true)

# Cannot supply the requested ordering because the direction of the 'w' column

Where is w? Do you mean c?


pkg/sql/opt/ordering/lookup_join_test.go line 216 at r1 (raw file):

			canProvide: true,
		},
		{ // Case 2: the ordering cannot project only input columns, but the lookup

Seems like you could project +5, +6, no?


pkg/sql/opt/ordering/lookup_join_test.go line 281 at r1 (raw file):

			canProvide: false,
		},
	}

can you add a test case where canProvide is true with the secondaryIndex?

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick feedback!

if the current use case can be served only by ASC columns, then I say we defer these changes until after 22.2

SGTM. I've modified the logic to only admit cases where all index columns are ASC, and added tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/opt/ordering/lookup_join.go line 19 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: break this into two functions now that its grown quite a bit: lookupJoinCanProvideOrdering and indexJoinCanProvideOrdering.

Done.


pkg/sql/opt/ordering/lookup_join.go line 52 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: might be good to add a quick comment about the case we're handling inside of this if.

Done.


pkg/sql/opt/ordering/lookup_join.go line 257 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can we convert provided to an ordering choice and just use methods (Intersects, Intersection, or maybe CommonPrefix) that `OrderingChoice provides? It doesn't seem like this case shold require bespoke logic, but maybe I am missing something?

CommonPrefix is close, but we also need to get the postfix of the required ordering that isn't satisfied by the provided ordering, which would require additional logic anyway. Also, we can avoid allocating this way by just reslicing the arguments. That being said, I'm happy to make the change if you don't think those are sufficient reason.


pkg/sql/opt/ordering/lookup_join.go line 261 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: would it be cleaner to push up this condition into the for condition?

Good idea. Done.


pkg/sql/opt/ordering/lookup_join.go line 268 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: move this if outside/above the if !choice.Group.Contains(providedCol.ID()) {}. optional column should never appear in Columns:

// ordering. Columns in Optional must not appear in the Columns sequence.

Done.


pkg/sql/rowexec/joinreader.go line 321 at r1 (raw file):
Ah, good catch. i forgot to check this after rebasing. Done.

They didn't fail for me even under stress, but maybe I was testing on a stale build (before rebasing).

Also I think it would be worthwhile to add local and fakedist logic test configs where the streamer is disabled

I think I'll keep that separate, since it doesn't feel too closely related.


pkg/sql/logictest/testdata/logic_test/lookup_join line 822 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: something is off, maybe do s/lookup join that supplies/that lookup join supplies/?

Changed it.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execution :lgtm:

Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


-- commits line 10 at r2:
nit: mention the limitation of DESC columns in the commit message.


pkg/sql/opt/ordering/lookup_join.go line 60 at r2 (raw file):

		// input columns that form a key over the input, followed by the index
		// columns in index order. Due to implementation details, currently the
		// index can only contain columns sorted in ascending order. The following

I think we could loosen up this restriction a bit to allow DESC columns when they are not required to provide the ordering. Adjusting your example from below to

		//   CREATE TABLE ab (a INT, b INT, PRIMARY KEY(a, b));
		//   CREATE TABLE xyz (x INT, y INT, z INT, PRIMARY KEY(x, y, z DESC));
		//   SELECT * FROM ab INNER LOOKUP JOIN xyz ON a = x ORDER BY a, b, x, y;

the lookup join can still preserve the ordering on x and y regardless of the ordering direction on column z. But it does seem like an edge case, so up to you whether it's worth extending the support.


pkg/sql/rowexec/joinreader.go line 490 at r2 (raw file):

		// The index joiner can rely on the streamer to maintain the input ordering,
		// but the lookup joiner currently handles this logic itself, so the
		// streamer can operator in OutOfOrder mode. The exception is when the

nit: s/can operator/can operate/. It might also be good to add a similar mention on the definition of jr.streamerInfo.maintainOrdering variable too.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


-- commits line 17 at r2:
Seems like this deserves a release note


pkg/sql/opt/ordering/lookup_join.go line 73 at r2 (raw file):

		// example can supply the ordering itself. On the other hand, switching 'b'
		// and 'y' in the ordering or removing 'b' would mean the query would
		// require a sort.

Nice comment!


pkg/sql/opt/ordering/lookup_join.go line 290 at r2 (raw file):

	required *props.OrderingChoice, provided opt.Ordering,
) (prefix opt.Ordering, toExtend *props.OrderingChoice) {
	var choiceIdx, providedIdx int

nit: instead of choiceIdx I think requiredIdx would be clearer


pkg/sql/opt/ordering/lookup_join.go line 292 at r2 (raw file):

	var choiceIdx, providedIdx int
	for choiceIdx < len(required.Columns) && providedIdx < len(provided) {
		choice, providedCol := required.Columns[choiceIdx], provided[providedIdx]

nit: I'd change choice to requiredCol

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @yuzefovich)


-- commits line 10 at r2:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: mention the limitation of DESC columns in the commit message.

Done.


-- commits line 17 at r2:

Previously, rytaft (Rebecca Taft) wrote…

Seems like this deserves a release note

Done.


pkg/sql/opt/ordering/lookup_join.go line 53 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Feels like this variable could use a better name. Something like canProjectPrefix perhaps?

I changed to canProjectOnlyInputCols - lmk if you think something else makes more sense.


pkg/sql/opt/ordering/lookup_join.go line 60 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think we could loosen up this restriction a bit to allow DESC columns when they are not required to provide the ordering. Adjusting your example from below to

		//   CREATE TABLE ab (a INT, b INT, PRIMARY KEY(a, b));
		//   CREATE TABLE xyz (x INT, y INT, z INT, PRIMARY KEY(x, y, z DESC));
		//   SELECT * FROM ab INNER LOOKUP JOIN xyz ON a = x ORDER BY a, b, x, y;

the lookup join can still preserve the ordering on x and y regardless of the ordering direction on column z. But it does seem like an edge case, so up to you whether it's worth extending the support.

Good point, I think you're right. It shouldn't add much complication and will allow us to handle more cases, so I'll make the change.


pkg/sql/opt/ordering/lookup_join.go line 73 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Nice comment!

Thanks!


pkg/sql/opt/ordering/lookup_join.go line 290 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: instead of choiceIdx I think requiredIdx would be clearer

Done.


pkg/sql/opt/ordering/lookup_join.go line 292 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: I'd change choice to requiredCol

Done.


pkg/sql/rowexec/joinreader.go line 490 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/can operator/can operate/. It might also be good to add a similar mention on the definition of jr.streamerInfo.maintainOrdering variable too.

Done.


pkg/sql/opt/xform/testdata/physprops/ordering line 2785 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Where is w? Do you mean c?

Yes, forgot to change that. Done.


pkg/sql/opt/ordering/lookup_join_test.go line 216 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Seems like you could project +5, +6, no?

You're right. Changed it to be another case where the ordering can project input columns.


pkg/sql/opt/ordering/lookup_join_test.go line 281 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you add a test case where canProvide is true with the secondaryIndex?

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: But I'd wait for a final stamp from Becca.

I'd like for there to be a session setting that can disable this feature (you can add in a future PR, not necessarily here). Would it be as simple avoid the newly added block in lookupJoinCanProvideOrdering depending on the session setting?

Reviewed 4 of 12 files at r1, 2 of 9 files at r2, 7 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @rytaft, and @yuzefovich)


pkg/sql/opt/ordering/lookup_join.go line 53 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I changed to canProjectOnlyInputCols - lmk if you think something else makes more sense.

Maybe canProjectWithOnlyInputCols or canProjectUsingOnlyInputCols would be better?


pkg/sql/opt/ordering/lookup_join.go line 257 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

CommonPrefix is close, but we also need to get the postfix of the required ordering that isn't satisfied by the provided ordering, which would require additional logic anyway. Also, we can avoid allocating this way by just reslicing the arguments. That being said, I'm happy to make the change if you don't think those are sufficient reason.

I see, thanks for the explanation. No need to change anything.


pkg/sql/opt/xform/testdata/physprops/ordering line 2736 at r3 (raw file):

 └── filters (true)

nit: remove extra newline

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @yuzefovich)


pkg/sql/distsql_physical_planner.go line 2347 at r3 (raw file):

				// descending columns.
				panic(errors.AssertionFailedf("ordering on a lookup index with descending columns"))
			}

nit: you can put the break back here


pkg/sql/opt/ordering/lookup_join.go line 53 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Maybe canProjectWithOnlyInputCols or canProjectUsingOnlyInputCols would be better?

I like the change you made with canProjectOnlyInputCols, but I was actually suggesting renaming inputOrdCols here


pkg/sql/opt/ordering/lookup_join.go line 110 at r3 (raw file):

			if !requiredCols.Contains(idxColID) {
				// This index column is not required.
				continue

shouldn't we break in this case? Or only continue if the column is also optional?

@DrewKimball DrewKimball force-pushed the order branch 2 times, most recently from fee51ab to 3570fea Compare July 21, 2022 23:59
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like for there to be a session setting that can disable this feature

That's a good idea. We'd also have to avoid the new logic in lookupJoinBuildProvided since it isn't lookupJoinCanProvideOrdering isn't always called first, but it should be easy enough.

I've performed some refactoring since there was some shared logic between lookupJoinBuildProvided and lookupJoinCanProvideOrdering, so that could use another look.

I also changed the code in distsql_physical_planner.go to use plan.GetResultTypes() instead of plan.ResultColumns in order to decide whether maintainLookupOrdering should be true, since it turns out that ResultColumns can be empty in some cases even when input columns are used. Does anyone have experience with that part of the codebase?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @rytaft, and @yuzefovich)


pkg/sql/distsql_physical_planner.go line 2347 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: you can put the break back here

I wanted to run the validation for each ordering column, but if you don't think the check is necessary we could just break there instead. Thoughts?


pkg/sql/opt/ordering/lookup_join.go line 53 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I like the change you made with canProjectOnlyInputCols, but I was actually suggesting renaming inputOrdCols here

Changed to canProjectUsingOnlyInputCols and canProjectPrefixCols respectively.


pkg/sql/opt/ordering/lookup_join.go line 110 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

shouldn't we break in this case? Or only continue if the column is also optional?

Thanks for catching that. We should break when the index column isn't in the required ordering at all, and continue if the column is optional.


pkg/sql/opt/xform/testdata/physprops/ordering line 2736 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: remove extra newline

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone have experience with that part of the codebase?

That would be me, but I haven't touched those parts in like two years 😅

On a quick glance, plan.ResultColumns is only used during the execbuilding by the new DistSQL spec exec factory whereas plan.GetResultTypes() is used by the old distsql physical planning code. I think it makes sense that you'd use the latter since you're modifying the old distsql physical planner.

Reviewed 3 of 7 files at r3, 1 of 1 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice!

Reviewed 1 of 1 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball)


pkg/sql/distsql_physical_planner.go line 2347 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I wanted to run the validation for each ordering column, but if you don't think the check is necessary we could just break there instead. Thoughts?

Oh ok makes sense


pkg/sql/opt/ordering/lookup_join.go line 71 at r5 (raw file):

				// columns that cannot be satisfied by an ordering on the input.
				remainingRequired.Columns = required.Columns[i:]
				remainingRequired.Optional = required.Optional.Copy()

should this be canProjectPrefixCols instead of required.Optional.Copy()?

Edit: I see that you're effectively doing this on line 244. I think you could do it here and save an allocation


pkg/sql/opt/ordering/lookup_join.go line 205 at r5 (raw file):

//
// It is possible for a lookup join to supply an ordering that references index
// columns if ordering consists of a series of input columns that form a key

nit: if the ordering

It is possible for lookup joins to return the results of each lookup
in the order of the lookup index. In the case when the input is ordered
on a key, preserving the input ordering and then returning looked-up rows
in index order is equivalent to performing a sort on the input ordering
with the index columns appended.

This patch teaches the optimizer that lookup joins can preserve the index
ordering. This allows the optimizer to avoid sorting in some cases, which
can significantly improve performance because sorts have to buffer all input
rows. Due to implementation details of the lookup join, order can only be
preserved when none of the index columns involved in the ordering are
sorted in descending order.

Fixes cockroachdb#84685

Release note (performance improvement): The optimizer can now return the
results of a join in sorted order in more cases. This can allow the
optimizer to avoid expensive sorts that need to buffer all input rows.
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @rytaft and @yuzefovich)


pkg/sql/opt/ordering/lookup_join.go line 71 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this be canProjectPrefixCols instead of required.Optional.Copy()?

Edit: I see that you're effectively doing this on line 244. I think you could do it here and save an allocation

Good point, there's no need to change the optional columns field, or to union it with the input ordering columns in getLookupOrdCols. Done.


pkg/sql/opt/ordering/lookup_join.go line 205 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: if the ordering

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r4, 2 of 3 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball)

@DrewKimball
Copy link
Collaborator Author

Bazel failure looks like a flake.

@DrewKimball
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 23, 2022

Build succeeded:

@craig craig bot merged commit f1167d5 into cockroachdb:master Jul 23, 2022
@DrewKimball DrewKimball deleted the order branch July 23, 2022 00:36
DrewKimball added a commit to DrewKimball/cockroach that referenced this pull request Dec 15, 2022
This patch fixes an oversight of cockroachdb#84689 that prevented lookup joins
from maintaining the index ordering for each lookup if the index ordering
contained descending columns. The execution logic will respect descending
index columns as-is, so only the optimizer code needed to be changed.
This will allow plans with lookup joins to avoid sorts in more cases.

Fixes cockroachdb#88319

Release note (performance improvement): The optimizer can now avoid
planning a sort in more cases with joins that perform lookups into an
index with one or more columns sorted in descending order. This can
significantly decrease the number of rows that have to be scanned in
order to satisfy a `LIMIT` clause.
DrewKimball added a commit to DrewKimball/cockroach that referenced this pull request Dec 16, 2022
This patch fixes an oversight of cockroachdb#84689 that prevented lookup joins
from maintaining the index ordering for each lookup if the index ordering
contained descending columns. The execution logic will respect descending
index columns as-is, so only the optimizer code needed to be changed.
This will allow plans with lookup joins to avoid sorts in more cases.

Fixes cockroachdb#88319

Release note (performance improvement): The optimizer can now avoid
planning a sort in more cases with joins that perform lookups into an
index with one or more columns sorted in descending order. This can
significantly decrease the number of rows that have to be scanned in
order to satisfy a `LIMIT` clause.
craig bot pushed a commit that referenced this pull request Dec 16, 2022
93483: sql: introduce hash group-join operator r=yuzefovich a=yuzefovich

This PR introduces the hash group-join operator (which combines a hash join followed by a hash aggregation when the join's equality columns are the same as aggregation's grouping columns into a single operator) to the execution engines . The optimizer is currently unaware of this new operator - the changes are plumbed only from the DistSQL physical planning. Naive implementations (which simply use a hash joiner followed by a hash aggregator) are introduced to both engines with the proper disk-spilling. The usage of this new operator is gated behind an experimental session variable.

See each commit for details.

Addresses: #38307.

Epic: None

93513: sql: add voting_replicas, non_voting_replicas columns to SHOW RANGES r=arulajmani a=ecwall

Fixes #93508

Some of the multitenant admin functions accept VOTERS, NONVOTERS as input.

Add voting_replicas, non_voting_replicas columns to SHOW RANGE(S) to make working with the admin functions easier.

Release note (sql change): Add voting_replicas, non_voting_replicas columns to output of SHOW RANGE(S) statements.

93673: opt: allow lookup joins to preserve index ordering with DESC columns r=DrewKimball a=DrewKimball

This patch fixes an oversight of #84689 that prevented lookup joins from maintaining the index ordering for each lookup if the index ordering contained descending columns. The execution logic will respect descending index columns as-is, so only the optimizer code needed to be changed. This will allow plans with lookup joins to avoid sorts in more cases.

Fixes #88319

Release note (performance improvement): The optimizer can now avoid planning a sort in more cases with joins that perform lookups into an index with one or more columns sorted in descending order. This can significantly decrease the number of rows that have to be scanned in order to satisfy a `LIMIT` clause.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Dec 16, 2022
This patch fixes an oversight of #84689 that prevented lookup joins
from maintaining the index ordering for each lookup if the index ordering
contained descending columns. The execution logic will respect descending
index columns as-is, so only the optimizer code needed to be changed.
This will allow plans with lookup joins to avoid sorts in more cases.

Fixes #88319

Release note (performance improvement): The optimizer can now avoid
planning a sort in more cases with joins that perform lookups into an
index with one or more columns sorted in descending order. This can
significantly decrease the number of rows that have to be scanned in
order to satisfy a `LIMIT` clause.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: take advantage of index ordering for lookup joins
5 participants