sql: add support for hash and merge joins in the new factory #50450

Merged
merged 2 commits into from
Jun 25, 2020

Conversation

yuzefovich

@yuzefovich yuzefovich commented Jun 21, 2020

sql: minor cleanup of joiner planning

joinNode.mergeJoinOrdering is now set to non-zero length by the
optimizer only when we can use a merge join (meaning that the number of
equality columns is non-zero and equals the length of the ordering we
have). This allows us to slightly simplify the setup of the merge
joiners.
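
A minimal sketch of that invariant, assuming a hypothetical helper named `chooseMergeJoinOrdering` and plain `[]int` slices in place of the real optimizer types (neither name nor signature is taken from this PR):

```go
package main

import "fmt"

// chooseMergeJoinOrdering is a hypothetical helper, not the actual optimizer
// code. It returns a non-empty ordering only when a merge join is possible:
// there is at least one equality column and the available ordering has
// exactly as many columns as there are equality columns.
func chooseMergeJoinOrdering(numEqCols int, ord []int) []int {
	if numEqCols == 0 || len(ord) != numEqCols {
		return nil
	}
	return ord
}

// useMergeJoin illustrates the simplification on the planning side: a
// non-empty ordering alone is enough to decide on a merge joiner.
func useMergeJoin(mergeJoinOrdering []int) bool {
	return len(mergeJoinOrdering) > 0
}

func main() {
	fmt.Println(useMergeJoin(chooseMergeJoinOrdering(2, []int{0, 1}))) // true
	fmt.Println(useMergeJoin(chooseMergeJoinOrdering(2, []int{0})))    // false
	fmt.Println(useMergeJoin(chooseMergeJoinOrdering(0, nil)))         // false
}
```

With the check done once where the ordering is produced, the physical planner only needs to test whether the ordering is empty.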

Additionally, this commit switches to using []exec.NodeColumnOrdinal
instead of int for equality columns in joinPredicate, which allows us
to remove one conversion step when planning hash joiners.

We also introduce a small helper that will be reused by follow-up
work.

Release note: None

sql: add support for hash and merge joins in the new factory

This commit adds implementations of ConstructHashJoin and
ConstructMergeJoin in the new factory, mostly by refactoring and
reusing existing code in the physical planner. Notably,
interleaved joins are not supported yet.

Fixes: #50291.
Addresses: #47473.

Release note: None

@yuzefovich yuzefovich added the do-not-merge label Jun 21, 2020
@yuzefovich yuzefovich requested a review from a team June 21, 2020 01:27
@yuzefovich yuzefovich requested a review from a team as a code owner June 21, 2020 01:27
@yuzefovich yuzefovich requested review from miretskiy and removed request for a team June 21, 2020 01:27
@yuzefovich yuzefovich removed the request for review from miretskiy June 21, 2020 01:27
@yuzefovich yuzefovich changed the title from "WIP sql: add support for hash and merge joins in the new factory" to "sql: add support for hash and merge joins in the new factory" Jun 22, 2020
@yuzefovich yuzefovich requested a review from asubiotto June 22, 2020 22:52
@yuzefovich

Only the last 3 commits should be looked at, but they are RFAL.

@yuzefovich yuzefovich removed the do-not-merge label Jun 23, 2020
@yuzefovich

Rebased, RFAL.


@asubiotto asubiotto left a comment


Reviewed 5 of 5 files at r1, 7 of 8 files at r2, 8 of 8 files at r3, 8 of 8 files at r4, 7 of 7 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 2227 at r1 (raw file):

			RightEqColumnsAreKey: n.pred.rightEqKey,
		}
	} else {

Maybe add a comment that if the mergeJoinOrdering is non-zero, it must be the length of the equality columns. Perhaps we should also add an assertion to be safe.


pkg/sql/distsql_physical_planner.go, line 2123 at r2 (raw file):

type joinPlanningInfo struct {
	leftPlan, rightPlan                         *PhysicalPlan
	getCoreSpec                                 func(info *joinPlanningInfo) execinfrapb.ProcessorCoreUnion

Instead of having this be a field I think it'd be nicer for it to be a method:

func (i joinPlanningInfo) makeCoreSpec(n Ordering) ProcessorCoreUnion {
   // Check len(n) and return spec based on that and fields in `joinPlanningInfo`
}
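
A minimal sketch of how such a method might look. All types here (`ordering`, `coreSpec`, and the trimmed-down `joinPlanningInfo`) are simplified, hypothetical stand-ins for the real planner types, and the sketch reads the merge orderings from fields rather than taking them as an argument:

```go
package main

import "fmt"

// ordering is a hypothetical stand-in for a merge join ordering.
type ordering []int

// coreSpec is a hypothetical stand-in for execinfrapb.ProcessorCoreUnion.
type coreSpec struct {
	kind                        string   // "hash" or "merge"
	leftEqCols, rightEqCols     []int    // populated for hash joins only
	leftOrdering, rightOrdering ordering // populated for merge joins only
}

// joinPlanningInfo keeps only the fields this sketch needs.
type joinPlanningInfo struct {
	leftEqCols, rightEqCols     []int
	leftMergeOrd, rightMergeOrd ordering
}

// makeCoreSpec picks the joiner based on the merge orderings: empty
// orderings mean a hash join, non-empty ones mean a merge join.
func (info *joinPlanningInfo) makeCoreSpec() coreSpec {
	if len(info.leftMergeOrd) != len(info.rightMergeOrd) {
		panic(fmt.Sprintf(
			"unexpectedly different merge join ordering lengths: left %d, right %d",
			len(info.leftMergeOrd), len(info.rightMergeOrd),
		))
	}
	if len(info.leftMergeOrd) == 0 {
		return coreSpec{
			kind:        "hash",
			leftEqCols:  info.leftEqCols,
			rightEqCols: info.rightEqCols,
		}
	}
	return coreSpec{
		kind:          "merge",
		leftOrdering:  info.leftMergeOrd,
		rightOrdering: info.rightMergeOrd,
	}
}

func main() {
	hashInfo := &joinPlanningInfo{leftEqCols: []int{0}, rightEqCols: []int{1}}
	mergeInfo := &joinPlanningInfo{leftMergeOrd: ordering{0}, rightMergeOrd: ordering{1}}
	fmt.Println(hashInfo.makeCoreSpec().kind)
	fmt.Println(mergeInfo.makeCoreSpec().kind)
}
```

Keeping the decision in a method means callers populate the struct and never have to supply a callback.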

pkg/sql/logictest/testdata/logic_test/distsql_join, line 61 at r2 (raw file):

SELECT feature_name FROM crdb_internal.feature_usage WHERE feature_name='sql.exec.query.is-distributed' AND usage_count > 0
----
sql.exec.query.is-distributed

Did this fail with the new config?


pkg/sql/logictest/testdata/logic_test/experimental_distsql_planning_5node, line 156 at r5 (raw file):

5  5

# Check that merge join is supported by the new factory.

Is it worth running EXPLAIN to double check merge join is used?

@yuzefovich yuzefovich left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/distsql_physical_planner.go, line 2227 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Maybe add a comment that if the mergeJoinOrdering is non-zero, it must be the length of the equality columns. Perhaps we should also add an assertion to be safe.

When mergeJoinOrdering is non-zero, then we must be planning a merge join, the spec for which doesn't have equality columns, so I don't think it's important to sanity check that equality columns are of the same length as the merge join ordering.


pkg/sql/distsql_physical_planner.go, line 2123 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Instead of having this be a field I think it'd be nicer for it to be a method:

func (i joinPlanningInfo) makeCoreSpec(n Ordering) ProcessorCoreUnion {
   // Check len(n) and return spec based on that and fields in `joinPlanningInfo`
}

Good point, done.


pkg/sql/logictest/testdata/logic_test/distsql_join, line 61 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Did this fail with the new config?

Yes, because of a deficiency in the test: although distsql_join uses 5node configs, all the data lives on a single node, so the queries are not distributed. That's why I moved it to a file that manually distributes the data.


pkg/sql/logictest/testdata/logic_test/experimental_distsql_planning_5node, line 156 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is it worth running EXPLAIN to double check merge join is used?

I confirmed it manually and don't think it is worth adding an EXPLAIN as well. The reason is that the optimizer definitely chooses a plan with a merge join because we have indexes on the equality columns. Additionally, we currently don't support the ConstructSort method in the new factory, so if another join type were used, a sort would be planned and the query would error out.

Also, I think eventually this logic test file will only contain the plans that are partially distributed, everything else could be removed as redundant (or moved to the other logic test files).


@asubiotto asubiotto left a comment


:lgtm:

Reviewed 12 of 12 files at r6, 13 of 13 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/distsql_plan_join.go, line 58 at r7 (raw file):

	if len(info.leftMergeOrd.Columns) != len(info.rightMergeOrd.Columns) {
		panic(fmt.Sprintf(
			"unexpectedly different merge join ordering lengths: left %d, right %d",

nit: s/unexpectedly/unexpected


@yuzefovich yuzefovich left a comment


TFTR!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/distsql_plan_join.go, line 58 at r7 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: s/unexpectedly/unexpected

I did mean to use "unexpectedly" - to me an adverb sounds cleaner. Alternatively, it could be unexpected: different ..., but I'll keep the original wording.

@yuzefovich

bors r+

@craig

craig bot commented Jun 25, 2020

Build succeeded

@craig craig bot merged commit ae0f360 into cockroachdb:master Jun 25, 2020
@yuzefovich yuzefovich deleted the distsql-joins branch June 25, 2020 18:11
Development

Successfully merging this pull request may close these issues.

sql: implement ConstructHashJoin in the new factory