-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add support for hash and merge joins in the new factory #50450
Conversation
da994bc
to
2e5962a
Compare
2e5962a
to
83516ea
Compare
Only the last 3 commits should be looked at, but they are RFAL. |
83516ea
to
bbbab13
Compare
Rebased, RFAL. |
bbbab13
to
4b9eea5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, 7 of 8 files at r2, 8 of 8 files at r3, 8 of 8 files at r4, 7 of 7 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/distsql_physical_planner.go, line 2227 at r1 (raw file):
RightEqColumnsAreKey: n.pred.rightEqKey, } } else {
Maybe add a comment that if the mergeJoinOrdering
is non-zero, it must be the length of the equality columns. Perhaps we should also add an assertion to be safe.
pkg/sql/distsql_physical_planner.go, line 2123 at r2 (raw file):
type joinPlanningInfo struct { leftPlan, rightPlan *PhysicalPlan getCoreSpec func(info *joinPlanningInfo) execinfrapb.ProcessorCoreUnion
Instead of having this be a field I think it'd be nicer for it to be a method:
func (i joinPlanningInfo) makeCoreSpec(n Ordering) ProcessorCoreUnion {
// Check len(n) and return spec based on that and fields in `joinPlanningInfo`
}
pkg/sql/logictest/testdata/logic_test/distsql_join, line 61 at r2 (raw file):
SELECT feature_name FROM crdb_internal.feature_usage WHERE feature_name='sql.exec.query.is-distributed' AND usage_count > 0 ---- sql.exec.query.is-distributed
Did this fail with the new config?
pkg/sql/logictest/testdata/logic_test/experimental_distsql_planning_5node, line 156 at r5 (raw file):
5 5 # Check that merge join is supported by the new factory.
Is it worth running expain to double check merge join is used?
`joinNode.mergeJoinOrdering` is now set to non-zero length by the optimizer only when we can use a merge join (meaning that number of equality columns is non-zero and equals the length of the ordering we have). This allows us to slightly simplify the setup up of the merge joiners. Additionally, this commit switching to using `[]exec.NodeColumnOrdinal` instead of `int` for equality columns in `joinPredicate` which allows us to remove one conversion step when planning hash joiners. Also we introduce a small helper that will be reused by the follow-up work. Release note: None
This commit adds implementation of `ConstructHashJoin` and `ConstructMergeJoin` in the new factory by mostly refactoring and reusing already existing code in the physical planner. Notably, interleaved joins are not supported yet. Release note: None
4b9eea5
to
d55f5eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/distsql_physical_planner.go, line 2227 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Maybe add a comment that if the
mergeJoinOrdering
is non-zero, it must be the length of the equality columns. Perhaps we should also add an assertion to be safe.
When mergeJoinOrdering
is non-zero, then we must be planning a merge join, the spec for which doesn't have equality columns, so I don't think it's important to sanity check that equality columns are of the same length as the merge join ordering.
pkg/sql/distsql_physical_planner.go, line 2123 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Instead of having this be a field I think it'd be nicer for it to be a method:
func (i joinPlanningInfo) makeCoreSpec(n Ordering) ProcessorCoreUnion { // Check len(n) and return spec based on that and fields in `joinPlanningInfo` }
Good point, done.
pkg/sql/logictest/testdata/logic_test/distsql_join, line 61 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Did this fail with the new config?
Yes, because of the test deficiency - although distsql_join
uses 5node configs, all the data lives on a single node, so the queries are not distributed. That's why I moved it to a file that manually distributes the data.
pkg/sql/logictest/testdata/logic_test/experimental_distsql_planning_5node, line 156 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is it worth running expain to double check merge join is used?
I confirmed it manually and don't think it is worth adding an explain as well. The reason is that the optimizer definitely chooses a plan with a merge join because we have indexes on the equality columns. Additionally, we currently don't support ConstructSort
method in the new factory, so if other join type is used, then a sort would be planned, and the query will error out.
Also, I think eventually this logic test file will only contain the plans that are partially distributed, everything else could be removed as redundant (or moved to the other logic test files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r6, 13 of 13 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/distsql_plan_join.go, line 58 at r7 (raw file):
if len(info.leftMergeOrd.Columns) != len(info.rightMergeOrd.Columns) { panic(fmt.Sprintf( "unexpectedly different merge join ordering lengths: left %d, right %d",
nit: s/unexpectedly/unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/distsql_plan_join.go, line 58 at r7 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit:
s/unexpectedly/unexpected
I did mean to use "unexpectedly" - to me an adverb sounds cleaner. Alternatively, it could be unexpected: different ...
, but I'll keep the original wording.
bors r+ |
Build succeeded |
sql: minor cleanup of joiner planning
joinNode.mergeJoinOrdering
is now set to non-zero length by theoptimizer only when we can use a merge join (meaning that number of
equality columns is non-zero and equals the length of the ordering we
have). This allows us to slightly simplify the setup up of the merge
joiners.
Additionally, this commit switching to using
[]exec.NodeColumnOrdinal
instead of
int
for equality columns injoinPredicate
which allows usto remove one conversion step when planning hash joiners.
Also we introduce a small helper that will be reused by the follow-up
work.
Release note: None
sql: add support for hash and merge joins in the new factory
This commit adds implementation of
ConstructHashJoin
andConstructMergeJoin
in the new factory by mostly refactoring andreusing already existing code in the physical planner. Notably,
interleaved joins are not supported yet.
Fixes: #50291.
Addresses: #47473.
Release note: None