Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: add vectorized cross joiner #58280

Merged
merged 5 commits into from
Jan 14, 2021
Merged

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Dec 26, 2020

descpb: add helper method for computing output types of joins

Release note: None

colexec: shorten merge joiner benchmark

This commit adjust the merge joiner benchmark to run faster (from 145s
to 54s on my laptop) by removing the smallest row options and reducing
the number of options from 5 to 4.

Release note: None

colexec: extract cross joiner base from the merge joiner

This commit extracts the logic of building from the buffered groups in
the merge joiner into a cross joiner base struct which will allow us to
reuse it for implementing the disk-backed vectorized cross joiner. The
logical changes were kept to minimum.

Release note: None

colexec: add vectorized cross joiner

This commit adds the vectorized cross joiner with the core building
logic being reused from the base struct extracted from the merge joiner.
The new operator simply needs to consume the inputs to set up the base
struct while also paying attention to possibly unmatched tuples (this
was never the case for the buffered group in the merge joiner). It also
optimizes the behavior in how the inputs are consumed depending on the
join type.

Fixes: #46205

Release note: None

colexec: optimize cross joiner for some join types

This commit optimizes the cross joiner a bit by taking advantage that
for some join types we need to repeat left tuples only once. It
additionally cleans up the logic around exiting from the build methods
so that we don't run into a situation where toAppend is 0 (which is
pointless).

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Dec 26, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the cross-join branch 3 times, most recently from 5528f1d to 8e29bbb Compare December 26, 2020 20:48
@yuzefovich yuzefovich changed the title colexec: add cross joiner colexec: add vectorized cross joiner Dec 26, 2020
@yuzefovich yuzefovich force-pushed the cross-join branch 2 times, most recently from 8802604 to 96249ce Compare December 27, 2020 02:37
@yuzefovich
Copy link
Member Author

The benchmarks of the cross joiner base extraction are here and of the cross joiner optimization are here.

@yuzefovich yuzefovich requested review from asubiotto and a team December 28, 2020 19:52
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look at this once the spilling queue PR merges.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Jan 12, 2021
@yuzefovich
Copy link
Member Author

Rebased on top of the spilling queue fix, RFAL.

@yuzefovich yuzefovich force-pushed the cross-join branch 2 times, most recently from 0c96b91 to d438891 Compare January 12, 2021 21:33
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, 1 of 1 files at r2, 21 of 21 files at r3, 5 of 9 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/catalog/descpb/join_type.go, line 123 at r1 (raw file):

	outputTypes := make([]*types.T, 0, numOutputTypes)
	if j.ShouldIncludeLeftColsInOutput() {
		outputTypes = append(outputTypes, left...)

nit: why not put these lines into the conditionals above to avoid having to check both conditions twice? You can give outputTypes a len(left)+len(right) capacity.


pkg/sql/colexec/crossjoiner.go, line 29 at r3 (raw file):

// TODO(yuzefovich): use two separate unlimited allocators and give each
// spillingQueue a half of the memory limit.

IIUC, I think we should do this in this PR to limit up to the work mem. It shouldn't be too hard, right?


pkg/sql/colexec/colbuilder/execplan.go, line 898 at r4 (raw file):

			copy(rightTypes, spec.Input[1].ColumnTypes)

			if len(core.HashJoiner.LeftEqColumns) == 0 {

Is it unnecessary to check RightEqColumns?

This commit adjust the merge joiner benchmark to run faster (from 145s
to 54s on my laptop) by removing the smallest row options and reducing
the number of options from 5 to 4.

Release note: None
This commit extracts the logic of building from the buffered groups in
the merge joiner into a cross joiner base struct which will allow us to
reuse it for implementing the disk-backed vectorized cross joiner. The
logical changes were kept to minimum.

Release note: None
This commit adds the vectorized cross joiner with the core building
logic being reused from the base struct extracted from the merge joiner.
The new operator simply needs to consume the inputs to set up the base
struct while also paying attention to possibly unmatched tuples (this
was never the case for the buffered group in the merge joiner). It also
optimizes the behavior in how the inputs are consumed depending on the
join type.

Release note: None
This commit optimizes the cross joiner a bit by taking advantage that
for some join types we need to repeat left tuples only once. It
additionally cleans up the logic around exiting from the build methods
so that we don't run into a situation where `toAppend` is 0 (which is
pointless).

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/catalog/descpb/join_type.go, line 123 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: why not put these lines into the conditionals above to avoid having to check both conditions twice? You can give outputTypes a len(left)+len(right) capacity.

I agree that this looks a bit ugly, but I'm guessing this approach should be more performant overall due to not having wasteful allocations. I also checked that the ShouldInclude* methods get inlined, so I'd prefer to keep as is, but let me know if you feel strongly otherwise.


pkg/sql/colexec/crossjoiner.go, line 29 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

IIUC, I think we should do this in this PR to limit up to the work mem. It shouldn't be too hard, right?

Note that the workmem limit will be respected, but the current behavior is somewhat unfortunate in some cases - it is possible that the spilling queue for the left side will use up all of the memory, and then the right spilling queue will have to spill to disk right away. But the optimal behavior is probably giving larger limit to the right queue since it might be read multiple times.

Anyway, I think it is ok to keep it as a TODO since changing the merge joiner to have two separate unlimited allocators seems a bit invasive.


pkg/sql/colexec/colbuilder/execplan.go, line 898 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is it unnecessary to check RightEqColumns?

We must have the same number of equality columns on both sides, so it is not necessary. If somehow we have no left eq columns but have some right eq columns, that indicates that something really bad happened during the physical planning, but I don't think it's worth adding an assertion for it.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 29 of 29 files at r5, 1 of 1 files at r6, 21 of 21 files at r7, 9 of 9 files at r8, 2 of 2 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 14, 2021

Build succeeded:

@craig craig bot merged commit 80e6094 into cockroachdb:master Jan 14, 2021
@yuzefovich yuzefovich deleted the cross-join branch January 14, 2021 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

colexec: implement specialized cross join operator
3 participants