-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: add vectorized cross joiner #58280
Conversation
5528f1d
to
8e29bbb
Compare
8802604
to
96249ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take a look at this once the spilling queue PR merges.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
96249ce
to
c146caf
Compare
Rebased on top of the spilling queue fix, RFAL. |
0c96b91
to
d438891
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, 1 of 1 files at r2, 21 of 21 files at r3, 5 of 9 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/catalog/descpb/join_type.go, line 123 at r1 (raw file):
outputTypes := make([]*types.T, 0, numOutputTypes) if j.ShouldIncludeLeftColsInOutput() { outputTypes = append(outputTypes, left...)
nit: why not put these lines into the conditionals above to avoid having to check both conditions twice? You can give outputTypes
a len(left)+len(right)
capacity.
pkg/sql/colexec/crossjoiner.go, line 29 at r3 (raw file):
// TODO(yuzefovich): use two separate unlimited allocators and give each // spillingQueue a half of the memory limit.
IIUC, I think we should do this in this PR to limit up to the work mem. It shouldn't be too hard, right?
pkg/sql/colexec/colbuilder/execplan.go, line 898 at r4 (raw file):
copy(rightTypes, spec.Input[1].ColumnTypes) if len(core.HashJoiner.LeftEqColumns) == 0 {
Is it unnecessary to check RightEqColumns?
Release note: None
This commit adjust the merge joiner benchmark to run faster (from 145s to 54s on my laptop) by removing the smallest row options and reducing the number of options from 5 to 4. Release note: None
This commit extracts the logic of building from the buffered groups in the merge joiner into a cross joiner base struct which will allow us to reuse it for implementing the disk-backed vectorized cross joiner. The logical changes were kept to minimum. Release note: None
This commit adds the vectorized cross joiner with the core building logic being reused from the base struct extracted from the merge joiner. The new operator simply needs to consume the inputs to set up the base struct while also paying attention to possibly unmatched tuples (this was never the case for the buffered group in the merge joiner). It also optimizes the behavior in how the inputs are consumed depending on the join type. Release note: None
This commit optimizes the cross joiner a bit by taking advantage that for some join types we need to repeat left tuples only once. It additionally cleans up the logic around exiting from the build methods so that we don't run into a situation where `toAppend` is 0 (which is pointless). Release note: None
d438891
to
f535dbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/catalog/descpb/join_type.go, line 123 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit: why not put these lines into the conditionals above to avoid having to check both conditions twice? You can give
outputTypes
alen(left)+len(right)
capacity.
I agree that this looks a bit ugly, but I'm guessing this approach should be more performant overall due to not having wasteful allocations. I also checked that the ShouldInclude*
methods get inlined, so I'd prefer to keep as is, but let me know if you feel strongly otherwise.
pkg/sql/colexec/crossjoiner.go, line 29 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
IIUC, I think we should do this in this PR to limit up to the work mem. It shouldn't be too hard, right?
Note that the workmem limit will be respected, but the current behavior is somewhat unfortunate in some cases - it is possible that the spilling queue for the left side will use up all of the memory, and then the right spilling queue will have to spill to disk right away. But the optimal behavior is probably giving larger limit to the right queue since it might be read multiple times.
Anyway, I think it is ok to keep it as a TODO since changing the merge joiner to have two separate unlimited allocators seems a bit invasive.
pkg/sql/colexec/colbuilder/execplan.go, line 898 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is it unnecessary to check RightEqColumns?
We must have the same number of equality columns on both sides, so it is not necessary. If somehow we have no left eq columns but have some right eq columns, that indicates that something really bad happened during the physical planning, but I don't think it's worth adding an assertion for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 29 of 29 files at r5, 1 of 1 files at r6, 21 of 21 files at r7, 9 of 9 files at r8, 2 of 2 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR! bors r+ |
Build succeeded: |
descpb: add helper method for computing output types of joins
Release note: None
colexec: shorten merge joiner benchmark
This commit adjust the merge joiner benchmark to run faster (from 145s
to 54s on my laptop) by removing the smallest row options and reducing
the number of options from 5 to 4.
Release note: None
colexec: extract cross joiner base from the merge joiner
This commit extracts the logic of building from the buffered groups in
the merge joiner into a cross joiner base struct which will allow us to
reuse it for implementing the disk-backed vectorized cross joiner. The
logical changes were kept to minimum.
Release note: None
colexec: add vectorized cross joiner
This commit adds the vectorized cross joiner with the core building
logic being reused from the base struct extracted from the merge joiner.
The new operator simply needs to consume the inputs to set up the base
struct while also paying attention to possibly unmatched tuples (this
was never the case for the buffered group in the merge joiner). It also
optimizes the behavior in how the inputs are consumed depending on the
join type.
Fixes: #46205
Release note: None
colexec: optimize cross joiner for some join types
This commit optimizes the cross joiner a bit by taking advantage that
for some join types we need to repeat left tuples only once. It
additionally cleans up the logic around exiting from the build methods
so that we don't run into a situation where
toAppend
is 0 (which ispointless).
Release note: None