colexecjoin: avoid buffering tuples from the right in merge joiner
Depending on the join type, we don't need to fully buffer the tuples
from the right input in order to produce the output. Namely, for
set-operation joins (INTERSECT ALL and EXCEPT ALL) we only need to know
the number of right tuples, whereas for LEFT SEMI and RIGHT ANTI joins
we already know exactly how the builder behaves for the buffered group.

Release note: None
yuzefovich committed Sep 9, 2021
1 parent 4e4cfe9 commit 8b8f12e
Showing 1 changed file with 15 additions and 8 deletions.
23 changes: 15 additions & 8 deletions pkg/sql/colexec/colexecjoin/mergejoiner.go
@@ -571,14 +571,6 @@ func (o *mergeJoinBase) appendToRightBufferedGroup(sel []int, groupStartIdx int,
sourceTypes := o.right.sourceTypes
numBufferedTuples := o.bufferedGroup.helper.numRightTuples
o.bufferedGroup.helper.numRightTuples += groupLength
-// TODO(yuzefovich): for LEFT/RIGHT ANTI joins we only need to store the
-// first tuple (in order to find the boundaries of the groups) since all
-// of the buffered tuples do have a match and, thus, don't contribute to
-// the output.
-// TODO(yuzefovich): for INTERSECT/EXCEPT ALL joins we can buffer only
-// tuples from the left side and count the number of tuples on the right.
-// TODO(yuzefovich): for LEFT/RIGHT SEMI joins we only need to buffer tuples
-// from one side (left/right respectively).
if numBufferedTuples == 0 && groupStartIdx+groupLength == o.proberState.rLength {
// Set the right first tuple only if this is the first call to this
// method for the current right buffered group and if the group doesn't
@@ -598,6 +590,21 @@ func (o *mergeJoinBase) appendToRightBufferedGroup(sel []int, groupStartIdx int,
})
}

+// TODO(yuzefovich): check whether it's worth templating this method out as
+// well as having join-type-specific crossJoinerBase.
+switch o.joinType {
+case descpb.LeftSemiJoin, descpb.RightAntiJoin:
+// For LEFT SEMI and RIGHT ANTI joins we only need to store the first
+// tuple (in order to find the boundaries of the groups) since all of
+// the buffered tuples don't/do have a match and, thus, do/don't
+// contribute to the output.
+return
+case descpb.IntersectAllJoin, descpb.ExceptAllJoin:
+// For INTERSECT/EXCEPT ALL joins we only need the number of tuples on
+// the right side (which we have already updated above).
+return
+}
+
// We don't impose any memory limits on the scratch batch because we rely on
// the inputs to the merge joiner to produce reasonably sized batches.
const maxBatchMemSize = math.MaxInt64
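
The early returns added by this commit can be illustrated with a minimal, standalone sketch. It is not the CockroachDB implementation: `joinType`, `rightGroup`, `needsRightTuples`, and `appendTuples` are hypothetical names invented for this example, tuples are modeled as plain `[]int64` slices rather than columnar batches, and the first tuple is recorded unconditionally, whereas the real method sets it only under the condition shown in the diff.

```go
package main

import "fmt"

// joinType is a hypothetical stand-in for descpb.JoinType.
type joinType int

const (
	innerJoin joinType = iota
	leftSemiJoin
	rightAntiJoin
	intersectAllJoin
	exceptAllJoin
)

// rightGroup is a hypothetical, simplified model of the right buffered group.
type rightGroup struct {
	numTuples  int       // always maintained, for every join type
	firstTuple []int64   // kept so that group boundaries can be detected
	tuples     [][]int64 // populated only when the join type needs the data
}

// needsRightTuples reports whether the right tuples themselves (and not only
// their count and the first tuple) are needed to produce the output.
func needsRightTuples(jt joinType) bool {
	switch jt {
	case leftSemiJoin, rightAntiJoin:
		// The builder's behavior for the buffered group is already known.
		return false
	case intersectAllJoin, exceptAllJoin:
		// Only the number of right tuples matters.
		return false
	default:
		return true
	}
}

// appendTuples updates the count and the first tuple, and copies the batch
// into the buffer only if the join type requires it.
func (g *rightGroup) appendTuples(jt joinType, batch [][]int64) {
	if g.numTuples == 0 && len(batch) > 0 {
		g.firstTuple = batch[0]
	}
	g.numTuples += len(batch)
	if !needsRightTuples(jt) {
		return // skip buffering entirely
	}
	g.tuples = append(g.tuples, batch...)
}

func main() {
	batch := [][]int64{{7, 1}, {7, 2}, {7, 3}}
	for _, jt := range []joinType{innerJoin, leftSemiJoin, exceptAllJoin} {
		g := &rightGroup{}
		g.appendTuples(jt, batch)
		fmt.Printf("joinType=%d counted=%d stored=%d\n", jt, g.numTuples, len(g.tuples))
	}
}
```

In this sketch every join type still pays for the count update and the first-tuple bookkeeping, but only join types that actually read the right tuples pay for copying them, which mirrors the ordering of the `switch` relative to the `numRightTuples` update in the diff.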