Commit
68921: colexecjoin: optimize merge/cross joins r=yuzefovich a=yuzefovich

**colexecjoin: make cross/merge join streaming with regards to left input**

This commit refactors the cross and merge joins to be streaming with regard to the left input. Previously, we used two spilling queues to consume both inputs in full before building the cross product (in the case of the merge join, this is needed when building from the buffered group). That approach is suboptimal because buffering only one side is sufficient, so this commit switches the cross join builder to operate in a streaming fashion with regard to the left input. This is done by building all result rows that correspond to the current left batch before proceeding to the next left batch, which significantly reduces the amount of copying and thus improves performance.

Fixes: #67816.

Release note: None

**colexecjoin: improve probing in the merge joiner with nulls**

For non-set-operation joins, whenever we have nulls in both columns we can advance both pointers since neither of the rows will have a match. This commit takes advantage of this observation and also refactors the probing mechanism a bit, hopefully making it cleaner.

Release note: None

**colexecjoin: avoid buffering tuples from the right in merge joiner**

Depending on the join type, we don't need to fully buffer the tuples from the right input in order to produce the output. Namely, for set-operation joins we only need to know the number of right tuples, whereas for LEFT SEMI and RIGHT ANTI we know exactly the behavior of the builder for the buffered group.

Release note: None

**colexecjoin: remove a copy when buffering the right group**

Previously, before enqueueing the tuples from the right buffered group into the spilling queue, we would perform a deep copy. This is overkill because the spilling queue itself performs the deep copy. This commit refactors the enqueueing code to modify the right batch directly to include only the tuples from the group.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
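
The streaming idea from the first commit can be illustrated with a minimal sketch. It is not the actual colexecjoin code: `row`, `leftSource`, `sliceSource`, and `streamingCrossJoin` are hypothetical names, and rows are plain slices rather than columnar batches. It only shows the control flow described above, assuming the right side is already buffered: every output row for the current left batch is produced before the next left batch is requested.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a tuple; the real operators
// work on columnar batches, not on row slices.
type row []int

// leftSource yields left batches one at a time; nil signals end of input.
type leftSource func() []row

// streamingCrossJoin buffers only the right input. For each left batch it
// emits every joined row before asking for the next left batch, so the
// left side is never buffered as a whole.
func streamingCrossJoin(nextLeft leftSource, right []row, emit func(row)) {
	for batch := nextLeft(); batch != nil; batch = nextLeft() {
		for _, l := range batch {
			for _, r := range right {
				out := make(row, 0, len(l)+len(r))
				out = append(out, l...)
				out = append(out, r...)
				emit(out)
			}
		}
	}
}

// sliceSource turns a fixed list of batches into a leftSource.
func sliceSource(batches [][]row) leftSource {
	i := 0
	return func() []row {
		if i >= len(batches) {
			return nil
		}
		b := batches[i]
		i++
		return b
	}
}

func main() {
	left := sliceSource([][]row{{{1}, {2}}, {{3}}})
	right := []row{{10}, {20}}
	streamingCrossJoin(left, right, func(r row) { fmt.Println(r) })
}
```

In this sketch only `right` is held in memory in full; the left side is pulled one batch at a time, which is the property the commit message attributes to the refactored builder.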
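
The null-handling observation from the merge joiner probing commit can be sketched in the same spirit. This is an illustration under stated assumptions, not the real probing code: `nullableInt` and `probeInner` are hypothetical names, both inputs are assumed to be sorted with nulls first, and only an inner join on a single column is modeled. When both current values are null, both pointers advance, because a null never equals anything in a non-set-operation join.

```go
package main

import "fmt"

// nullableInt is a hypothetical nullable join-key value.
type nullableInt struct {
	val  int
	null bool
}

// probeInner returns the matching (left index, right index) pairs for an
// inner equality join over two inputs sorted with nulls first.
func probeInner(left, right []nullableInt) [][2]int {
	var pairs [][2]int
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		l, r := left[i], right[j]
		switch {
		case l.null && r.null:
			// Neither row can have a match, so skip both at once.
			i++
			j++
		case l.null:
			i++
		case r.null:
			j++
		case l.val < r.val:
			i++
		case l.val > r.val:
			j++
		default:
			// Equal keys: find the end of the group on each side.
			iEnd, jEnd := i, j
			for iEnd < len(left) && !left[iEnd].null && left[iEnd].val == l.val {
				iEnd++
			}
			for jEnd < len(right) && !right[jEnd].null && right[jEnd].val == r.val {
				jEnd++
			}
			// Emit the cross product of the two groups.
			for a := i; a < iEnd; a++ {
				for b := j; b < jEnd; b++ {
					pairs = append(pairs, [2]int{a, b})
				}
			}
			i, j = iEnd, jEnd
		}
	}
	return pairs
}

func main() {
	left := []nullableInt{{null: true}, {null: true}, {val: 1}, {val: 2}, {val: 2}}
	right := []nullableInt{{null: true}, {val: 2}, {val: 3}}
	fmt.Println(probeInner(left, right))
}
```

Outer joins would still need to output unmatched rows; that concern belongs to the builder and is left out of this sketch.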