-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SMJ: Add more tests and improve comments #10784
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an improvement to me -- thank you @comphead
if (i < streamed_indices_length - 1 | ||
&& streamed_indices.value(i) != streamed_indices.value(i + 1)) | ||
|| (i == streamed_indices_length - 1 | ||
&& *scanning_buffered_batch_idx == buffered_batches_len - 1) | ||
&& *scanning_buffered_offset == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why you use scanning_buffered_offset == 0
as condition? It is set to zero when moving to next buffered batch or all buffered batches are scanned for current streamed batch. It doesn't mean anything to the position of streamed indices for the streamed batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @viirya for the review. We exactly need to check the condition where all buffered batches are scanned for current streamed batch. This is because LeftAnti doesnt know if it matches or not until the very last buffered row comes in. This scenario already tested in slt file
query II
select * from (
with
t1 as (
select 11 a, 12 b),
t2 as (
select 11 a, 12 c union all
select 11 a, 11 c union all
select 11 a, 15 c
)
select t1.* from t1 where not exists (select 1 from t2 where t2.a = t1.a and t1.b > t2.c)
) order by 1, 2
----
it works for small and large batches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We exactly need to check the condition where all buffered batches are scanned for current streamed batch.
This is correct, but what I meant is that scanning_buffered_offset == 0
condition could also be true when it moves to next buffered batch and output size is equal to batch size. When it happens, SMJ also goes to output batches, but obviously not all buffered batches are scanned for current row in the streamed batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check that
* SMJ: Add more tests and improve comments
Which issue does this PR close?
A followup on #10724 (comment)
Related to #9846
Rationale for this change
Addressing comments from PR review, improving the comments and adding more test cases
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No