Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMJ: Add more tests and improve comments #10784

Merged
merged 4 commits into from
Jun 5, 2024
Merged

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Jun 3, 2024

Which issue does this PR close?

A followup on #10724 (comment)
Related to #9846

Rationale for this change

Addressing comments from PR review, improving the comments and adding more test cases

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@comphead comphead requested a review from viirya June 3, 2024 23:27
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 3, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me -- thank you @comphead

@comphead comphead merged commit df5dab7 into apache:main Jun 5, 2024
23 checks passed
if (i < streamed_indices_length - 1
&& streamed_indices.value(i) != streamed_indices.value(i + 1))
|| (i == streamed_indices_length - 1
&& *scanning_buffered_batch_idx == buffered_batches_len - 1)
&& *scanning_buffered_offset == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why you use scanning_buffered_offset == 0 as condition? It is set to zero when moving to next buffered batch or all buffered batches are scanned for current streamed batch. It doesn't mean anything to the position of streamed indices for the streamed batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya for the review. We exactly need to check the condition where all buffered batches are scanned for current streamed batch. This is because LeftAnti doesnt know if it matches or not until the very last buffered row comes in. This scenario already tested in slt file

query II
select * from (
with
t1 as (
    select 11 a, 12 b),
t2 as (
    select 11 a, 12 c union all
    select 11 a, 11 c union all
    select 11 a, 15 c
    )
select t1.* from t1 where not exists (select 1 from t2 where t2.a = t1.a and t1.b > t2.c)
) order by 1, 2
----

it works for small and large batches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We exactly need to check the condition where all buffered batches are scanned for current streamed batch.

This is correct, but what I meant is that scanning_buffered_offset == 0 condition could also be true when it moves to next buffered batch and output size is equal to batch size. When it happens, SMJ also goes to output batches, but obviously not all buffered batches are scanned for current row in the streamed batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check that

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* SMJ: Add more tests and improve comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants