Making stream joins extensible: A new Trait implementation for SHJ #8234

metesynnada · 2023-11-16T14:01:57Z

Which issue does this PR close?

Closes #.

Rationale for this change

In our ongoing efforts to support more join use cases in DataFusion, this PR introduces a new trait: EagerJoinStream. The trait is designed to provide a more structured and efficient way to implement further join use cases in the future, ensuring better maintainability and ease of use.

EagerJoinStream: This trait ensures that all join operations are evaluated eagerly, providing faster response times for scenarios where immediate results are required.
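To make the shape of such a trait concrete, here is a minimal synchronous sketch. It is not the API added by this PR: the names `PullState`, `ToyEagerJoin`, `BufferedEqJoin`, `matches`, and `run_demo` are hypothetical, batches are plain `Vec<i64>` key lists rather than `RecordBatch`es, and inputs are iterators rather than async streams. The point is only that the alternating driver lives in a default method while each join supplies its own per-batch logic.

```rust
/// Which input the shared driver pulls from next (hypothetical, simplified).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PullState { Left, Right, LeftExhausted, RightExhausted, BothExhausted }

/// Toy synchronous stand-in for an eager join trait; a "batch" is a Vec<i64> of keys.
trait ToyEagerJoin {
    fn state(&self) -> PullState;
    fn set_state(&mut self, state: PullState);
    fn pull_left(&mut self) -> Option<Vec<i64>>;
    fn pull_right(&mut self) -> Option<Vec<i64>>;
    // Join-specific logic: the only part a new join algorithm has to write.
    fn process_left(&mut self, batch: Vec<i64>) -> Option<Vec<i64>>;
    fn process_right(&mut self, batch: Vec<i64>) -> Option<Vec<i64>>;

    /// Shared driver: alternates inputs and returns output as soon as any exists.
    fn next_output(&mut self) -> Option<Vec<i64>> {
        loop {
            // (pull from left?, state after a batch, state after end of input)
            let (from_left, after_batch, after_end) = match self.state() {
                PullState::Left => (true, PullState::Right, PullState::LeftExhausted),
                PullState::Right => (false, PullState::Left, PullState::RightExhausted),
                PullState::LeftExhausted => (false, PullState::LeftExhausted, PullState::BothExhausted),
                PullState::RightExhausted => (true, PullState::RightExhausted, PullState::BothExhausted),
                PullState::BothExhausted => return None,
            };
            let batch = if from_left { self.pull_left() } else { self.pull_right() };
            match batch {
                Some(b) => {
                    self.set_state(after_batch);
                    let out = if from_left { self.process_left(b) } else { self.process_right(b) };
                    if out.is_some() {
                        return out;
                    }
                }
                None => self.set_state(after_end),
            }
        }
    }
}

/// Example implementor: buffers every key seen on each side and emits equal keys.
struct BufferedEqJoin {
    state: PullState,
    left: std::vec::IntoIter<Vec<i64>>,
    right: std::vec::IntoIter<Vec<i64>>,
    left_seen: Vec<i64>,
    right_seen: Vec<i64>,
}

/// Keys of `batch` that already appear in `other` (one output per matching pair).
fn matches(batch: &[i64], other: &[i64]) -> Option<Vec<i64>> {
    let mut out = Vec::new();
    for key in batch {
        for seen in other {
            if key == seen {
                out.push(*key);
            }
        }
    }
    if out.is_empty() { None } else { Some(out) }
}

impl ToyEagerJoin for BufferedEqJoin {
    fn state(&self) -> PullState { self.state }
    fn set_state(&mut self, state: PullState) { self.state = state; }
    fn pull_left(&mut self) -> Option<Vec<i64>> { self.left.next() }
    fn pull_right(&mut self) -> Option<Vec<i64>> { self.right.next() }
    fn process_left(&mut self, batch: Vec<i64>) -> Option<Vec<i64>> {
        let out = matches(&batch, &self.right_seen);
        self.left_seen.extend(batch);
        out
    }
    fn process_right(&mut self, batch: Vec<i64>) -> Option<Vec<i64>> {
        let out = matches(&batch, &self.left_seen);
        self.right_seen.extend(batch);
        out
    }
}

/// Drives the toy join to completion and collects every emitted batch.
fn run_demo() -> Vec<Vec<i64>> {
    let mut join = BufferedEqJoin {
        state: PullState::Left,
        left: vec![vec![1, 2], vec![3]].into_iter(),
        right: vec![vec![2, 3], vec![1]].into_iter(),
        left_seen: Vec::new(),
        right_seen: Vec::new(),
    };
    let mut outputs = Vec::new();
    while let Some(out) = join.next_output() {
        outputs.push(out);
    }
    outputs
}

fn main() {
    println!("{:?}", run_demo());
}
```

With these inputs the driver emits `[2]`, then `[3]`, then `[1]`: each result appears as soon as the batch that completes the match arrives, without waiting for either input to finish.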

What changes are included in this PR?

This PR includes the implementation of the EagerJoinStream trait, along with the modifications to existing code needed to integrate it. The changes are as follows:

Implementation of EagerJoinStream: This trait is implemented to support eager evaluation of join operations.
Code restructure: stream_hash_utils.rs is introduced to separate the responsibilities of HJ and SHJ, which makes the code easier to maintain.
Integration and Refactoring: Existing code has been refactored and integrated with the new trait to ensure seamless operation and maintain compatibility.
Proto support for SHJ: The datafusion.proto file and the ser/de features are updated to support SHJ in proto.

The changes are mostly code restructuring and proto implementations rather than new join functionality.

Are these changes tested?

Yes, comprehensive tests have been added to cover the new functionality introduced by this trait. The tests check the correctness, performance, and reliability of the new features.

Are there any user-facing changes?

NA

alamb · 2023-11-16T16:01:25Z

I plan to review this PR, hopefully later today

ozankabak · 2023-11-16T16:25:51Z

For context: This enables downstream users (like us) to implement various join algorithms without duplicating a bunch of code every time.

alamb

Thank you @metesynnada -- this looks like a nice refactoring to me. I had some suggestions on documentation and API design but I think they could also be done as follow on PRs

I had some questions:

Do you plan to add new join implementations to DataFusion, and if so are your plans written anywhere? A trait is a nice way to keep specialized implementation in other crates as well.
Is it possible to extend "eager join" to MergeJoin? Would that even be a good idea?
I don't understand how always reading alternately from left and right inputs would work (I left a more detailed question / comment below)

datafusion/physical-plan/src/joins/stream_join_utils.rs

alamb · 2023-11-17T18:34:44Z

datafusion/physical-plan/src/joins/utils.rs

+/// | 0 | 0 | 0 | 2 | 4 | <--- hash value 10 maps to 5,4,2 (which means indices values 4,3,1)
+/// ---------------------
+/// ```
+pub struct JoinHashMap {


If we are going to move this structure anyway, perhaps we can put it into its own module (e.g. datafusion/physical-plan/src/joins/join_hash_map.rs or something)

It is the only struct that hash join uses differently from the other joins, which is why I put it under utils. I can create a new folder as well.

Actually, it will be a single struct in the new file. We can split more similar structs into a new file in the future.
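For readers unfamiliar with the chained layout that the doc comment in the diff describes, the following is a minimal sketch of the idea using only the standard library. It is an illustration, not DataFusion's `JoinHashMap`: the type `ChainedHashMap` and its methods are hypothetical, and the insertion sequence (10,1), (20,2), (10,3), (10,4) is an assumption chosen so that the final `next` array matches the `| 0 | 0 | 0 | 2 | 4 |` row quoted above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a chained join hash map (not DataFusion's actual type).
struct ChainedHashMap {
    /// hash value -> (index of the most recently inserted row with that hash) + 1
    map: HashMap<u64, u64>,
    /// next[i] = (index of the previous row with the same hash) + 1, or 0 at the end of the chain
    next: Vec<u64>,
}

impl ChainedHashMap {
    fn with_capacity(rows: usize) -> Self {
        Self { map: HashMap::new(), next: vec![0; rows] }
    }

    /// Record that `row` has hash value `hash`, linking it to the previous head of the chain.
    fn insert(&mut self, hash: u64, row: usize) {
        let previous_head = self.map.insert(hash, row as u64 + 1).unwrap_or(0);
        self.next[row] = previous_head;
    }

    /// Walk the chain for `hash`, newest row first.
    fn rows_for(&self, hash: u64) -> Vec<usize> {
        let mut rows = Vec::new();
        let mut cursor = self.map.get(&hash).copied().unwrap_or(0);
        while cursor != 0 {
            let row = (cursor - 1) as usize;
            rows.push(row);
            cursor = self.next[row];
        }
        rows
    }
}

fn main() {
    let mut m = ChainedHashMap::with_capacity(5);
    for (hash, row) in [(10, 1), (20, 2), (10, 3), (10, 4)] {
        m.insert(hash, row);
    }
    println!("next = {:?}", m.next);
    println!("rows for hash 10 = {:?}", m.rows_for(10));
}
```

After the four inserts, `next` is `[0, 0, 0, 2, 4]` and the chain for hash value 10 visits stored values 5, 4, 2, that is row indices 4, 3, 1. Storing index + 1 lets 0 serve as the end-of-chain marker.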

datafusion/physical-plan/src/joins/stream_join_utils.rs

alamb · 2023-11-17T18:46:40Z

datafusion/physical-plan/src/joins/stream_join_utils.rs

+    /// # Returns
+    ///
+    /// * `Result<StreamJoinStateResult<Option<RecordBatch>>>` - The state result after pulling the batch.
+    async fn fetch_next_from_right_stream(


I am somewhat confused by this default implementation as it implies that the join will always "ping pong" back and forth between fetching left and right inputs, while in reality I think the details of how the stream is implemented and how the join keys are distributed across batches could require fetching multiple batches from one (or both) inputs before progress can be made.

I am thinking of a join on a = b where all the rows in the batch have the same join key, for example:

Batch 1

| a   |
|-----|
| 100 |
| 100 |

Batch 2

| a   |
|-----|
| 100 |
| 100 |

Batch 3

| a   |
|-----|
| 100 |
| 200 |

Wouldn't the symmetric hash join have to read all three batches to find the next join key (200) before reading a batch from the other input / producing output?

I didn't see how the symmetric hash handles this case, so I must be missing something

Both this and previous implementations buffer rows on both sides, adhering to deletion criteria set by the interval library; this aspect remains unchanged.

The core proposal focuses on retaining control over the two SendableRecordBatchStream inputs instead of merging them with futures::select. Additionally, there are plans to develop more efficient yielding strategies in the future. The suggested alternation strategy is expected to advance into sophisticated load-balancing techniques. Retaining that control is fundamental to realizing these advanced features.
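The buffering behavior described in this reply can be sketched on the batches from the question. This is a toy model under stated assumptions, not the SHJ implementation: `SideBuffer`, `probe_then_insert`, and `pairs_per_step` are hypothetical names, the right-side batches are invented for illustration, and the pruning driven by the interval library is omitted entirely. Each arriving batch first probes the rows buffered on the opposite side and is then buffered itself, so matches can be emitted without first reading ahead to the next join key.

```rust
use std::collections::HashMap;

/// One side's buffer: join key -> ids of the rows seen so far on that side.
#[derive(Default)]
struct SideBuffer {
    rows_by_key: HashMap<i64, Vec<usize>>,
    rows_seen: usize,
}

impl SideBuffer {
    /// Probe `other` with every key in `batch`, then buffer the batch on this side.
    /// Returns (row id on this side, row id on the other side) pairs.
    fn probe_then_insert(&mut self, batch: &[i64], other: &SideBuffer) -> Vec<(usize, usize)> {
        let mut pairs = Vec::new();
        for &key in batch {
            let row = self.rows_seen;
            self.rows_seen += 1;
            if let Some(matching_rows) = other.rows_by_key.get(&key) {
                pairs.extend(matching_rows.iter().map(|&m| (row, m)));
            }
            self.rows_by_key.entry(key).or_default().push(row);
        }
        pairs
    }
}

/// Alternates left/right batches and reports how many pairs each step emits.
fn pairs_per_step() -> Vec<usize> {
    // Left batches are the ones from the review comment.
    let left_batches: [Vec<i64>; 3] = [vec![100, 100], vec![100, 100], vec![100, 200]];
    // Right batches are an assumption made up for this sketch.
    let right_batches: [Vec<i64>; 3] = [vec![100], vec![200], vec![300]];

    let (mut left, mut right) = (SideBuffer::default(), SideBuffer::default());
    let mut counts = Vec::new();
    for (l, r) in left_batches.iter().zip(right_batches.iter()) {
        counts.push(left.probe_then_insert(l, &right).len());
        counts.push(right.probe_then_insert(r, &left).len());
    }
    counts
}

fn main() {
    println!("{:?}", pairs_per_step());
}
```

In this run the per-step pair counts are `[0, 2, 2, 0, 2, 0]`, six pairs in total. Output starts at the second step, as soon as the first right batch meets the buffered left rows, even though the left key 200 has not yet been read.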

> Both this and previous implementations buffer rows on both sides, adhering to deletion criteria set by the interval library; this aspect remains unchanged.

I agree this PR doesn't seem to change any behavior.

> Additionally, there are plans to develop more efficient yielding strategies in the future. The suggested alternation strategy is expected to advance into sophisticated load-balancing techniques.

Are these plans described anywhere?

> Are these plans described anywhere?

Not yet, but hopefully soon. We will continue publishing blog posts about this stuff in DataFusion and will talk about future goodies there :)

metesynnada · 2023-11-20T10:45:29Z

> Thank you @metesynnada -- this looks like a nice refactoring to me. I had some suggestions on documentation and API design but I think they could also be done as follow on PRs
>
> I had some questions:
>
> Do you plan to add new join implementations to DataFusion, and if so are your plans written anywhere? A trait is a nice way to keep specialized implementation in other crates as well.
>
> Is it possible to extend "eager join" to MergeJoin? Would that even be a good idea?
>
> I don't understand how always reading alternately from left and right inputs would work (I left a more detailed question / comment below)

While we have additional joins planned for our roadmap, we haven't announced these publicly yet.
Thank you for the suggestion – it's a great idea!
To reiterate, our current implementation of streaming joins is just the beginning. The polling mechanism you see now is foundational, serving as a template for more advanced features and enhancements we plan to introduce moving forward.

ozankabak

Thanks for reviewing @alamb. We enriched the comments per your suggestions and will shortly open an issue to track the work on SortMergeJoin, which will be addressed in a follow-on PR.

metesynnada · 2023-11-20T11:27:47Z

SortMergeJoin: #8273

alamb

Thank you for the improved comments 🙏

alamb · 2023-11-20T14:32:50Z

datafusion/physical-plan/src/joins/stream_join_utils.rs

-/// Represents the asynchronous trait for an eager join stream.
-/// This trait defines the core methods for handling asynchronous join operations
-/// between two streams (left and right).
+/// `EagerJoinStream` is an asynchronous trait designed for managing incremental join operations


metesynnada and others added 4 commits November 16, 2023 10:02

Upstream

7f87a0b

Merge remote-tracking branch 'upstream/main' into upstream/shj-change

92548ba

Update utils.rs

505cbc1

Review

7cedb07

Name change and remove ignore on test

7ab637f

alamb approved these changes Nov 17, 2023

metesynnada and others added 2 commits November 20, 2023 13:45

Comment revisions

fe6b97e

Improve comments

a0be9c3

ozankabak approved these changes Nov 20, 2023

ozankabak merged commit 2156dde into apache:main Nov 20, 2023
22 checks passed

alamb reviewed Nov 20, 2023

alamb mentioned this pull request Nov 20, 2023

Enhance SortMergeJoinExec by implementing EagerJoinStream for the join #8273

Open

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
