Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Update JoinHashMap comment example to make it clearer #8154

Merged
merged 1 commit into from
Nov 15, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions datafusion/physical-plan/src/joins/hash_join.rs
Original file line number Diff line number Diff line change
Expand Up @@ -609,6 +609,8 @@ impl ExecutionPlan for HashJoinExec {
}
}

/// Reads the left (build) side of the input, buffering it in memory, to build a
/// hash table (`LeftJoinData`)
async fn collect_left_input(
partition: Option<usize>,
random_state: RandomState,
Expand Down Expand Up @@ -801,7 +803,7 @@ impl RecordBatchStream for HashJoinStream {
/// # Example
///
/// For `LEFT.b1 = RIGHT.b2`:
/// LEFT Table:
/// LEFT (build) Table:
/// ```text
/// a1 b1 c1
/// 1 1 10
Expand All @@ -813,7 +815,7 @@ impl RecordBatchStream for HashJoinStream {
/// 13 10 130
/// ```
///
/// RIGHT Table:
/// RIGHT (probe) Table:
/// ```text
/// a2 b2 c2
/// 2 2 20
Expand Down
55 changes: 28 additions & 27 deletions datafusion/physical-plan/src/joins/hash_join_utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,44 +61,45 @@ use hashbrown::HashSet;
///
/// ``` text
/// See the example below:
/// Insert (1,1)
///
/// Insert (10,1) <-- insert hash value 10 with row index 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the overloaded use of 1 as both a hash value and an index VERY confusing. Therefore I propose to use different values for the hash value and the index values

/// map:
/// ---------
/// | 1 | 2 |
/// ---------
/// ----------
/// | 10 | 2 |
/// ----------
/// next:
/// ---------------------
/// | 0 | 0 | 0 | 0 | 0 |
/// ---------------------
/// Insert (2,2)
/// Insert (20,2)
/// map:
/// ---------
/// | 1 | 2 |
/// | 2 | 3 |
/// ---------
/// ----------
/// | 10 | 2 |
/// | 20 | 3 |
/// ----------
/// next:
/// ---------------------
/// | 0 | 0 | 0 | 0 | 0 |
/// ---------------------
/// Insert (1,3)
/// Insert (10,3) <-- collision! row index 3 has a hash value of 10 as well
/// map:
/// ---------
/// | 1 | 4 |
/// | 2 | 3 |
/// ---------
/// ----------
/// | 10 | 4 |
/// | 20 | 3 |
/// ----------
/// next:
/// ---------------------
/// | 0 | 0 | 0 | 2 | 0 | <--- hash value 1 maps to 4,2 (which means indices values 3,1)
/// | 0 | 0 | 0 | 2 | 0 | <--- hash value 10 maps to 4,2 (which means indices values 3,1)
/// ---------------------
/// Insert (1,4)
/// Insert (10,4) <-- another collision! row index 4 ALSO has a hash value of 10
/// map:
/// ---------
/// | 1 | 5 |
/// | 2 | 3 |
/// | 10 | 5 |
/// | 20 | 3 |
/// ---------
/// next:
/// ---------------------
/// | 0 | 0 | 0 | 2 | 4 | <--- hash value 1 maps to 5,4,2 (which means indices values 4,3,1)
/// | 0 | 0 | 0 | 2 | 4 | <--- hash value 10 maps to 5,4,2 (which means indices values 4,3,1)
/// ---------------------
/// ```
pub struct JoinHashMap {
Expand All @@ -119,7 +120,7 @@ impl JoinHashMap {

/// Trait defining methods that must be implemented by a hash map type to be used for joins.
pub trait JoinHashMapType {
/// The type of list used to store the hash values.
/// The type of list used to store the next list
type NextType: IndexMut<usize, Output = u64>;
/// Extend with zero
fn extend_zero(&mut self, len: usize);
Expand Down Expand Up @@ -196,15 +197,15 @@ impl fmt::Debug for JoinHashMap {
/// Let's continue the example of `JoinHashMap` and then show how `PruningJoinHashMap` would
/// handle the pruning scenario.
///
/// Insert the pair (1,4) into the `PruningJoinHashMap`:
/// Insert the pair (10,4) into the `PruningJoinHashMap`:
/// map:
/// ---------
/// | 1 | 5 |
/// | 2 | 3 |
/// ---------
/// ----------
/// | 10 | 5 |
/// | 20 | 3 |
/// ----------
/// list:
/// ---------------------
/// | 0 | 0 | 0 | 2 | 4 | <--- hash value 1 maps to 5,4,2 (which means indices values 4,3,1)
/// | 0 | 0 | 0 | 2 | 4 | <--- hash value 10 maps to 5,4,2 (which means indices values 4,3,1)
/// ---------------------
///
/// Now, let's prune 3 rows from `PruningJoinHashMap`:
Expand All @@ -214,7 +215,7 @@ impl fmt::Debug for JoinHashMap {
/// ---------
/// list:
/// ---------
/// | 2 | 4 | <--- hash value 1 maps to 2 (5 - 3), 1 (4 - 3), NA (2 - 3) (which means indices values 1,0)
/// | 2 | 4 | <--- hash value 10 maps to 2 (5 - 3), 1 (4 - 3), NA (2 - 3) (which means indices values 1,0)
/// ---------
///
/// After pruning, the | 2 | 3 | entry is deleted from `PruningJoinHashMap` since
Expand Down