
Commit

fmt
Signed-off-by: jayzhan211 <[email protected]>
jayzhan211 committed Oct 29, 2024
1 parent 1b418c4 commit 8297f58
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions datafusion/physical-plan/src/sorts/merge.rs
@@ -98,19 +98,19 @@ pub(crate) struct SortPreservingMergeStream<C: CursorValues> {
cursors: Vec<Option<Cursor<C>>>,

/// Configuration parameter to enable round-robin selection of tied winners of loser tree.
///
/// To address the issue of unbalanced polling between partitions due to tie-breakers being based
/// on partition index, especially in cases of low cardinality, we are making changes to the winner
/// selection mechanism. Previously, partitions with smaller indices were consistently chosen as the winners,
/// leading to an uneven distribution of polling. This caused upstream operator buffers for the other partitions
/// to grow excessively, as they continued receiving data without consuming it.
///
/// For example, an upstream operator like a repartition execution would keep sending data to certain partitions,
/// but those partitions wouldn't consume the data if they weren't selected as winners. This resulted in inefficient buffer usage.
///
/// To resolve this, we are modifying the tie-breaking logic. Instead of always choosing the partition with the smallest index,
/// we now select the partition that has the fewest poll counts for the same value.
/// This ensures that multiple partitions with the same value are chosen equally, distributing the polling load in a round-robin fashion.
/// This approach balances the workload more effectively across partitions and avoids excessive buffer growth.
enable_round_robin_tie_breaker: bool,
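
The tie-breaking rule described in the doc comment can be illustrated with a small standalone sketch. This is not the code in `merge.rs`: the names `beats` and `pick_winner`, the plain `i64` values, and the linear scan (instead of a loser tree) are all assumptions made for illustration. It only shows how preferring the partition with the fewest polls among equal values yields a round-robin pattern.

```rust
use std::cmp::Ordering;

/// Hypothetical helper (not part of DataFusion): returns true if partition `a`
/// should be chosen over partition `b`.
fn beats(values: &[i64], poll_counts: &[usize], a: usize, b: usize) -> bool {
    match values[a].cmp(&values[b]) {
        Ordering::Less => true,
        Ordering::Greater => false,
        // Tie on value: prefer the partition polled fewer times so far,
        // falling back to the smaller index to stay deterministic.
        Ordering::Equal => (poll_counts[a], a) < (poll_counts[b], b),
    }
}

/// Hypothetical selection routine: picks the winning partition with a linear
/// scan and records the poll. Assumes `values` is non-empty and has the same
/// length as `poll_counts`.
fn pick_winner(values: &[i64], poll_counts: &mut [usize], round_robin: bool) -> usize {
    assert!(!values.is_empty() && values.len() == poll_counts.len());
    let mut winner = 0;
    for i in 1..values.len() {
        let better = if round_robin {
            beats(values, poll_counts, i, winner)
        } else {
            // Previous behaviour: ties always go to the smaller index.
            values[i] < values[winner]
        };
        if better {
            winner = i;
        }
    }
    poll_counts[winner] += 1;
    winner
}
```

With three partitions that all hold the same value, repeated calls with `round_robin = true` cycle through partitions 0, 1, 2, 0, whereas `round_robin = false` selects partition 0 every time, which is the uneven polling the comment describes.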

