-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124
Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124
Conversation
# Conflicts: # datafusion/core/src/physical_optimizer/repartition.rs # datafusion/core/src/physical_plan/joins/sort_merge_join.rs # datafusion/core/src/physical_plan/joins/symmetric_hash_join.rs # datafusion/core/src/physical_plan/mod.rs # datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs # datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs # datafusion/core/src/physical_plan/windows/window_agg_exec.rs # datafusion/physical-expr/src/utils.rs
# Conflicts: # datafusion/core/src/test_util/mod.rs # datafusion/physical-expr/src/utils.rs
# Conflicts: # datafusion/common/src/utils.rs
# Conflicts: # datafusion/core/src/physical_optimizer/repartition.rs # datafusion/core/src/physical_optimizer/sort_pushdown.rs
# Conflicts: # datafusion/core/src/physical_plan/aggregates/mod.rs
# Conflicts: # datafusion/core/src/physical_plan/aggregates/row_hash.rs
# Conflicts: # datafusion/core/src/physical_plan/aggregates/mod.rs # datafusion/core/src/physical_plan/aggregates/row_hash.rs # datafusion/core/src/test_util/mod.rs # datafusion/core/tests/sql/window.rs
# Conflicts: # datafusion/core/tests/sqllogictests/test_files/window.slt
# Conflicts: # datafusion/common/src/utils.rs # datafusion/core/src/physical_optimizer/sort_enforcement.rs # datafusion/core/src/physical_plan/streaming.rs # datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs # datafusion/core/src/physical_plan/windows/window_agg_exec.rs # datafusion/physical-expr/src/utils.rs
# Conflicts: # datafusion/core/src/physical_plan/aggregates/row_hash.rs
# Conflicts: # datafusion/core/src/physical_plan/union.rs
# Conflicts: # datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs # datafusion/core/tests/sql/group_by.rs # datafusion/core/tests/sqllogictests/test_files/window.slt
# Conflicts: # datafusion/core/src/physical_plan/aggregates/mod.rs
@@ -107,22 +107,6 @@ pub(crate) struct GroupedHashAggregateStream { | |||
indices: [Vec<Range<usize>>; 2], | |||
} | |||
|
|||
#[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved functions, structs common for both in row_hash.rs
and bounded_aggregate_stream.rs
to the inside util.rs
file. All the changes in this file, util.rs
comes because of this reason.
I am running my benchmark script on this PR now... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good to me -- thank you @mustafasrepo for adding the second stream implementation
I do agree this approach results in duplication of code that would be nice to eliminate eventually if possible but I think that is sometimes unavoidable in performance critical code.
If we are worried about performance regressions (which I am not) we can also contemplate adding a config option to disable this code. I do think it is important to display the GroupByOrderMode
somehow in the explain plan, but that can done as a follow on PR I think
cc @mingmwang
Here are the results of my running the benchmarks against this branch. I would say they look good to me and no longer show a consistent slowdown (though there is still a bunch of deviation)
****** TPCH SF1 (Parquet) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_parquet_main.json /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_pa\
rquet_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ /home/alamb… ┃ /home/alamb… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 1420.49ms │ 1453.38ms │ no change │
│ QQuery 2 │ 407.92ms │ 382.98ms │ +1.07x faster │
│ QQuery 3 │ 528.76ms │ 550.12ms │ no change │
│ QQuery 4 │ 227.23ms │ 215.10ms │ +1.06x faster │
│ QQuery 5 │ 684.92ms │ 714.44ms │ no change │
│ QQuery 6 │ 424.28ms │ 449.18ms │ 1.06x slower │
│ QQuery 7 │ 1146.86ms │ 1200.87ms │ no change │
│ QQuery 8 │ 704.81ms │ 708.85ms │ no change │
│ QQuery 9 │ 1283.33ms │ 1344.45ms │ no change │
│ QQuery 10 │ 762.92ms │ 781.07ms │ no change │
│ QQuery 11 │ 334.14ms │ 348.14ms │ no change │
│ QQuery 12 │ 328.68ms │ 323.94ms │ no change │
│ QQuery 13 │ 1094.20ms │ 1086.12ms │ no change │
│ QQuery 14 │ 419.90ms │ 447.86ms │ 1.07x slower │
│ QQuery 15 │ 378.17ms │ 412.97ms │ 1.09x slower │
│ QQuery 16 │ 337.94ms │ 339.26ms │ no change │
│ QQuery 17 │ 2720.90ms │ 2801.68ms │ no change │
│ QQuery 18 │ 3469.81ms │ 3564.84ms │ no change │
│ QQuery 19 │ 725.54ms │ 765.05ms │ 1.05x slower │
│ QQuery 20 │ 1254.63ms │ 1203.19ms │ no change │
│ QQuery 21 │ 1654.42ms │ 1617.61ms │ no change │
│ QQuery 22 │ 192.72ms │ 186.06ms │ no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
+ echo '****** TPCH SF1 (mem) ******'
****** TPCH SF1 (mem) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_mem_main.json /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_mem_br\
anch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ -o ┃ -o ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1 │ 753.34ms │ 733.06ms │ no change │
│ QQuery 2 │ 294.36ms │ 270.73ms │ +1.09x faster │
│ QQuery 3 │ 162.23ms │ 178.63ms │ 1.10x slower │
│ QQuery 4 │ 110.43ms │ 104.67ms │ +1.06x faster │
│ QQuery 5 │ 474.97ms │ 464.57ms │ no change │
│ QQuery 6 │ 35.15ms │ 39.75ms │ 1.13x slower │
│ QQuery 7 │ 1098.87ms │ 1140.80ms │ no change │
│ QQuery 8 │ 247.10ms │ 254.99ms │ no change │
│ QQuery 9 │ 603.15ms │ 604.26ms │ no change │
│ QQuery 10 │ 332.75ms │ 324.70ms │ no change │
│ QQuery 11 │ 273.60ms │ 269.05ms │ no change │
│ QQuery 12 │ 140.39ms │ 140.95ms │ no change │
│ QQuery 13 │ 657.01ms │ 666.64ms │ no change │
│ QQuery 14 │ 46.10ms │ 53.48ms │ 1.16x slower │
│ QQuery 15 │ 104.16ms │ 94.35ms │ +1.10x faster │
│ QQuery 16 │ 247.33ms │ 248.69ms │ no change │
│ QQuery 17 │ 2340.60ms │ 2363.40ms │ no change │
│ QQuery 18 │ 3019.72ms │ 3160.72ms │ no change │
│ QQuery 19 │ 139.29ms │ 153.35ms │ 1.10x slower │
│ QQuery 20 │ 904.06ms │ 947.16ms │ no change │
│ QQuery 21 │ 1368.68ms │ 1441.11ms │ 1.05x slower │
│ QQuery 22 │ 136.17ms │ 149.34ms │ 1.10x slower │
└──────────────┴──────────────┴──────────────┴───────────────┘
I am also hoping to make running such benchmarks easier in the future -- #6127
datafusion/core/src/physical_plan/aggregates/bounded_aggregate_stream.rs
Outdated
Show resolved
Hide resolved
🎉 thank you @mustafasrepo and @mingmwang |
Which issue does this PR close?
Closes #5133
Closes #6034
Rationale for this change
In the discussion under the #6034 , we have decided to implement a new Stream for non-pipeline breaking Aggregates, to prevent performance downgrade. This PR implements it.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?