Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) #6124

Merged
merged 44 commits into from
Apr 27, 2023
Merged

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Apr 26, 2023

Which issue does this PR close?

Closes #5133
Closes #6034

Rationale for this change

In the discussion under the #6034 , we have decided to implement a new Stream for non-pipeline breaking Aggregates, to prevent performance downgrade. This PR implements it.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

# Conflicts:
#	datafusion/core/src/physical_optimizer/repartition.rs
#	datafusion/core/src/physical_plan/joins/sort_merge_join.rs
#	datafusion/core/src/physical_plan/joins/symmetric_hash_join.rs
#	datafusion/core/src/physical_plan/mod.rs
#	datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
#	datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs
#	datafusion/core/src/physical_plan/windows/window_agg_exec.rs
#	datafusion/physical-expr/src/utils.rs
# Conflicts:
#	datafusion/core/src/test_util/mod.rs
#	datafusion/physical-expr/src/utils.rs
# Conflicts:
#	datafusion/common/src/utils.rs
# Conflicts:
#	datafusion/core/src/physical_optimizer/repartition.rs
#	datafusion/core/src/physical_optimizer/sort_pushdown.rs
# Conflicts:
#	datafusion/core/src/physical_plan/aggregates/mod.rs
# Conflicts:
#	datafusion/core/src/physical_plan/aggregates/row_hash.rs
# Conflicts:
#	datafusion/core/src/physical_plan/aggregates/mod.rs
#	datafusion/core/src/physical_plan/aggregates/row_hash.rs
#	datafusion/core/src/test_util/mod.rs
#	datafusion/core/tests/sql/window.rs
# Conflicts:
#	datafusion/core/tests/sqllogictests/test_files/window.slt
# Conflicts:
#	datafusion/common/src/utils.rs
#	datafusion/core/src/physical_optimizer/sort_enforcement.rs
#	datafusion/core/src/physical_plan/streaming.rs
#	datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs
#	datafusion/core/src/physical_plan/windows/window_agg_exec.rs
#	datafusion/physical-expr/src/utils.rs
ozankabak and others added 13 commits April 14, 2023 21:57
# Conflicts:
#	datafusion/core/src/physical_plan/aggregates/row_hash.rs
# Conflicts:
#	datafusion/core/src/physical_plan/union.rs
# Conflicts:
#	datafusion/core/src/physical_plan/windows/bounded_window_agg_exec.rs
#	datafusion/core/tests/sql/group_by.rs
#	datafusion/core/tests/sqllogictests/test_files/window.slt
# Conflicts:
#	datafusion/core/src/physical_plan/aggregates/mod.rs
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Apr 26, 2023
@@ -107,22 +107,6 @@ pub(crate) struct GroupedHashAggregateStream {
indices: [Vec<Range<usize>>; 2],
}

#[derive(Debug)]
Copy link
Contributor Author

@mustafasrepo mustafasrepo Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved functions, structs common for both in row_hash.rs and bounded_aggregate_stream.rs to the inside util.rs file. All the changes in this file, util.rs comes because of this reason.

@mustafasrepo mustafasrepo marked this pull request as draft April 26, 2023 12:41
@mustafasrepo mustafasrepo changed the title Feature/stream groupby5 Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered (V2) Apr 26, 2023
@mustafasrepo mustafasrepo marked this pull request as ready for review April 26, 2023 13:27
@alamb
Copy link
Contributor

alamb commented Apr 26, 2023

I am running my benchmark script on this PR now...

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good to me -- thank you @mustafasrepo for adding the second stream implementation

I do agree this approach results in duplication of code that would be nice to eliminate eventually if possible but I think that is sometimes unavoidable in performance critical code.

If we are worried about performance regressions (which I am not) we can also contemplate adding a config option to disable this code. I do think it is important to display the GroupByOrderMode somehow in the explain plan, but that can done as a follow on PR I think

cc @mingmwang

Here are the results of my running the benchmarks against this branch. I would say they look good to me and no longer show a consistent slowdown (though there is still a bunch of deviation)

****** TPCH SF1 (Parquet) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_parquet_main.json /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_pa\
rquet_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    1420.49ms │    1453.38ms │     no change │
│ QQuery 2     │     407.92ms │     382.98ms │ +1.07x faster │
│ QQuery 3     │     528.76ms │     550.12ms │     no change │
│ QQuery 4     │     227.23ms │     215.10ms │ +1.06x faster │
│ QQuery 5     │     684.92ms │     714.44ms │     no change │
│ QQuery 6     │     424.28ms │     449.18ms │  1.06x slower │
│ QQuery 7     │    1146.86ms │    1200.87ms │     no change │
│ QQuery 8     │     704.81ms │     708.85ms │     no change │
│ QQuery 9     │    1283.33ms │    1344.45ms │     no change │
│ QQuery 10    │     762.92ms │     781.07ms │     no change │
│ QQuery 11    │     334.14ms │     348.14ms │     no change │
│ QQuery 12    │     328.68ms │     323.94ms │     no change │
│ QQuery 13    │    1094.20ms │    1086.12ms │     no change │
│ QQuery 14    │     419.90ms │     447.86ms │  1.07x slower │
│ QQuery 15    │     378.17ms │     412.97ms │  1.09x slower │
│ QQuery 16    │     337.94ms │     339.26ms │     no change │
│ QQuery 17    │    2720.90ms │    2801.68ms │     no change │
│ QQuery 18    │    3469.81ms │    3564.84ms │     no change │
│ QQuery 19    │     725.54ms │     765.05ms │  1.05x slower │
│ QQuery 20    │    1254.63ms │    1203.19ms │     no change │
│ QQuery 21    │    1654.42ms │    1617.61ms │     no change │
│ QQuery 22    │     192.72ms │     186.06ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
+ echo '****** TPCH SF1 (mem) ******'
****** TPCH SF1 (mem) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_mem_main.json /home/alamb/benchmarking/feature%2Fstream_groupby5/tpch_sf1_mem_br\
anch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃           -o ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     753.34ms │     733.06ms │     no change │
│ QQuery 2     │     294.36ms │     270.73ms │ +1.09x faster │
│ QQuery 3     │     162.23ms │     178.63ms │  1.10x slower │
│ QQuery 4     │     110.43ms │     104.67ms │ +1.06x faster │
│ QQuery 5     │     474.97ms │     464.57ms │     no change │
│ QQuery 6     │      35.15ms │      39.75ms │  1.13x slower │
│ QQuery 7     │    1098.87ms │    1140.80ms │     no change │
│ QQuery 8     │     247.10ms │     254.99ms │     no change │
│ QQuery 9     │     603.15ms │     604.26ms │     no change │
│ QQuery 10    │     332.75ms │     324.70ms │     no change │
│ QQuery 11    │     273.60ms │     269.05ms │     no change │
│ QQuery 12    │     140.39ms │     140.95ms │     no change │
│ QQuery 13    │     657.01ms │     666.64ms │     no change │
│ QQuery 14    │      46.10ms │      53.48ms │  1.16x slower │
│ QQuery 15    │     104.16ms │      94.35ms │ +1.10x faster │
│ QQuery 16    │     247.33ms │     248.69ms │     no change │
│ QQuery 17    │    2340.60ms │    2363.40ms │     no change │
│ QQuery 18    │    3019.72ms │    3160.72ms │     no change │
│ QQuery 19    │     139.29ms │     153.35ms │  1.10x slower │
│ QQuery 20    │     904.06ms │     947.16ms │     no change │
│ QQuery 21    │    1368.68ms │    1441.11ms │  1.05x slower │
│ QQuery 22    │     136.17ms │     149.34ms │  1.10x slower │
└──────────────┴──────────────┴──────────────┴───────────────┘

I am also hoping to make running such benchmarks easier in the future -- #6127

datafusion/core/tests/sql/group_by.rs Show resolved Hide resolved
datafusion/core/tests/sql/group_by.rs Show resolved Hide resolved
datafusion/core/tests/aggregate_fuzz.rs Show resolved Hide resolved
datafusion/core/tests/aggregate_fuzz.rs Outdated Show resolved Hide resolved
datafusion/core/tests/window_fuzz.rs Show resolved Hide resolved
@mustafasrepo mustafasrepo merged commit aec3420 into apache:main Apr 27, 2023
@alamb
Copy link
Contributor

alamb commented Apr 27, 2023

🎉 thank you @mustafasrepo and @mingmwang

@andygrove andygrove added the enhancement New feature or request label May 6, 2023
@mustafasrepo mustafasrepo deleted the feature/stream_groupby5 branch May 8, 2023 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate enhancement New feature or request sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support StreamAggregation / streaming group by
5 participants