Add new physical rule CombinePartialFinalAggregate #5837
Conversation
Will add some UTs soon.
I understand the collapsing rule as it removes the requirement of creating a RecordBatch from states and then reading them back for final evaluation. As for naming this new aggregation mode, I find:

ProjectionExec: expr=[l_partkey@0 as l_partkey, ...]
  AggregateExec: mode=Single...
    ParquetExec ...

versus:

ProjectionExec: expr=[l_partkey@0 as l_partkey, ...]
  AggregateExec: mode=Complete...
    ParquetExec ...
As far as I can see, this only works for a single partition as input, with no repartitioning in between (i.e. no concurrency). Could you confirm?
Not always. We will see adjacent Partial + Final aggregates for a normal join plus aggregation on the same key:

select distinct(t1.t1_id) from t1 inner join t2 on t1.t1_id = t2.t2_id;

AggregateExec: mode=Single, gby=[t1_id@0 as t1_id], aggr=[]
  ProjectionExec: expr=[t1_id@0 as t1_id]
    CoalesceBatchesExec: target_batch_size=4096
      HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "t1_id", index: 0 }, Column { name: "t2_id", index: 0 })]
        CoalesceBatchesExec: target_batch_size=4096
          RepartitionExec: partitioning=Hash([Column { name: "t1_id", index: 0 }], 2), input_partitions=2
            RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
              MemoryExec: partitions=1, partition_sizes=[1]
        CoalesceBatchesExec: target_batch_size=4096
          RepartitionExec: partitioning=Hash([Column { name: "t2_id", index: 0 }], 2), input_partitions=2
            RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
              MemoryExec: partitions=1, partition_sizes=[1]
Ah yes - in the case where the underlying partition is already hash-repartitioned on the key. Makes sense, thanks.
@Dandandan @yjshen @alamb
I will review this PR carefully today |
I reviewed the code carefully. I have some suggestions on testing and documentation which I think would improve this PR but are not absolutely required to merge.
Thank you @mingmwang and sorry for the delay in reviewing
.expr
    .iter()
    .zip(other.expr.iter())
    .all(|((expr1, name1), (expr2, name2))| expr1.eq(expr2) && name1 == name2)
I wondered why this needed to be manually derived, so I tried removing it and got this error:
error[E0369]: binary operation `==` cannot be applied to type `Vec<(Arc<dyn PhysicalExpr>, std::string::String)>`
  --> datafusion/core/src/physical_plan/aggregates/mod.rs:91:5
   |
88 | #[derive(Clone, Debug, Default, PartialEq)]
   |                                 --------- in this derive macro expansion
...
91 |     expr: Vec<(Arc<dyn PhysicalExpr>, String)>,
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: this error originates in the derive macro `PartialEq` (in Nightly builds, run with -Z macro-backtrace for more info)
It looks like if a struct contains any boxed trait object, we cannot use the PartialEq derive macro.
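To make that concrete, here is a minimal sketch (the types are hypothetical, not DataFusion's): the derive would require PartialEq on the trait-object field itself, which dyn Trait cannot provide, so eq has to be written by hand against whatever the trait exposes.

use std::sync::Arc;

// Hypothetical trait object: dyn Expr carries no PartialEq bound, so
// #[derive(PartialEq)] on a struct holding it fails with E0369.
trait Expr {
    fn name(&self) -> &str;
}

struct GroupBy {
    expr: Vec<(Arc<dyn Expr>, String)>,
}

// Manual impl, mirroring the zip/all pattern from the diff above:
// compare element-wise using what the trait itself can expose.
impl PartialEq for GroupBy {
    fn eq(&self, other: &Self) -> bool {
        self.expr.len() == other.expr.len()
            && self
                .expr
                .iter()
                .zip(other.expr.iter())
                .all(|((e1, n1), (e2, n2))| e1.name() == e2.name() && n1 == n2)
    }
}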
@@ -65,6 +65,8 @@ pub enum AggregateMode {
    /// with Hash repartitioning on the group keys. If a group key is
    /// duplicated, duplicate groups would be produced
    FinalPartitioned,
/// Single aggregate is a combination of Partial and Final aggregate mode |
Suggested change:
- /// Single aggregate is a combination of Partial and Final aggregate mode
+ /// Applies the entire logical aggregation operation in a single operator,
+ /// as opposed to Partial / Final modes which apply the logical aggregation using
+ /// two operators.
let physical_plan = dataframe.create_physical_plan().await?;
let expected =
    vec![
        "AggregateExec: mode=Single, gby=[t1_id@0 as t1_id], aggr=[]",
Is it correct that this plan can use a single aggregate because it is already partitioned on the group key (t1_id) after the join?
Yes.
@@ -31,3 +34,17 @@ pub fn get_accum_scalar_values_as_arrays(
        .map(|s| s.to_array_of_size(1))
        .collect::<Vec<_>>())
}

pub fn down_cast_any_ref(any: &dyn Any) -> &dyn Any {
Can you please document what this function does (with an example), given it is a new pub function?
Yes, I will add more comments. I have an example in the Count unit test:
#[test]
fn count_eq() -> Result<()> {
    let count = Count::new(lit(1i8), "COUNT(1)".to_string(), DataType::Int64);
    let arc_count: Arc<dyn AggregateExpr> = Arc::new(Count::new(
        lit(1i8),
        "COUNT(1)".to_string(),
        DataType::Int64,
    ));
    let box_count: Box<dyn AggregateExpr> = Box::new(Count::new(
        lit(1i8),
        "COUNT(1)".to_string(),
        DataType::Int64,
    ));
    let count2 = Count::new(lit(1i8), "COUNT(2)".to_string(), DataType::Int64);

    assert!(arc_count.eq(&box_count));
    assert!(box_count.eq(&arc_count));
    assert!(arc_count.eq(&count));
    assert!(count.eq(&box_count));
    assert!(count.eq(&arc_count));
    assert!(count2.ne(&arc_count));
    Ok(())
}
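Given that test, a plausible sketch of what down_cast_any_ref does (inferred from the signature above and the test's Arc/Box cases; not necessarily the exact merged code):

use std::any::Any;
use std::sync::Arc;

// Assumed minimal stand-in for DataFusion's AggregateExpr trait.
trait AggregateExpr {
    fn as_any(&self) -> &dyn Any;
}

// Sketch: if `any` is really a smart pointer to the trait object
// (Arc<dyn AggregateExpr> or Box<dyn AggregateExpr>), look through it and
// return the inner value's &dyn Any; otherwise return `any` unchanged.
// This lets plain, Arc-wrapped, and Box-wrapped values compare uniformly,
// as the unit test above exercises.
pub fn down_cast_any_ref(any: &dyn Any) -> &dyn Any {
    if let Some(arc) = any.downcast_ref::<Arc<dyn AggregateExpr>>() {
        arc.as_any()
    } else if let Some(boxed) = any.downcast_ref::<Box<dyn AggregateExpr>>() {
        boxed.as_any()
    } else {
        any
    }
}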
Comparing the group expressions between the partial and final aggregations is problematic, because the column indexes might be different.
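For instance (a hypothetical plan, not from this PR's tests), the same group key can sit at different indexes in the two schemas:

AggregateExec: mode=Final, gby=[b@0 as b], aggr=[COUNT(1)]
  AggregateExec: mode=Partial, gby=[b@2 as b], aggr=[COUNT(1)]

The partial aggregate's b@2 points into its input schema, while the final aggregate's b@0 points into the partial output schema, so a naive Column equality check would wrongly reject the pair; the rule needs to account for this when matching (hence the "fix compare grouping columns" commit below).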
@yahoNanJing @alamb
Converting it to draft.
) {
    final_input
        .as_any()
        .downcast_ref::<AggregateExec>()
Since there's no RepartitionExec in between, the distributions of the AggregateExec with final mode and the AggregateExec with partial mode are the same. Therefore, there's no need to do two-phase aggregation.
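To make the rule's shape concrete, here is a hedged sketch of the core check (simplified; the accessor names follow AggregateExec's public API, and the group-expression comparison glosses over the index normalization discussed above):

use datafusion::physical_plan::aggregates::{AggregateExec, AggregateMode};

// Sketch: a Final/FinalPartitioned AggregateExec sitting directly on top of
// a Partial AggregateExec (no RepartitionExec in between, so both operators
// see the same distribution) is a candidate to collapse into mode=Single.
fn can_combine(final_agg: &AggregateExec, partial_agg: &AggregateExec) -> bool {
    let final_mode = matches!(
        *final_agg.mode(),
        AggregateMode::Final | AggregateMode::FinalPartitioned
    );
    let partial_mode = matches!(*partial_agg.mode(), AggregateMode::Partial);
    // NOTE: the real rule must normalize column indexes before comparing the
    // group expressions (see the earlier discussion).
    final_mode && partial_mode && final_agg.group_expr() == partial_agg.group_expr()
}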
Thanks @mingmwang for introducing this rule, which will significantly improve the query performance for the SQL patterns shown in the UTs.
Actually the performance improvement will not be that significant, because usually the Final aggregation step is not that heavy.
LGTM
* add CombinePartialFinalAggregate rule
* Implement PartialEq for AggregateExpr
* fix compile error
* refine logic in the rule
* add UT
* resolve review comments
* fix compare grouping columns
Which issue does this PR close?
Closes #5836
Closes #5774.
Rationale for this change
Improve the performance of Aggregate
What changes are included in this PR?
* PartialEq for AggregateExpr
* AggregateMode: Single
* CombinePartialFinalAggregate to combine the adjacent Partial and Final AggregateExecs

Are these changes tested?
TPCH-q17
cargo run --bin tpch -- benchmark datafusion --iterations 1 --path ./parquet_data --format parquet --query 17 -n 1 --disable-statistics --debug
Before this PR:
Query 17 iteration 0 took 3395.1 ms and returned 1 rows
Query 17 iteration 1 took 3598.1 ms and returned 1 rows
Query 17 iteration 2 took 3554.1 ms and returned 1 rows
Query 17 avg time: 3515.76 ms
After this PR:
Query 17 iteration 0 took 3486.8 ms and returned 1 rows
Query 17 iteration 1 took 3211.4 ms and returned 1 rows
Query 17 iteration 2 took 3201.6 ms and returned 1 rows
Query 17 avg time: 3299.93 ms
Are there any user-facing changes?