Handle ordering of first last aggregation inside aggregator #8662

mustafasrepo · 2023-12-27T11:45:16Z

Which issue does this PR close?

Closes #.
Improves situation on #8662
Related to #8582

Rationale for this change

This PR implements the observation by @alamb at the PR that FIRST_VALUE and LAST_VALUE aggregations do not need their entire input to be sorted.

In other words, this PR adds FIRST_VALUE and LAST_VALUE support for approach 3 in the design document.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

mustafasrepo · 2023-12-27T11:50:34Z

datafusion/sqllogictest/test_files/distinct_on.slt

@@ -78,7 +78,7 @@ c 4
 query I
 SELECT DISTINCT ON (c1) c2 FROM aggregate_test_100 ORDER BY c1, c3;
 ----
-5
+4


I ran the same query in PostgreSQL; it gives the same result as the new version.

mustafasrepo · 2023-12-27T12:16:25Z

datafusion/physical-plan/src/aggregates/mod.rs

+                // Append ordering requirements to expressions' results.
+                // This way order sensitive aggregators can satisfy requirement
+                // themselves.
+                if let Some(ordering_req) = agg.order_bys() {


Since the aggregators now handle ordering themselves, we also append the ordering expression values to the fields, for all modes.

mustafasrepo · 2023-12-27T12:21:34Z

datafusion/physical-expr/src/aggregate/first_last.rs

+                }
+            })
+            .collect::<Vec<_>>();
+        let indices = lexsort_to_indices(&sort_columns, Some(1))?;


If there is a min/max alternative to this, we can use that instead. However, as far as I know there is no utility that supports this. Maybe @tustvold can answer this, if he is familiar with it.

I'm not aware of a min/max kernel that returns the ordinal position of the min/max

BTW I had the same basic need (find the position of min/max so I could find a value in a corresponding column) while implementing our special selector_first, selector_last, etc functions in InfluxDB 3.0 (I also had to code them specially)

Do you think your implementation is more efficient? If that is the case, maybe we can use that code instead?

I think our implementation is (slightly) more efficient, but it is less general (only works for timestamp columns). You can see the basic idea here

https://github.com/influxdata/influxdb/blob/main/query_functions/src/selectors.rs

And the comparison is here: https://github.com/influxdata/influxdb/blob/acfef87659c9a8c4c49e4628264369569e04cad1/query_functions/src/selectors/internal.rs#L119-L127

I think we should stay with the ScalarValue implementation unless we find some query where this calculation is taking most of the time
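The need discussed in this thread (the position of the min/max row, so that a value in a corresponding column can be looked up) can be sketched without any Arrow kernel. The following is a minimal illustration only, not the code in this PR: `SortKey`, `cmp_rows` and `first_row_index` are hypothetical names, columns are plain `i64` slices of equal length, and nulls are not modeled. It finds the row that a full lexicographic sort would place first, using a single scan.

```rust
use std::cmp::Ordering;

/// One sort key: a column of values plus its direction.
/// Hypothetical stand-in for a sort column; nulls are not modeled.
pub struct SortKey<'a> {
    pub values: &'a [i64],
    pub descending: bool,
}

/// Compare rows `a` and `b` lexicographically across all sort keys.
fn cmp_rows(keys: &[SortKey], a: usize, b: usize) -> Ordering {
    for key in keys {
        let ord = key.values[a].cmp(&key.values[b]);
        let ord = if key.descending { ord.reverse() } else { ord };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Index of the row that would come first after a full lexicographic sort,
/// found with one O(n) scan. On ties the earliest row is kept.
pub fn first_row_index(keys: &[SortKey]) -> Option<usize> {
    let num_rows = keys.first()?.values.len();
    let mut best: Option<usize> = None;
    for row in 0..num_rows {
        best = match best {
            // Keep the current candidate unless `row` sorts strictly before it.
            Some(current) if cmp_rows(keys, row, current) != Ordering::Less => Some(current),
            _ => Some(row),
        };
    }
    best
}
```

The returned index can then be used to read the aggregated value from the corresponding value column.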

mustafasrepo · 2023-12-27T12:24:34Z

datafusion/sqllogictest/test_files/groupby.slt

----------------------CoalesceBatchesExec: target_batch_size=8192
------------------------RepartitionExec: partitioning=Hash([col0@0], 4), input_partitions=1
--------------------------MemoryExec: partitions=1, partition_sizes=[3]
+------------AggregateExec: mode=Partial, gby=[col0@0 as col0, col1@1 as col1, col2@2 as col2], aggr=[LAST_VALUE(r.col1)]


Since first_value and last_value no longer require ordering at their input, the SortExecs are removed from the plan.

mustafasrepo · 2023-12-27T12:33:29Z

datafusion/sqllogictest/test_files/groupby.slt

@@ -2209,7 +2208,7 @@ ProjectionExec: expr=[a@0 as a, b@1 as b, LAST_VALUE(annotated_data_infinite2.c)
 ----StreamingTableExec: partition_sizes=1, projection=[a, b, c], infinite_source=true, output_ordering=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST]

 query III
-SELECT a, b, LAST_VALUE(c ORDER BY a DESC) as last_c
+SELECT a, b, LAST_VALUE(c ORDER BY a DESC, c ASC) as last_c


The result of this test was not unique according to the specification (since column a is not unique). I changed the test to make the result unique.
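The ambiguity can be illustrated with a small sketch. It is an assumption-laden toy, not DataFusion code: rows are `(a, c)` pairs and `last_value_c` is a hypothetical helper. With only `a DESC` as the ordering, rows tied on `a` keep their input order, so the "last" row depends on how the input happened to arrive; adding `c ASC` as a tie-breaker removes that dependence.

```rust
/// Rows are `(a, c)` pairs. Return the `c` of the row that sorts last under
/// `a DESC`, optionally breaking ties with `c ASC`.
pub fn last_value_c(rows: &[(i64, i64)], tie_break_on_c: bool) -> Option<i64> {
    let mut sorted = rows.to_vec();
    // `sort_by` is stable, so rows with equal keys keep their input order.
    sorted.sort_by(|x, y| {
        let by_a = y.0.cmp(&x.0); // a DESC
        if tie_break_on_c {
            by_a.then(x.1.cmp(&y.1)) // then c ASC
        } else {
            by_a
        }
    });
    sorted.last().map(|row| row.1)
}
```

Feeding the same rows in two different input orders gives two different answers without the tie-breaker and the same answer with it.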

tustvold · 2023-12-27T13:11:37Z

datafusion/physical-expr/src/aggregate/first_last.rs

+        // - There is a more recent entry in terms of requirement
+        if !self.is_set
+            || self.orderings.is_empty()
+            || compare_rows(


I'm sure you are aware but https://docs.rs/arrow-row/latest/arrow_row/ will be a much faster way to perform row-based comparisons than relying on ScalarValue

Indeed. However, here we are checking just a single row (the row that has the lowest value). Hence I don't think the conversion is worth it here.

I agree that since it is a single column max comparison this is probably fine (and no worse than the current implementation). If we need to optimize performance we could probably implement specialized implementations (like FirstValue<ArrowPrimitiveType>) and skip the copying entirely.

~~That is likely a premature optimization at this point~~

Update: Row format may well be a good idea (not for this PR). I will wait until I have reviewed this code to offer a more informed opinion

I re-reviewed and I agree that the RowFormat is not needed here (and in fact it may actually be slower) because, as @mustafasrepo points out, this code uses ScalarValue to compare a single row per batch (it finds the largest/smallest row per batch using lexsort_to_indices). We would have to benchmark to be sure.
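The "single row per batch" point can be sketched as follows. This is a simplified, hypothetical accumulator for `FIRST_VALUE(value ORDER BY key ASC)` over `i64` data, not the implementation in this PR: `FirstValueSketch` and its fields are illustrative names, and the real code works on `ScalarValue`s and multiple ordering columns. Each batch contributes one candidate row, and only that candidate is compared with the stored state.

```rust
/// Hypothetical, simplified FIRST_VALUE(value ORDER BY key ASC) accumulator.
#[derive(Debug, Default)]
pub struct FirstValueSketch {
    value: i64,
    ordering: i64,
    is_set: bool,
}

impl FirstValueSketch {
    /// Consume one batch: find its best row, then do one comparison
    /// against the state kept from earlier batches.
    pub fn update_batch(&mut self, values: &[i64], keys: &[i64]) {
        // Position of the smallest key; `min_by_key` keeps the first on ties.
        let best = keys
            .iter()
            .enumerate()
            .min_by_key(|(_, k)| **k)
            .map(|(i, _)| i);
        if let Some(i) = best {
            // Update when nothing is stored yet, or when the batch candidate
            // sorts strictly before the stored entry.
            if !self.is_set || keys[i] < self.ordering {
                self.value = values[i];
                self.ordering = keys[i];
                self.is_set = true;
            }
        }
    }

    /// Current result, or `None` if no row has been seen.
    pub fn evaluate(&self) -> Option<i64> {
        self.is_set.then_some(self.value)
    }
}
```

Because only one scalar comparison happens per batch, converting whole batches to a row format would add work that this comparison does not need, which is the argument made above.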

alamb · 2023-12-27T15:39:58Z

I plan to review this carefully either later today or tomorrow. I want to get a draft of #8491 first

ozankabak

I reviewed this PR and it looks good to me. @alamb, let us know what you think and if we can improve it

alamb

Thank you @mustafasrepo and @ozankabak -- this PR looks good to me. ❤️

I believe that if we applied the same change to ArrayAgg, we could remove the limitation of a single compatible ORDER BY in a query -- aka #8582 -- is that your understanding too?

I am sorry for the delay in reviewing, I am partly on holiday this week so don't have as much time to devote to these endeavors as normal.

I think that in many common queries, this implementation is likely faster than what is on main because it doesn't potentially re-sort the entire input (it uses lexsort_to_indices instead).

As we discussed in the design document, the potential downside of this approach is that if multiple aggregates share the same ORDER BY clause, they will each independently sort the input batches, which is unfortunate but could be optimized in future PRs.

alamb · 2023-12-28T19:00:39Z

datafusion/physical-expr/src/aggregate/first_last.rs

+        };
+        // Update when there is no entry in the state, or we have an "earlier"
+        // entry according to sort requirements.
+        if !self.is_set


In theory, we may be able to use an Option<ScalarValue> instead of a ScalarValue plus an is_set flag, but I don't think it matters for performance and this PR follows the existing implementation as well 👍
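A rough sketch of what the `Option`-based state could look like is given here, under the assumption of `i64` values and a single ascending ordering key. `FirstValueOpt` is a hypothetical name; the PR itself keeps the `ScalarValue` plus `is_set` layout.

```rust
/// Hypothetical FIRST_VALUE state where `None` replaces the `is_set` flag.
#[derive(Debug, Default)]
pub struct FirstValueOpt {
    /// `(value, ordering key)`; `None` until the first row is seen.
    first: Option<(i64, i64)>,
}

impl FirstValueOpt {
    /// Offer one candidate row to the accumulator.
    pub fn update(&mut self, value: i64, key: i64) {
        match self.first {
            // An entry that sorts at or before the candidate is kept.
            Some((_, stored_key)) if stored_key <= key => {}
            // Otherwise (nothing stored, or the candidate is earlier) replace it.
            _ => self.first = Some((value, key)),
        }
    }

    /// Current result, or `None` if no row has been seen.
    pub fn evaluate(&self) -> Option<i64> {
        self.first.map(|(value, _)| value)
    }
}
```

The behavior is the same as with a separate flag; the difference is only that the "not set" case cannot be forgotten, because the stored value is unreachable without matching on the `Option`.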

alamb · 2023-12-28T19:01:36Z

datafusion/physical-expr/src/aggregate/first_last.rs

+        // - There is a more recent entry in terms of requirement
+        if !self.is_set
+            || self.orderings.is_empty()
+            || compare_rows(


I re-reviewed and I agree that the RowFormat is not needed here (and in fact it may actually be slower) because, as @mustafasrepo points out, this code uses ScalarValue to compare a single row per batch (it finds the largest/smallest row per batch using lexsort_to_indices). We would have to benchmark to be sure.

alamb · 2023-12-28T19:07:08Z

datafusion/physical-expr/src/aggregate/first_last.rs

+                }
+            })
+            .collect::<Vec<_>>();
+        let indices = lexsort_to_indices(&sort_columns, Some(1))?;


I think our implementation is (slightly) more efficient, but it is less general (only works for timestamp columns). You can see the basic idea here

https://github.com/influxdata/influxdb/blob/main/query_functions/src/selectors.rs

And the comparison is here: https://github.com/influxdata/influxdb/blob/acfef87659c9a8c4c49e4628264369569e04cad1/query_functions/src/selectors/internal.rs#L119-L127

I think we should stay with the ScalarValue implementation unless we find some query where this calculation is taking most of the time

alamb · 2023-12-28T19:08:31Z

datafusion/physical-expr/src/aggregate/mod.rs

-    aggr_expr.as_any().is::<FirstValue>()
-        || aggr_expr.as_any().is::<LastValue>()
-        || aggr_expr.as_any().is::<OrderSensitiveArrayAgg>()
+    aggr_expr.as_any().is::<OrderSensitiveArrayAgg>()


Eventually this would be a nice thing to move into the AggregateExpr trait directly so we could override it and avoid special-casing built-in functions. Not for this PR though :)

I like this idea 👍
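One possible shape for that idea is sketched here with a stand-in trait. Everything in it is hypothetical: `AggregateExprSketch`, the method name `is_order_sensitive`, and the two example types are illustrative and are not the actual DataFusion trait or its API. A default method returns `false`, and only aggregates that still need sorted input override it, so no downcasting on concrete types is required.

```rust
/// Hypothetical stand-in for an aggregate expression trait.
pub trait AggregateExprSketch {
    fn name(&self) -> &str;

    /// Whether the input must be sorted for this aggregate.
    /// Defaults to `false`; order-sensitive aggregates override it.
    fn is_order_sensitive(&self) -> bool {
        false
    }
}

/// Handles ordering internally, so the default applies.
pub struct FirstValueExpr;

/// Still relies on sorted input in this sketch.
pub struct ArrayAggExpr;

impl AggregateExprSketch for FirstValueExpr {
    fn name(&self) -> &str {
        "FIRST_VALUE"
    }
}

impl AggregateExprSketch for ArrayAggExpr {
    fn name(&self) -> &str {
        "ARRAY_AGG"
    }

    fn is_order_sensitive(&self) -> bool {
        true
    }
}

/// True if any aggregate in the list needs its input sorted.
pub fn any_order_sensitive(exprs: &[Box<dyn AggregateExprSketch>]) -> bool {
    exprs.iter().any(|expr| expr.is_order_sensitive())
}
```

With this layout, a user-defined aggregate could opt in by overriding the method instead of being added to a hard-coded list of built-in types.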

alamb · 2023-12-28T19:09:19Z

datafusion/sqllogictest/test_files/distinct_on.slt

@@ -100,10 +100,9 @@ ProjectionExec: expr=[FIRST_VALUE(aggregate_test_100.c3) ORDER BY [aggregate_tes
 ------AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[FIRST_VALUE(aggregate_test_100.c3), FIRST_VALUE(aggregate_test_100.c2)]
 --------CoalesceBatchesExec: target_batch_size=8192
 ----------RepartitionExec: partitioning=Hash([c1@0], 4), input_partitions=4
------------AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[FIRST_VALUE(aggregate_test_100.c3), FIRST_VALUE(aggregate_test_100.c2)], ordering_mode=Sorted
--------------SortExec: expr=[c1@0 ASC NULLS LAST,c3@2 ASC NULLS LAST]


I do love the lack of Sort here

ozankabak · 2023-12-28T19:26:33Z

I believe that if we applied the same change to ArrayAgg, we could remove the limitation of a single compatible ORDER BY in a query -- aka #8582 -- is that your understanding too?

Yes, if we make ARRAY_AGG sort internally, we can do this.

As we discussed in the design document, the potential downside of this approach is that if multiple aggregates share the same ORDER BY clause, they will each independently sort the input batches, which is unfortunate but could be optimized in future PRs.

Exactly. We will explore both the split/diamond approach and the approach above in the upcoming weeks

mustafasrepo added 3 commits December 27, 2023 14:35

Initial commit

edcef77

Update tests in distinct_on

da1cf71

Update group by joins slt

202936d

github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Dec 27, 2023

mustafasrepo marked this pull request as draft December 27, 2023 11:53

mustafasrepo added 3 commits December 27, 2023 14:53

Remove unused code

05bdc81

Minor changes

416ac3a

Minor changes

06adf25

mustafasrepo marked this pull request as ready for review December 27, 2023 12:22

Simplifications

e208ebf

Update comments

0ece593

tustvold reviewed Dec 27, 2023

Review

298fcf0

ozankabak approved these changes Dec 28, 2023

alamb approved these changes Dec 28, 2023

Fix clippy

64666df

github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Dec 28, 2023

ozankabak merged commit 06ed3dd into apache:main Dec 28, 2023
22 checks passed

This was referenced Dec 28, 2023

Start setting up tpch planning benchmarks #8665

Merged

Implement the contained method of RowGroupPruningStatistics #8669

Closed

mustafasrepo mentioned this pull request Dec 29, 2023

Change first/last implementation to prevent redundant comparisons when data is already sorted #8678

Merged

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
