Aggregation fuzz testing #12114

alamb · 2024-08-22T15:59:27Z

Is your feature request related to a problem or challenge?

While reviewing #11943 from @Rachelint it is becoming clear to me that the hash aggregate code is now pretty sophisticated and I am not sure our testing has kept up. In fact I couldn't come up with a great way to systematically test the new code added in #11943

Also, the code in #11627 from @korowa for skipping partial aggregates has a similar problem, as it is not invoked. There is also code for streaming and partial streaming group by.

All this code has unit tests, but I am not confident that all the combinations are checked. For example the code paths are affected by:

Sort order of the input
partitioning of the input
The type of the group keys
The number of groups
The number of rows in each group
The type of the aggregate
The number of aggregates
If the aggregate supports group aggregation
If the groups aggregator supports partial aggregation skipping

Describe the solution you'd like

I would like a more systematic way to test this code, both to ensure our current code is correct and to ensure that future changes do not introduce subtle, hard-to-debug regressions or wrong results.

Describe alternatives you've considered

What I think would be good is a test framework that:

Describes an input data set (e.g. RecordBatches)
Run the same query on the same input data set with different configurations (e.g. block size, input sort order, distribution of input blocks, etc)
Compare the results and ensure it is the same in all cases

Parameters to randomly vary for each input:

Sort order of the input
target block size
Number of input partitions
memory limit (to force spilling)
Shuffled input row distribution across blocks
the skipping partial aggregation enabling or not

Test cases:
1. Types of the group keys
2. single/multiple column groups
3. Number of groups (low/high cardinality)
4. Different aggregates
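
The framework idea above (same data, same query, randomized layout, identical results) can be sketched in plain Rust. This is only a toy model under stated assumptions: `sum_by_key` stands in for the engine, `random_layout` varies just the sort order, row shuffling and batch size, and every name here is hypothetical rather than part of DataFusion. A seeded PRNG is used so that a failing round can be replayed.

```rust
use std::collections::BTreeMap;

/// Tiny deterministic PRNG (xorshift64); the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..n` (`n` must be greater than zero).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// One input row: (group key, value).
type Row = (u32, i64);

/// Toy stand-in for the engine: SUM(value) GROUP BY key, fed batch by batch.
fn sum_by_key(batches: &[Vec<Row>]) -> BTreeMap<u32, i64> {
    let mut acc = BTreeMap::new();
    for batch in batches {
        for &(k, v) in batch {
            *acc.entry(k).or_insert(0) += v;
        }
    }
    acc
}

/// Build one random layout of the same rows: sorted or shuffled,
/// then split into batches of a random size.
fn random_layout(rows: &[Row], rng: &mut Rng) -> Vec<Vec<Row>> {
    let mut rows = rows.to_vec();
    if rng.below(2) == 0 {
        rows.sort();
    } else {
        // Fisher-Yates shuffle.
        for i in (1..rows.len()).rev() {
            let j = rng.below(i + 1);
            rows.swap(i, j);
        }
    }
    let batch_size = 1 + rng.below(rows.len().max(1));
    rows.chunks(batch_size).map(|c| c.to_vec()).collect()
}

/// Run `rounds` random layouts and compare each result with the single-batch result.
fn check_layouts(rows: &[Row], seed: u64, rounds: usize) -> Result<(), String> {
    let expected = sum_by_key(&[rows.to_vec()]);
    let mut rng = Rng(seed);
    for round in 0..rounds {
        let layout = random_layout(rows, &mut rng);
        if sum_by_key(&layout) != expected {
            // Reporting seed and round makes the failure reproducible.
            return Err(format!("mismatch: seed={} round={}", seed, round));
        }
    }
    Ok(())
}

fn main() {
    let rows: Vec<Row> = (0..100).map(|i| (i % 7, i as i64)).collect();
    check_layouts(&rows, 42, 50).expect("all layouts should agree");
    println!("all layouts agree");
}
```

In a real test the toy `sum_by_key` would be replaced by executing the query through the engine, and the layout parameters would include the other items listed above (partition count, memory limit, and so on).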

Additional context

We also have some great sql fuzz coverage in https://github.com/datafusion-contrib/datafusion-sqlancer from @2010YOUY01, but I think that focuses on the queries themselves, rather than the setup (block size, input order, etc)

Existing aggregate coverage in the datafusion core fuzz tests (cargo test --test fuzz):

datafusion/datafusion/core/tests/fuzz_cases/distinct_count_string_fuzz.rs
Lines 33 to 34 in 3c2b542

#[tokio::test(flavor = "multi_thread")]
async fn distinct_count_string_test() {

datafusion/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs
Lines 48 to 49 in e088945

/// same results
#[tokio::test(flavor = "multi_thread")]

Subtasks

Add fuzz support for Timestamp, Binary and Float #13279


2010YOUY01 · 2024-08-23T05:12:44Z

Additional context

We also have some great sql fuzz coverage in https://github.com/datafusion-contrib/datafusion-sqlancer from @2010YOUY01, but I think that focuses on the queries themselves, rather than the setup (block size, input order, etc)

I agree SQLancer is not the best choice for aggregation-specific fuzzing (though doable), due to:

It takes a lot of effort to try all possible configuration knobs on randomly generated data
It combines random SQL with random config, so the generated SQL will be complex, with deeply nested exprs, and hard to reduce and investigate

So for now I plan to cover more SQL features and look for bugs that are easy to identify and fix; configuration fuzzing is a lower priority for SQLancer

So I think rust-level fuzzing is better.

Besides, I think we can also pick some comprehensive aggregation queries and do SQL-level fuzzing with them (fixed SQL + random config, checking that the query always gives the same result under different configs)

2010YOUY01 · 2024-08-23T05:25:05Z

I am also curious what the compatibility matrix is for all the aggregation optimizations (for example, can skip-partial-aggregation and external aggregation be triggered in the same execution, and likewise for all other combinations?)
Specifying this in the configuration manual and code docs would make it easier to understand the aggregation details and to write more effective tests

Rachelint · 2024-08-23T06:22:24Z

I am also curious what the compatibility matrix is for all the aggregation optimizations (for example, can skip-partial-aggregation and external aggregation be triggered in the same execution, and likewise for all other combinations?) Specifying this in the configuration manual and code docs would make it easier to understand the aggregation details and to write more effective tests

To my knowledge, it may be:

                    spilling   streaming(sorted)   skip partial   blocked emission
spilling                       x                   o              x
streaming(sorted)   x                              o              x
skip partial        o          x                                  o
blocked emission    x          x                   o
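
Writing the matrix down as data makes it possible to check it mechanically before tests are built on it. The sketch below is an illustration only: it assumes `o` marks a pair that can occur in the same execution and `x` a pair that cannot (the comment does not state the legend), and the constant and function names are hypothetical. It lists pairs whose two cells disagree instead of guessing which one is intended.

```rust
const NAMES: [&str; 4] = [
    "spilling",
    "streaming(sorted)",
    "skip partial",
    "blocked emission",
];

/// The matrix as transcribed from the comment:
/// Some(true) = 'o', Some(false) = 'x', None = blank cell.
const MATRIX: [[Option<bool>; 4]; 4] = [
    [None, Some(false), Some(true), Some(false)],
    [Some(false), None, Some(true), Some(false)],
    [Some(true), Some(false), None, Some(true)],
    [Some(false), Some(false), Some(true), None],
];

/// Pairs whose (row, col) and (col, row) cells differ.
fn asymmetric_pairs() -> Vec<(&'static str, &'static str)> {
    let mut out = Vec::new();
    for i in 0..4 {
        for j in (i + 1)..4 {
            if MATRIX[i][j] != MATRIX[j][i] {
                out.push((NAMES[i], NAMES[j]));
            }
        }
    }
    out
}

/// Pairs marked 'o' in both directions, i.e. candidates for combined test configs
/// under the assumed legend.
fn agreed_compatible_pairs() -> Vec<(&'static str, &'static str)> {
    let mut out = Vec::new();
    for i in 0..4 {
        for j in (i + 1)..4 {
            if MATRIX[i][j] == Some(true) && MATRIX[j][i] == Some(true) {
                out.push((NAMES[i], NAMES[j]));
            }
        }
    }
    out
}

fn main() {
    println!("cells that disagree: {:?}", asymmetric_pairs());
    println!("compatible in both directions: {:?}", agreed_compatible_pairs());
}
```

As transcribed here, the streaming(sorted) / skip partial cell differs between the two halves of the matrix, so that pair would need to be confirmed before being used to drive test configurations.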

Rachelint · 2024-08-23T06:32:19Z

Could we first run the basic aggregation without any optimizations enabled and use its output as the expected result,
and then modify the options to enable different optimizations and their combinations, and compare their results with the expected one?
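
A minimal sketch of that baseline-versus-optimized comparison, in plain Rust. Everything here is a stand-in: `Config`, `baseline` and `two_phase` are hypothetical names, the "optimized" path is a toy partial/final SUM with an optional skip of the partial stage, and no real DataFusion option is modeled.

```rust
use std::collections::BTreeMap;

/// One input row: (group key, value).
type Row = (u32, i64);
type Groups = BTreeMap<u32, i64>;

/// Hypothetical knobs standing in for real session options.
#[derive(Debug, Clone, Copy)]
struct Config {
    partitions: usize, // must be greater than zero
    skip_partial: bool,
}

/// Baseline: a single pass with no optimizations; its output is the expected result.
fn baseline(rows: &[Row]) -> Groups {
    let mut out = Groups::new();
    for &(k, v) in rows {
        *out.entry(k).or_insert(0) += v;
    }
    out
}

/// Toy two-phase SUM: a partial stage per partition, then a final merge.
fn two_phase(rows: &[Row], cfg: Config) -> Groups {
    let chunk = ((rows.len() + cfg.partitions - 1) / cfg.partitions).max(1);
    let mut intermediate: Vec<Row> = Vec::new();
    for part in rows.chunks(chunk) {
        if cfg.skip_partial {
            // Partial aggregation skipped: forward each row as its own state.
            intermediate.extend_from_slice(part);
        } else {
            intermediate.extend(baseline(part));
        }
    }
    // Final stage: merging SUM states is itself a SUM.
    baseline(&intermediate)
}

/// Run every configuration and return those that disagree with the baseline.
fn mismatches(rows: &[Row]) -> Vec<Config> {
    let expected = baseline(rows);
    let mut bad = Vec::new();
    for partitions in [1, 2, 8] {
        for skip_partial in [false, true] {
            let cfg = Config { partitions, skip_partial };
            if two_phase(rows, cfg) != expected {
                bad.push(cfg);
            }
        }
    }
    bad
}

fn main() {
    let rows: Vec<Row> = (0..1000).map(|i| (i % 13, (i as i64) - 500)).collect();
    let bad = mismatches(&rows);
    assert!(bad.is_empty(), "configs disagreeing with baseline: {:?}", bad);
    println!("all configurations match the baseline");
}
```

Returning the offending configurations, rather than failing on the first one, keeps the report useful when several optimizations break at once.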

alamb · 2024-08-26T19:32:36Z

Could we first run the basic aggregation without any optimizations enabled and use its output as the expected result,
and then modify the options to enable different optimizations and their combinations, and compare their results with the expected one?

Yes, I think that is likely a good plan. In my mind, as long as all the code paths get the same answer that will increase our confidence that the system is computing the correct results in the different places

Rachelint · 2024-08-27T04:43:55Z

Could we first run the basic aggregation without any optimizations enabled and use its output as the expected result,
and then modify the options to enable different optimizations and their combinations, and compare their results with the expected one?

Yes, I think that is likely a good plan. In my mind, as long as all the code paths get the same answer that will increase our confidence that the system is computing the correct results in the different places

Ok, maybe we can start by making a simple sketch, and try to implement the current aggregate fuzz tests based on it?

I can give it a try, and help push forward enabling #11943 by default.

alamb · 2024-08-27T13:21:41Z

Could we first run the basic aggregation without any optimizations enabled and use its output as the expected result,
and then modify the options to enable different optimizations and their combinations, and compare their results with the expected one?

Yes, I think that is likely a good plan. In my mind, as long as all the code paths get the same answer that will increase our confidence that the system is computing the correct results in the different places

Ok, maybe we can start by making a simple sketch, and try to implement the current aggregate fuzz tests based on it?

I can give it a try, and help push forward enabling #11943 by default.

Thank you -- that would be awesome. I can't keep up anymore with everything that is going on

In terms of helping along DataFusion performance, my plan was to focus first on getting StringView enabled and then switch more to focusing on the blocked intermediate state.

I will, however, prioritize time for reviewing aggregation testing, as I think testing in general is really important for DataFusion

Rachelint · 2024-08-27T13:25:01Z

take

alamb · 2024-10-09T15:52:06Z

@Rachelint has made a great start here: #12667

What would you suggest the next steps here be @Rachelint ? Do you want to fill out the coverage? Would it be helpful if I did?

Rachelint · 2024-10-09T16:13:45Z

What would you suggest the next steps here be @Rachelint ?

I personally plan to first implement some necessary features of the framework, such as:

support for testing spilling
support for more data types in the dataset generator (e.g. bool, binary)
improvements to reproducing a failed case for debugging

Do you want to fill out the coverage? Would it be helpful if I did?

It would surely be helpful! I would welcome help with this work.

alamb · 2024-10-09T18:23:26Z

I will try and do so over the next few days. Thanks @Rachelint

LeslieKid · 2024-10-17T13:35:41Z

This fuzzer framework looks great!

support for more data types in the dataset generator (e.g. bool, binary)

I would like to work on this feature if nobody else takes it. @Rachelint

Rachelint · 2024-10-17T14:05:19Z

This fuzzer framework looks great!

support for more data types in the dataset generator (e.g. bool, binary)

I would like to work on this feature if nobody else takes it. @Rachelint

Thanks a lot, feel free to take it

alamb · 2024-10-17T16:19:39Z

Update here:

We have the basic framework in place
I have a PR up to restructure the tests to make it easier to add queries: Improve AggregateFuzz testing: generate random queries #12847

Here is a list of additional coverage I think is needed

Add coverage of StringView/BinaryView
Add coverage of Decimal128
Add coverage of Date/Time types (Timestamp, Duration, Interval, etc)
Add boolean columns (e.g. for group by and min/max/count)
Add coverage of streaming group by (I am working on this)
Add other aggregate functions listed in https://datafusion.apache.org/user-guide/sql/aggregate_functions_new.html

@LeslieKid perhaps you could make a PR based on #12847 for one of those items (StringView or Decimal or Date type would be super great)

LeslieKid · 2024-10-17T16:54:01Z

@LeslieKid perhaps you could make a PR based on #12847 for one of those items (StringView or Decimal or Date type would be super great)

OK! I will work on adding some new types for this framework in the next few days.

I also think we could introduce a new trait named ArrayGenerator to unify PrimitiveArrayGenerator and StringArrayGenerator, which may make it easier to introduce new types.
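
One possible shape for such a trait, sketched in plain Rust. Only the names ArrayGenerator, PrimitiveArrayGenerator and StringArrayGenerator come from the comment above; the method signature, the `Column` enum (a stand-in for Arrow arrays), the `Rng` helper and the two example generators are assumptions made for illustration, not the framework's actual API.

```rust
/// Simplified column type standing in for an Arrow array.
#[derive(Debug, PartialEq)]
enum Column {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Tiny deterministic PRNG (xorshift64); the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical unifying trait: each generator yields one column of `num_rows` values.
trait ArrayGenerator {
    fn generate(&self, num_rows: usize, rng: &mut Rng) -> Column;
}

/// Example primitive generator; `null_pct` is the percentage of nulls (0..=100).
struct Int64Generator {
    null_pct: u64,
}

/// Example string generator producing lowercase ASCII of length `0..=max_len`.
struct StringGenerator {
    null_pct: u64,
    max_len: usize,
}

impl ArrayGenerator for Int64Generator {
    fn generate(&self, num_rows: usize, rng: &mut Rng) -> Column {
        let mut values: Vec<Option<i64>> = Vec::with_capacity(num_rows);
        for _ in 0..num_rows {
            if rng.next() % 100 < self.null_pct {
                values.push(None);
            } else {
                values.push(Some(rng.next() as i64));
            }
        }
        Column::Int64(values)
    }
}

impl ArrayGenerator for StringGenerator {
    fn generate(&self, num_rows: usize, rng: &mut Rng) -> Column {
        let mut values: Vec<Option<String>> = Vec::with_capacity(num_rows);
        for _ in 0..num_rows {
            if rng.next() % 100 < self.null_pct {
                values.push(None);
                continue;
            }
            let len = (rng.next() % (self.max_len as u64 + 1)) as usize;
            let mut s = String::with_capacity(len);
            for _ in 0..len {
                s.push((b'a' + (rng.next() % 26) as u8) as char);
            }
            values.push(Some(s));
        }
        Column::Utf8(values)
    }
}

/// A dataset generator can then treat all column types uniformly.
fn generate_all(gens: &[Box<dyn ArrayGenerator>], num_rows: usize, seed: u64) -> Vec<Column> {
    let mut rng = Rng(seed);
    gens.iter().map(|g| g.generate(num_rows, &mut rng)).collect()
}

fn main() {
    let gens: Vec<Box<dyn ArrayGenerator>> = vec![
        Box::new(Int64Generator { null_pct: 10 }),
        Box::new(StringGenerator { null_pct: 10, max_len: 8 }),
    ];
    for col in generate_all(&gens, 5, 7) {
        println!("{:?}", col);
    }
}
```

With this shape, adding a new type means adding one generator that implements the trait; the code that assembles a dataset does not need to change.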

alamb · 2024-10-17T16:58:29Z

I also think we could introduce a new trait named ArrayGenerator to unify PrimitiveArrayGenerator and StringArrayGenerator, which may make it easier to introduce new types.

That would be great 🙏

alamb · 2024-11-05T19:58:15Z

@LeslieKid added Time/Interval/Decimal/Utf8View in #13226

Additional types that would be good to cover are:

Float32/Float64
Date and Timestamp

Any chance you are interested in doing that too @LeslieKid ? If not no worries I can file a ticket and I bet others can follow your good example

LeslieKid · 2024-11-06T02:57:59Z

Additional types that would be good to cover are:

Float32/Float64

Date and Timestamp

🤔 The Date type is already supported in #13041. I think Binary would also be good to cover.

Any chance you are interested in doing that too @LeslieKid ? If not no worries I can file a ticket and I bet others can follow your good example

Sorry, I'm unable to take that on at the moment. I think filing a ticket is a good way forward. Thanks @alamb

alamb · 2024-11-06T16:21:49Z

Sorry, I'm unable to take that on at the moment. I think filing a ticket is a good way forward. Thanks @alamb

Thank you -- filed #13279

alamb · 2024-11-13T15:23:42Z

Now that we have most of the datatypes filled in, perhaps we can start adding coverage for the other aggregation functions, like bit_and etc. 🤔

jonathanc-n · 2024-11-13T15:26:50Z

This sounds good. I can probably start working on some of them. Should we aim to cover the entire list of aggregate functions in DataFusion?

alamb added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Aug 22, 2024

alamb mentioned this issue Aug 22, 2024

[EPIC] Improve aggregate performance with adaptive sizing in accumulators / avoiding reallocations in accumulators #7065

Open

2 tasks

github-actions bot assigned Rachelint Aug 27, 2024

Rachelint mentioned this issue Oct 4, 2024

Add Aggregation fuzzer framework #12667

Merged

This was referenced Oct 9, 2024

Improve AggregationFuzzer error reporting #12832

Merged

Fix convert_to_state bug in GroupsAccumulatorAdapter #12834

Merged

This was referenced Oct 10, 2024

Improve AggregateFuzz testing: generate random queries #12847

Merged

Oct 16, 2024: This week in DataFusion #12973

Closed

This was referenced Oct 17, 2024

Increase fuzz testing of streaming group by / low cardinality columns #12990

Merged

Oct 21, 2024: This week in DataFusion #13035

Closed

LeslieKid mentioned this issue Oct 21, 2024

feat: Add Date32/Date64 in aggregate fuzz testing #13041

Merged

alamb mentioned this issue Oct 29, 2024

Oct 28, 2024: This week in DataFusion #13167

Closed

3 tasks

LeslieKid mentioned this issue Nov 1, 2024

feat: Add Time/Interval/Decimal/Utf8View in aggregate fuzz testing #13226

Merged

alamb mentioned this issue Nov 5, 2024

Nov 5. 2024: This week in DataFusion #13265

Closed

3 tasks

jonathanc-n mentioned this issue Nov 7, 2024

Add boolean columns for fuzz testing #13297

Closed
