Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve AggregateFuzz testing: generate random queries #12847

Merged
merged 7 commits into from
Oct 22, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 10, 2024

Which issue does this PR close?

Part of #12114

Rationale for this change

Now that we have a great aggregation fuzz testing framework from @Rachelint , lets use it to increase coverage

Specifically while thinking about how to improve test coverage I think the key constraints are what types are supported by what aggregate functions. Thus it makes sense to have different functions tested in different combinations

What changes are included in this PR?

  1. Add a query generator to randomly create queries
  2. Update aggregate fuzz testing to cover various permutations of types and aggregates
  3. List other potential things to test

Are these changes tested?

Only tests

Are there any user-facing changes?

No, only tests

@alamb alamb changed the title Alamb/aggregate fuzz coverage Improve AggregateFuzz testing Oct 10, 2024
@alamb alamb force-pushed the alamb/aggregate_fuzz_coverage branch from 4d0cbf5 to 4c5d621 Compare October 14, 2024 17:07
@github-actions github-actions bot added the core Core DataFusion crate label Oct 14, 2024
@alamb alamb force-pushed the alamb/aggregate_fuzz_coverage branch 2 times, most recently from 37636b3 to d0aca65 Compare October 15, 2024 19:12
@@ -62,7 +62,7 @@ impl StringBatchGenerator {
let mut cases = vec![];
let mut rng = thread_rng();
for null_pct in [0.0, 0.01, 0.1, 0.5] {
for _ in 0..100 {
for _ in 0..10 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce the number of strings in the input to reduce test time

@alamb alamb force-pushed the alamb/aggregate_fuzz_coverage branch from d0aca65 to 252623f Compare October 15, 2024 19:15
@alamb alamb added the development-process Related to development process of DataFusion label Oct 15, 2024
//
// Notes on tests:
//
// Since the supported types differ for each aggregation function, the tests
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my key observation -- if we structure the tests by aggregate function I think that is mostly likely to permit the greatest coverage (as the framework now varies queries, number of columns, etc)

@alamb alamb force-pushed the alamb/aggregate_fuzz_coverage branch from 252623f to 42ffada Compare October 15, 2024 19:25
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Oct 15, 2024
@alamb alamb force-pushed the alamb/aggregate_fuzz_coverage branch from 42ffada to 4ad1ec4 Compare October 15, 2024 19:26
Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some code coverage on this PR using

cargo llvm-cov --test fuzz -- aggregate_fuzz

The coverage in aggregation has improved, but still has a ways to go:

datafusion/physical-optimizer/src/combine_partial_final_agg.rs                                54                 9    83.33%          10                 2    80.00%          97                10    89.69%           0                 0         -
datafusion/physical-optimizer/src/limit_pushdown.rs                                           92                65    29.35%          15                 8    46.67%         190               132    30.53%           0                 0         -
datafusion/physical-optimizer/src/limited_distinct_aggregation.rs                             63                48    23.81%           9                 4    55.56%         119                89    25.21%           0                 0         -
datafusion/physical-optimizer/src/output_requirements.rs                                      56                14    75.00%          23                 5    78.26%         150                40    73.33%           0                 0         -
datafusion/physical-optimizer/src/topk_aggregation.rs                                         90                61    32.22%          10                 5    50.00%         106                75    29.25%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/bytes.rs                                 25                 3    88.00%           9                 0   100.00%          77                 2    97.40%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/bytes_view.rs                            25                25     0.00%           9                 9     0.00%          77                77     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/column.rs                               130                32    75.38%          17                 1    94.12%         167                19    88.62%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/group_column.rs                          84                21    75.00%          19                 0   100.00%         202                21    89.60%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/mod.rs                                   19                10    47.37%           1                 0   100.00%          21                 8    61.90%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/null_builder.rs                          27                 0   100.00%           6                 0   100.00%          54                 0   100.00%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/primitive.rs                             48                 2    95.83%          15                 2    86.67%         102                 4    96.08%           0                 0         -
datafusion/physical-plan/src/aggregates/group_values/row.rs                                   92                92     0.00%          16                16     0.00%         176               176     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/mod.rs                                               422               205    51.42%         102                35    65.69%         852               311    63.50%           0                 0         -
datafusion/physical-plan/src/aggregates/no_grouping.rs                                        62                62     0.00%          11                11     0.00%         146               146     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/order/full.rs                                         25                 6    76.00%           7                 1    85.71%          47                 7    85.11%           0                 0         -
datafusion/physical-plan/src/aggregates/order/mod.rs                                          32                 2    93.75%           6                 0   100.00%          56                 1    98.21%           0                 0         -
datafusion/physical-plan/src/aggregates/order/partial.rs                                      61                13    78.69%           9                 0   100.00%         104                 8    92.31%           0                 0         -
datafusion/physical-plan/src/aggregates/row_hash.rs                                          347               112    67.72%          31                 5    83.87%         503                92    81.71%           0                 0         -
datafusion/physical-plan/src/aggregates/topk/hash_table.rs                                    97                97     0.00%          36                36     0.00%         198               198     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/topk/heap.rs                                         149               149     0.00%          38                38     0.00%         295               295     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/topk/priority_map.rs                                  17                17     0.00%           5                 5     0.00%          64                64     0.00%           0                 0         -
datafusion/physical-plan/src/aggregates/topk_stream.rs                                        83                83     0.00%           5                 5     0.00%         100               100     0.00%           0                 0         -

Full report is here: cov.zip

// The test framework handles varying combinations of arguments (data types),
// sortedness, and grouping parameters
//
// TODO: Test on floating point values (where output needs to be compared with some
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to work on these tests next. First I will add coverage for StringView / BinaryView

@@ -305,3 +327,172 @@ impl AggregationFuzzTestTask {
)
}
}

/// Pretty prints the `RecordBatch`es, limited to the first 100 rows
fn format_batches_with_limit(batches: &[RecordBatch]) -> impl std::fmt::Display {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without limiting the size of the output, the output is overwhelming

Copy link
Contributor

@Rachelint Rachelint Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe it is unnecessary to output batches?
And what we need seems to be reproducing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that seeing the differences in the output made it easier for me to understand what was wrong

Specifically, when I added a test for AVG and it failed intermittently for me, it was easy for me to say "integer overflow" as the values of the averages were e19 or something like that

let data_gen_config = baseline_config();

// Queries like SELECT min(a) FROM fuzz_table GROUP BY b
let query_builder = QueryBuilder::new()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The major "innovation" of this PR is to automatically generate the queries as well to increase coverage (e.g. different numbers of group columns, etc)

@alamb alamb marked this pull request as ready for review October 15, 2024 19:33
@alamb alamb changed the title Improve AggregateFuzz testing Improve AggregateFuzz testing: random queries Oct 15, 2024
@alamb
Copy link
Contributor Author

alamb commented Oct 15, 2024

FYI @Rachelint


/// Generate a random number of aggregate functions (potentially repeating).
///
fn random_aggregate_functions(&self) -> Vec<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the challenge for randomly generate aggregate_functions are about their different arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is right -- Improved the comments to clarify

}
self.table_name(query_builder.table_name())
}

pub fn add_sql(mut self, sql: &str) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems can remove pub of add_sql now?

///
/// Limited to 3 group by columns to ensure coverage for large groups. With
/// larger numbers of columns, each group has many fewer values.
fn random_group_by(&self) -> Vec<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems dataset will be sorted by utf8 and u8 columns currently.

But the group by columns will be generated randomly.

Should we do something to ensure that streaming aggregation(partial and full sorted) can be covered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good point. I think there should be some way to generate lower cardinality columns. I will do that as a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #12990

@Rachelint
Copy link
Contributor

The query generator seems really excellent!

@alamb alamb changed the title Improve AggregateFuzz testing: random queries Improve AggregateFuzz testing: generate random queries Oct 17, 2024
Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb, it looks to me 👍. Just left some comments for typo.
I saw there are some follow-up PR waiting for this. I think we can merge it if no more comments tomorrow.

@alamb
Copy link
Contributor Author

alamb commented Oct 21, 2024

Thank you @goldmedal

@goldmedal goldmedal merged commit c22abb4 into apache:main Oct 22, 2024
24 checks passed
@goldmedal
Copy link
Contributor

Thanks @alamb and @Rachelint for reviewing!

@alamb alamb deleted the alamb/aggregate_fuzz_coverage branch October 22, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants