feat: Add Time/Interval/Decimal/Utf8View in aggregate fuzz testing #13226

Merged
merged 6 commits into from
Nov 5, 2024

Conversation

LeslieKid

@LeslieKid LeslieKid commented Nov 1, 2024

Which issue does this PR close?

Part of #12114 .

Rationale for this change

The dataset generator in the fuzzer framework needs to support more types in order to improve aggregation fuzzer coverage.

What changes are included in this PR?

  • Support Interval and Time types for PrimitiveArrayGenerator.
  • Introduce DecimalArrayGenerator to support Decimal type.
  • Support Utf8View type.
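To illustrate the kind of generator the list above describes, here is a minimal, standard-library-only sketch. It is not the PR's `DecimalArrayGenerator`: the struct name `DecimalSketchGenerator`, its fields, and the `XorShift64` helper are all hypothetical, and the real code builds Arrow arrays with the `rand` crate rather than returning a `Vec`. The sketch assumes the generator picks a small set of distinct unscaled values bounded by the precision, then samples rows from that set with a configurable null fraction.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero. (Assumption: the PR itself uses `rand`.)
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Float in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical generator; all field names are illustrative.
struct DecimalSketchGenerator {
    precision: u32,      // decimal digits, 1..=38 for a 128-bit decimal
    num_rows: usize,     // number of values to produce
    num_distinct: usize, // size of the pool the rows are drawn from
    null_pct: f64,       // fraction of null rows, in [0, 1]
    rng: XorShift64,
}

impl DecimalSketchGenerator {
    fn generate(&mut self) -> Vec<Option<i128>> {
        // Largest unscaled magnitude with `precision` digits, e.g. 99_999 for 5.
        let max = 10_u128.pow(self.precision) - 1;

        // Build the pool of distinct candidates (at least one).
        let pool_size = self.num_distinct.max(1);
        let mut distinct = Vec::with_capacity(pool_size);
        for _ in 0..pool_size {
            let hi = self.rng.next_u64() as u128;
            let lo = self.rng.next_u64() as u128;
            let magnitude = (((hi << 64) | lo) % (max + 1)) as i128;
            let negative = (self.rng.next_u64() & 1) == 1;
            distinct.push(if negative { -magnitude } else { magnitude });
        }

        // Sample rows from the pool, inserting nulls with probability `null_pct`.
        let mut rows = Vec::with_capacity(self.num_rows);
        for _ in 0..self.num_rows {
            if self.rng.next_f64() < self.null_pct {
                rows.push(None);
            } else {
                let idx = (self.rng.next_u64() % distinct.len() as u64) as usize;
                rows.push(Some(distinct[idx]));
            }
        }
        rows
    }
}
```

Drawing rows from a small pool keeps the number of distinct values low, which is useful when the generated column ends up as a group key or feeds min/max style aggregates.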

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 1, 2024

alamb commented Nov 3, 2024

This PR looks great -- thank you @LeslieKid

@LeslieKid LeslieKid changed the title feat: Add Time/Interval/Decimal in aggregate fuzz testing feat: Add Time/Interval/Decimal/StringView in aggregate fuzz testing Nov 4, 2024
@LeslieKid LeslieKid changed the title feat: Add Time/Interval/Decimal/StringView in aggregate fuzz testing feat: Add Time/Interval/Decimal/Utf8View in aggregate fuzz testing Nov 4, 2024
@LeslieKid LeslieKid marked this pull request as ready for review November 4, 2024 20:18
@@ -338,6 +338,10 @@ impl GroupsAccumulator for MinMaxBytesAccumulator {
/// This is a heuristic to avoid allocating too many small buffers
fn capacity_to_view_block_size(data_capacity: usize) -> u32 {
let max_block_size = 2 * 1024 * 1024;
// Avoid block size equal to zero when calling `with_fixed_block_size()`.
if data_capacity == 0 {
return 1;
@LeslieKid LeslieKid Nov 4, 2024

data_capacity can be zero, which caused the aggregation fuzz tests to panic with the message "Block size must be greater than 0".

I modified this function so that the block size is never 0 in that case, but I'm not sure whether this is a bug...
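For readers following along, the guard discussed here can be sketched as a complete function. Only the `max_block_size` constant and the zero check come from the diff hunk; the final clamp of `data_capacity` to `max_block_size` is an assumption about the part of the function the hunk does not show.

```rust
/// Sketch of the heuristic: derive a view block size from a data capacity.
/// Assumption: capacities above `max_block_size` are capped; the diff only
/// shows the constant and the zero guard.
fn capacity_to_view_block_size(data_capacity: usize) -> u32 {
    let max_block_size: u32 = 2 * 1024 * 1024;
    // Avoid a block size of zero when calling `with_fixed_block_size()`.
    if data_capacity == 0 {
        return 1;
    }
    // Clamp before narrowing so the cast to u32 cannot truncate.
    data_capacity.min(max_block_size as usize) as u32
}
```

With this shape, a zero capacity maps to a block size of 1, small capacities pass through unchanged, and anything larger than 2 MiB is capped.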

I bet there is something / somewhere that is passing in an empty batch -- and a small optimization might be to avoid doing so.

Do you happen to have the stack trace still around?

@alamb alamb left a comment

Thank you @LeslieKid -- this is really nice

use rand::Rng;

/// Randomly generate decimal arrays
pub struct DecimalArrayGenerator {
This is really nice

basic_random_data!(IntervalYearMonthType);
basic_random_data!(Decimal128Type);

impl RandomNativeData for Date64Type {
👍
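The pattern in the snippet under review (a macro for types whose native values can be drawn directly, and a hand-written impl for a type that needs extra care) can be sketched without Arrow or `rand`. Everything below is hypothetical: the `RandomNative` trait, the `*Kind` marker types, the `basic_random!` macro, and the `Lcg` helper only mirror the idea. Treating the date type as "must be a whole number of days in milliseconds" is an assumption about why it gets its own impl.

```rust
/// Minimal linear congruential PRNG (stand-in for `rand`; hypothetical helper).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Hypothetical trait mirroring the idea of per-type random native values.
trait RandomNative {
    type Native;
    fn random_native(rng: &mut Lcg) -> Self::Native;
}

// Marker types standing in for Arrow data types.
struct Int32Kind;
struct Int64Kind;
struct Date64Kind;

/// Types with no extra constraints share one macro-generated impl.
macro_rules! basic_random {
    ($kind:ty, $native:ty) => {
        impl RandomNative for $kind {
            type Native = $native;
            fn random_native(rng: &mut Lcg) -> $native {
                // Truncating cast: any bit pattern is a valid value here.
                rng.next_u64() as $native
            }
        }
    };
}

basic_random!(Int32Kind, i32);
basic_random!(Int64Kind, i64);

const MILLIS_PER_DAY: i64 = 86_400_000;

/// A constrained type gets a hand-written impl instead of the macro.
impl RandomNative for Date64Kind {
    type Native = i64;
    fn random_native(rng: &mut Lcg) -> i64 {
        // Assumption: values should fall on day boundaries.
        let days = (rng.next_u64() % 100_000) as i64;
        days * MILLIS_PER_DAY
    }
}
```

Adding a new unconstrained type is then a one-line macro invocation, while constrained types stay explicit and easy to audit.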

@alamb alamb merged commit f2344d2 into apache:main Nov 5, 2024
26 of 28 checks passed
alamb commented Nov 5, 2024

Thanks again @LeslieKid

Labels
core Core DataFusion crate functions