refactor(agg): remove state = "ref" from bool_and + refactor aggregate macro #19333

Closed
wants to merge 3 commits into from

@stdrc stdrc commented Nov 11, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Previously, the aggregate macro did not correctly handle aggregates that need the first (non-NULL) input as their initial state. For example, bool_and behaves differently with and without state = "ref", and the version without it produces wrong results. This PR refactors the state maintenance code generated by the aggregate macro. As a result, we can remove state = "ref" from append-only bool_and to align it with bool_or, avoiding further confusion and maintenance errors.
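
To illustrate the difference, here is a minimal sketch. It assumes the failing path seeds the state with the default value of the state type (`false` for `bool`) and then applies the function to every input. The function names are hypothetical and are not the code the macro generates.

```rust
/// Hypothetical stand-in for the per-row function of `bool_and`.
fn bool_and_step(state: bool, input: bool) -> bool {
    state && input
}

/// Assumed failing behavior: seed the state with the type's default value
/// (`false` for `bool`) and apply the step to every non-NULL input,
/// including the first one.
fn fold_from_default(inputs: &[Option<bool>]) -> Option<bool> {
    let mut state: Option<bool> = None;
    for input in inputs.iter().flatten() {
        let prev = state.unwrap_or_default();
        state = Some(bool_and_step(prev, *input));
    }
    state
}

/// Intended behavior: the first non-NULL input becomes the state,
/// and the step is applied only to later inputs.
fn fold_from_first_input(inputs: &[Option<bool>]) -> Option<bool> {
    let mut state: Option<bool> = None;
    for input in inputs.iter().flatten() {
        state = Some(match state {
            None => *input,
            Some(prev) => bool_and_step(prev, *input),
        });
    }
    state
}
```

With all-true inputs, `fold_from_default` still ends at `false`, because `false && x` is always `false`. `fold_from_first_input` returns `true` for the same inputs.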

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

stdrc commented Nov 11, 2024

This stack of pull requests is managed by Graphite.

@stdrc stdrc changed the title fix empty argument #[aggregate] generation refactor(agg): remove unnecessary state = "ref" Nov 11, 2024
@stdrc stdrc changed the title refactor(agg): remove unnecessary state = "ref" refactor(agg): remove unnecessary state = "ref" from bool_and Nov 11, 2024
@stdrc stdrc changed the title refactor(agg): remove unnecessary state = "ref" from bool_and refactor(agg): remove state = "ref" from bool_and + refactor aggregate macro Nov 11, 2024
@stdrc stdrc marked this pull request as ready for review November 11, 2024 10:04
@stdrc stdrc removed the type/fix Bug fix label Nov 11, 2024
@stdrc stdrc requested a review from BugenZhao November 11, 2024 10:05
@stdrc stdrc force-pushed the rc/fix-aggregate-macro branch from 27670ca to 66c4ae3 Compare November 12, 2024 06:09
@stdrc stdrc requested a review from a team as a code owner November 12, 2024 06:09
@stdrc stdrc changed the base branch from rc/fix-first-last-value-2 to main November 12, 2024 06:09

stdrc commented Nov 12, 2024

Tests failed because count depends on the old behavior, which creates a state with the default value of the state data type and applies the function to that default state and the first input.

To be more concrete, we need to fit all the following cases with one macro:

sum:
1. NULL if no non-NULL input values
2. On first non-NULL input, `state = first input`, or, `state = 0` and apply `agg(state, value)`

sum0:
1. 0 if no non-NULL input values

count:
1. NULL if no non-NULL input values
2. On first non-NULL input, `state = 0` and apply `agg(state, value)`

bool_or:
1. NULL if no non-NULL input values
2. On first non-NULL input, `state = first input`, or, `state = false` and apply `agg(state, value)`

bool_and:
1. NULL if no non-NULL input values
2. On first non-NULL input, `state = first input`, or, `state = true` and apply `agg(state, value)`

first_value:
1. NULL if no input values
2. On first input, `state = first input`, apply or not apply `agg(state, value)`

last_value:
1. NULL if no input values
2. On first input, `state = first input`, apply or not apply `agg(state, value)`, or, `state = anything`, apply `agg(state, value)`

This requires us to revisit the set of macro arguments used when defining these aggregates. I want to leave it as it is for now.
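
For reference, the NULL-skipping cases listed above (sum, sum0, count, bool_or, bool_and) could be modeled roughly as follows. This is only a sketch under assumptions: `Seed`, `aggregate`, and `empty_result` are hypothetical names, not existing macro arguments, and first_value / last_value are left out because they also consume NULL inputs.

```rust
/// Hypothetical description of how an aggregate obtains its first state.
#[derive(Clone, Copy)]
enum Seed<T> {
    /// The first non-NULL input becomes the state (e.g. `bool_and`, `sum`).
    FirstInput,
    /// Start from a fixed value and apply the step to every input (e.g. `count`).
    Init(T),
}

/// Folds the non-NULL inputs according to `seed`.
/// `empty_result` is returned when there is no non-NULL input
/// (`None` models SQL NULL, `Some(0)` models `sum0`).
fn aggregate<T: Copy>(
    inputs: &[Option<T>],
    seed: Seed<T>,
    empty_result: Option<T>,
    step: impl Fn(T, T) -> T,
) -> Option<T> {
    let mut state: Option<T> = None;
    for &input in inputs.iter().flatten() {
        state = Some(match (state, seed) {
            // A state already exists: always apply the step.
            (Some(prev), _) => step(prev, input),
            // First non-NULL input, taken as the state directly.
            (None, Seed::FirstInput) => input,
            // First non-NULL input, applied on top of the fixed initial value.
            (None, Seed::Init(init)) => step(init, input),
        });
    }
    state.or(empty_result)
}
```

In this model, count is `aggregate(inputs, Seed::Init(0), None, |s, _| s + 1)`, sum0 is `aggregate(inputs, Seed::Init(0), Some(0), |s, v| s + v)`, and bool_and works with either `Seed::FirstInput` or `Seed::Init(true)`.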

@stdrc stdrc closed this Nov 12, 2024