Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smith: add all argument columns to ORDER BY #87085

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

DrewKimball
Copy link
Collaborator

@DrewKimball DrewKimball commented Aug 30, 2022

This commit ensures that sqlsmith adds all function argument columns to
the ORDER BY clause when nondetermistic functions are disabled. This is
necessary for functions like string_agg, for which there may be multiple
possible values for each argument when the order is not specified.

As a special case, lead and lag require all input columns to be added
to the ORDER BY clause. This is because while they respect the specified
order, they can still access any value in the partition. This means that we
need to uniquely specify which rows fall on the boundaries of each peer group.
We can ensure this is the case by ordering on a key, and we can trivially
order on a key by ordering on all columns.

Partially addresses #85318
Fixes #87023

Release justification: testing-only change

Release note: None

@DrewKimball DrewKimball requested a review from a team as a code owner August 30, 2022 01:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


-- commits line 16 at r1:
nit: this will close the issue - is it desirable? If not, maybe do s/fixes/addresses/.

This commit ensures that sqlsmith adds all function argument columns to
the `ORDER BY` clause when nondetermistic functions are disabled. This is
necessary for functions like `string_agg`, for which there may be multiple
possible values for each argument when the order is not specified.

As a special case, `lead` and `lag` require all input columns to be added
to the `ORDER BY` clause. This is because while they respect the specified
order, they can still access any value in the partition. This means that we
need to uniquely specify which rows fall on the boundaries of each peer group.
We can ensure this is the case by ordering on a key, and we can trivially
order on a key by ordering on all columns.

Partially addresses cockroachdb#85318
Fixes cockroachdb#87023

Release justification: testing-only change

Release note: None
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


-- commits line 16 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this will close the issue - is it desirable? If not, maybe do s/fixes/addresses/.

Done.

@DrewKimball
Copy link
Collaborator Author

TFTR!

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

@DrewKimball
Copy link
Collaborator Author

Thanks!
bors r+

@craig
Copy link
Contributor

craig bot commented Aug 30, 2022

Build failed:

@DrewKimball
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 31, 2022

Build succeeded:

@craig craig bot merged commit 6452f34 into cockroachdb:master Aug 31, 2022
@DrewKimball DrewKimball deleted the smith branch August 31, 2022 01:29
@mgartner
Copy link
Collaborator

mgartner commented Sep 6, 2022

@DrewKimball Do you think the lead/lag special case mentioned about would also apply to other aggregates, like concat_agg? It seems like that might be causing failures like this one: #87335 (comment) and this one: #87353 (comment).

@DrewKimball
Copy link
Collaborator Author

Now that you mention it, I think it's better to think in terms of what window functions don't need to order on a key - rank and dense_rank should be ok, since the result is always the same within a peer group. I think basically everything else (including aggregate functions) needs to be ordering on a key - anything that doesn't explicitly return the same value for every row in a given peer group. As an example, row_number obviously will care if the ith row is switched with the (i-1)th row. Many aggregate functions are order-agnostic, but they respect window frames, which are not order-agnostic (in general).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest.unoptimized-query-oracle/disable-rules=half
4 participants