Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix type checking code for aggregate functions #46649

Merged
merged 2 commits into from
Mar 30, 2020

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Mar 26, 2020

Prior to this commit, any aggregate function that had an argument with
unknown type was replaced with NULL. This is incorrect for scalar
aggregates when the input relation has multiple rows, because after
replacement, the query result has the same number of rows as the input
relation. It should instead be reduced to a single row.

This commit fixes the issue by avoiding replacing the aggregate with NULL.

Note that for many aggregates, this change results in an ambiguous function
error since the type checking code cannot choose which overload is correct
for the unknown type. This is different behavior than Postgres, which
defaults to type "text".

Fixes #46196

Release justification: this is a low risk, high benefit change to existing
functionality.
Release note (bug fix): fixed an incorrect query result that could occur
when a scalar aggregate was called with a null input.

@rytaft rytaft requested a review from a team as a code owner March 26, 2020 22:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm not sure about this. I think we need to make sure that we behave the same way as Postgres here, unfortunately, which i think means returning NULL and not an error in these cases.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, and @RaduBerinde)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @RaduBerinde, and @rytaft)


pkg/sql/sem/tree/type_check.go, line 854 at r1 (raw file):

	// NULL is given as an argument.
	if !def.NullableArgs && def.FunctionProperties.Class != GeneratorClass &&
		def.FunctionProperties.Class != AggregateClass {

I wonder if we could (in the aggregate case) returnmax(NULL::desired) (and default desired to String if it's Any).

@rytaft
Copy link
Collaborator Author

rytaft commented Mar 27, 2020

Agreed that we should try to match Postgres' behavior... but the existing code is definitely incorrect (and performs differently from Postgres). For example, consider the regression test in this PR:

SELECT MAX(t0.c0) FROM (VALUES (NULL), (NULL)) t0(c0)

Postgres (correctly) returns a single NULL row. Prior to this commit, Cockroach would return two NULL rows. Obviously it's best to match Postgres' behavior, but I think it's still better to return an error than incorrect results.

I was thinking that the change to make Cockroach choose the string overload as the default would be a separate PR, but I can just add that as another commit to this PR if that would be better.

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after digging deep into the Postgres type checking code, I've added another commit based on @RaduBerinde's suggestion of casting the argument to type string. Please see the new commit message for more details.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)


pkg/sql/sem/tree/type_check.go, line 854 at r1 (raw file):

Previously, RaduBerinde wrote…

I wonder if we could (in the aggregate case) returnmax(NULL::desired) (and default desired to String if it's Any).

Defaulting desired to String doesn't work if there are no overloads available that match, but I've done something similar...

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)

rytaft added 2 commits March 30, 2020 13:49
Prior to this commit, any aggregate function that had an argument with
unknown type was replaced with NULL. This is incorrect for scalar
aggregates when the input relation has multiple rows, because after
replacement, the query result has the same number of rows as the input
relation. It should instead be reduced to a single row.

This commit fixes the issue by avoiding replacing the aggregate with NULL.

Note that for many aggregates, this change results in an ambiguous function
error since the type checking code cannot choose which overload is correct
for the unknown type. This is different behavior than Postgres, which
defaults to type "text".

Fixes cockroachdb#46196

Release justification: this is a low risk, high benefit change to existing
functionality.
Release note (bug fix): fixed an incorrect query result that could occur
when a scalar aggregate was called with a null input.
Prior to this commit, the type checking code was not able to choose between
different possible aggregate overloads if the arguments had type unknown.
This commit changes the logic to match Postgres, which always prefers
overloads with arguments of type string if available.

Note that this commit still doesn't completely match Postgres' behavior,
because it doesn't handle the case when there are no overloads available
with string inputs for the arguments with unknown type. If there are no
overloads with string arguments, Postgres chooses the overload with
preferred type for the given category. For example, float8 is the preferred
type for the numeric category in Postgres. Since we don't support the concept
of preferred types within type categories, supporting this behavior will be
a more involved change. For now, this commit should cover most of our
supported aggregates.

Release justification: low risk, high benefit change to existing
functionality.
Release note (sql change): the type checking code now prefers aggregate
overloads with string inputs if there are multiple possible candidates
due to arguments of unknown type.
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @RaduBerinde)

@craig
Copy link
Contributor

craig bot commented Mar 30, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VALUES table expression malfunctions with MAX
4 participants