sql: fix type checking code for aggregate functions #46649

rytaft · 2020-03-26T22:01:04Z

Prior to this commit, any aggregate function that had an argument with
unknown type was replaced with NULL. This is incorrect for scalar
aggregates when the input relation has multiple rows, because after
replacement, the query result has the same number of rows as the input
relation. It should instead be reduced to a single row.

This commit fixes the issue by avoiding replacing the aggregate with NULL.

Note that for many aggregates, this change results in an ambiguous function
error since the type checking code cannot choose which overload is correct
for the unknown type. This is different behavior than Postgres, which
defaults to type "text".

Fixes #46196

Release justification: this is a low risk, high benefit change to existing
functionality.
Release note (bug fix): fixed an incorrect query result that could occur
when a scalar aggregate was called with a null input.

jordanlewis

Hmm... I'm not sure about this. I think we need to make sure that we behave the same way as Postgres here, unfortunately, which I think means returning NULL and not an error in these cases.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @nvanbenschoten, and @RaduBerinde)

RaduBerinde

pkg/sql/sem/tree/type_check.go, line 854 at r1 (raw file):

	// NULL is given as an argument.
	if !def.NullableArgs && def.FunctionProperties.Class != GeneratorClass &&
		def.FunctionProperties.Class != AggregateClass {

I wonder if we could (in the aggregate case) return max(NULL::desired) (and default desired to String if it's Any).
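For readers without the file open, the quoted condition can be sketched as a small predicate. This is only an illustration: `FunctionDefinition` and `FunctionClass` here are simplified stand-ins (the real code reads the class from `def.FunctionProperties.Class`), and `foldsToNull` is a hypothetical helper name, not a function in `type_check.go`.

```go
package main

import "fmt"

// FunctionClass is a simplified stand-in for the class constants
// referenced in the quoted snippet (assumption: not the real definition).
type FunctionClass int

const (
	NormalClass FunctionClass = iota
	AggregateClass
	GeneratorClass
)

// FunctionDefinition is a flattened, hypothetical version of the
// definition used by the type checker.
type FunctionDefinition struct {
	NullableArgs bool
	Class        FunctionClass
}

// foldsToNull mirrors the quoted condition: a call with a NULL argument
// may be replaced by NULL only if the function does not accept NULL
// arguments and is neither a generator nor an aggregate.
func foldsToNull(def FunctionDefinition) bool {
	return !def.NullableArgs &&
		def.Class != GeneratorClass &&
		def.Class != AggregateClass
}

func main() {
	fmt.Println(foldsToNull(FunctionDefinition{Class: NormalClass}))
	fmt.Println(foldsToNull(FunctionDefinition{Class: AggregateClass}))
}
```

Under this reading, an ordinary scalar function with a NULL argument is still folded to NULL, while an aggregate is left in place so that it can collapse its input to a single row.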

rytaft · 2020-03-27T03:06:50Z

Agreed that we should try to match Postgres' behavior... but the existing code is definitely incorrect (and behaves differently from Postgres). For example, consider the regression test in this PR:

SELECT MAX(t0.c0) FROM (VALUES (NULL), (NULL)) t0(c0)

Postgres (correctly) returns a single NULL row. Prior to this commit, Cockroach would return two NULL rows. Obviously it's best to match Postgres' behavior, but I think it's still better to return an error than incorrect results.
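The row-count difference can be modeled with a minimal Go sketch. It is not CockroachDB code: `scalarMax` and `replacedWithNull` are hypothetical names, and a nil `*int` stands in for SQL NULL.

```go
package main

import "fmt"

// scalarMax models a scalar aggregate (no GROUP BY): it always produces
// exactly one output row, which is NULL (nil) if every input is NULL.
func scalarMax(rows []*int) []*int {
	var best *int
	for _, d := range rows {
		if d != nil && (best == nil || *d > *best) {
			best = d
		}
	}
	return []*int{best}
}

// replacedWithNull models the old behavior described above: the aggregate
// call is rewritten to a plain NULL expression, which is then evaluated
// once per input row.
func replacedWithNull(rows []*int) []*int {
	return make([]*int, len(rows)) // one NULL per input row
}

func main() {
	input := []*int{nil, nil} // (VALUES (NULL), (NULL))
	fmt.Println("scalar aggregate rows:", len(scalarMax(input)))
	fmt.Println("after NULL replacement rows:", len(replacedWithNull(input)))
}
```

With the two-row input from the regression test, the first function yields one row and the second yields two, which is the discrepancy this PR removes.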

I was thinking that the change to make Cockroach choose the string overload as the default would be a separate PR, but I can just add that as another commit to this PR if that would be better.

rytaft

Ok, after digging deep into the Postgres type checking code, I've added another commit based on @RaduBerinde's suggestion of casting the argument to type string. Please see the new commit message for more details.

pkg/sql/sem/tree/type_check.go, line 854 at r1 (raw file):

Previously, RaduBerinde wrote…

I wonder if we could (in the aggregate case) return max(NULL::desired) (and default desired to String if it's Any).

Defaulting desired to String doesn't work if there are no overloads available that match, but I've done something similar...

RaduBerinde

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)

Prior to this commit, the type checking code was not able to choose between different possible aggregate overloads if the arguments had type unknown. This commit changes the logic to match Postgres, which always prefers overloads with arguments of type string if available.

Note that this commit still doesn't completely match Postgres' behavior, because it doesn't handle the case when there are no overloads available with string inputs for the arguments with unknown type. If there are no overloads with string arguments, Postgres chooses the overload with preferred type for the given category. For example, float8 is the preferred type for the numeric category in Postgres. Since we don't support the concept of preferred types within type categories, supporting this behavior will be a more involved change. For now, this commit should cover most of our supported aggregates.

Release justification: low risk, high benefit change to existing functionality.

Release note (sql change): the type checking code now prefers aggregate overloads with string inputs if there are multiple possible candidates due to arguments of unknown type.
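The preference rule in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual type checker: `Type`, `Overload`, `choose`, and the overload sets in `main` are all hypothetical, and the sketch deliberately omits Postgres' preferred-type fallback, as the commit does.

```go
package main

import (
	"errors"
	"fmt"
)

// Type is a hypothetical, simplified type tag.
type Type string

const (
	Unknown Type = "unknown"
	String  Type = "string"
	Int     Type = "int"
	Float   Type = "float"
)

// Overload is a hypothetical function signature.
type Overload struct {
	Name string
	Args []Type
}

var (
	errNoMatch   = errors.New("no matching overload")
	errAmbiguous = errors.New("ambiguous function call")
)

// matches reports whether o accepts the given argument types; an argument
// of unknown type is compatible with any parameter type.
func matches(o Overload, args []Type) bool {
	if len(o.Args) != len(args) {
		return false
	}
	for i, a := range args {
		if a != Unknown && a != o.Args[i] {
			return false
		}
	}
	return true
}

// takesStringForUnknown reports whether o has a string parameter at every
// position where the argument type is unknown.
func takesStringForUnknown(o Overload, args []Type) bool {
	for i, a := range args {
		if a == Unknown && o.Args[i] != String {
			return false
		}
	}
	return true
}

// choose picks the single matching overload, preferring string parameters
// for unknown-typed arguments when several overloads match.
func choose(overloads []Overload, args []Type) (Overload, error) {
	var candidates, preferred []Overload
	for _, o := range overloads {
		if matches(o, args) {
			candidates = append(candidates, o)
			if takesStringForUnknown(o, args) {
				preferred = append(preferred, o)
			}
		}
	}
	switch {
	case len(candidates) == 0:
		return Overload{}, errNoMatch
	case len(candidates) == 1:
		return candidates[0], nil
	case len(preferred) == 1:
		return preferred[0], nil
	}
	// No string overload to prefer: still ambiguous in this sketch.
	return Overload{}, errAmbiguous
}

func main() {
	withString := []Overload{{"max", []Type{Int}}, {"max", []Type{Float}}, {"max", []Type{String}}}
	numericOnly := []Overload{{"sum", []Type{Int}}, {"sum", []Type{Float}}}
	fmt.Println(choose(withString, []Type{Unknown}))
	fmt.Println(choose(numericOnly, []Type{Unknown}))
}
```

The second call illustrates the remaining gap: with only numeric overloads there is no string candidate to prefer, so the sketch reports ambiguity, whereas Postgres would fall back to the preferred type of the numeric category.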

rytaft

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @RaduBerinde)

craig · 2020-03-30T20:15:57Z

Build succeeded

rytaft requested review from jordanlewis, nvanbenschoten and RaduBerinde March 26, 2020 22:01

rytaft requested a review from a team as a code owner March 26, 2020 22:01

jordanlewis reviewed Mar 26, 2020

RaduBerinde reviewed Mar 27, 2020

rytaft force-pushed the values-bug branch from 6ecec23 to aec076d March 27, 2020 21:35

rytaft commented Mar 27, 2020

rytaft force-pushed the values-bug branch from aec076d to 13e8fff March 27, 2020 21:52

RaduBerinde approved these changes Mar 28, 2020

rytaft added 2 commits March 30, 2020 13:49

rytaft force-pushed the values-bug branch from 13e8fff to 420da7a March 30, 2020 18:49

rytaft added the backport-19.2.x label Mar 30, 2020

rytaft commented Mar 30, 2020

craig bot merged commit 93cb2eb into cockroachdb:master Mar 30, 2020

rytaft added the backport-20.1 label Mar 31, 2020

This was referenced Mar 31, 2020

release-19.2: sql: fix type checking code for aggregate functions #46807

Merged

release-20.1: sql: fix type checking code for aggregate functions #46898

Merged

release-19.1: sql: fix type checking code for aggregate functions #46902

Merged

rytaft deleted the values-bug branch April 2, 2020 22:15
