Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: cast to identical types for set operations #60560

Merged
merged 1 commit into from
Feb 18, 2021

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Feb 14, 2021

This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are Equivalent(), but not
Identical(). This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides Identical(), adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes #59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).

@RaduBerinde RaduBerinde requested a review from a team as a code owner February 14, 2021 03:41
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/optbuilder/union.go, line 48 at r1 (raw file):

	outScope = inScope.push()

	//// propagateTypesLeft/propagateTypesRight indicate whether we need to wrap

This seems like a leftover?


pkg/sql/opt/optbuilder/union.go, line 152 at r1 (raw file):

			// Equivalent but not identical types. Use the type from the left and add
			// a cast on the right-hand side.
			// TODO(radu): perhaps we should do a best effort attempt to choose the

We're actually doing something similar here

// The types are different and both are numeric, so we need to plan
// a cast. There is a hierarchy of valid casts:
// INT2 -> INT4 -> INT8 -> FLOAT -> DECIMAL
// and the cast is valid if 'fromType' is mentioned before 'toType'
// in this chain.
castLeftToRight := false
switch leftType.Family() {
case types.IntFamily:
switch leftType.Width() {
case 16:
castLeftToRight = true
case 32:
castLeftToRight = !rightType.Identical(types.Int2)
default:
castLeftToRight = rightType.Family() != types.IntFamily
}
case types.FloatFamily:
castLeftToRight = rightType.Family() == types.DecimalFamily
}

Possibly we could extract that code and reuse it here to make the determination.

Postgres does support UNIONing an INT with a DECIMAL:

yuzefovich=# create table t (i int, d decimal);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select i from t union all select d from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | numeric
 1 | numeric
(2 rows)

Also, the postgres upcasts:

yuzefovich=# create table t (a int4, b int8);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select a from t union all select b from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | bigint
 1 | bigint
(2 rows)

yuzefovich=# select u, pg_typeof(u) from (select b from t union all select a from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | bigint
 1 | bigint
(2 rows)

I think it might be worth making us resemble Postgres here since we're touching the code.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything to add beyond the comments from @yuzefovich. I can take another look once you've addressed those. Otherwise looks good.

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. We now use the larger (according to Width()) type when possible, and support a few numeric upcasts.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/opt/optbuilder/union.go, line 48 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems like a leftover?

Done.


pkg/sql/opt/optbuilder/union.go, line 152 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We're actually doing something similar here

// The types are different and both are numeric, so we need to plan
// a cast. There is a hierarchy of valid casts:
// INT2 -> INT4 -> INT8 -> FLOAT -> DECIMAL
// and the cast is valid if 'fromType' is mentioned before 'toType'
// in this chain.
castLeftToRight := false
switch leftType.Family() {
case types.IntFamily:
switch leftType.Width() {
case 16:
castLeftToRight = true
case 32:
castLeftToRight = !rightType.Identical(types.Int2)
default:
castLeftToRight = rightType.Family() != types.IntFamily
}
case types.FloatFamily:
castLeftToRight = rightType.Family() == types.DecimalFamily
}

Possibly we could extract that code and reuse it here to make the determination.

Postgres does support UNIONing an INT with a DECIMAL:

yuzefovich=# create table t (i int, d decimal);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select i from t union all select d from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | numeric
 1 | numeric
(2 rows)

Also, the postgres upcasts:

yuzefovich=# create table t (a int4, b int8);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select a from t union all select b from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | bigint
 1 | bigint
(2 rows)

yuzefovich=# select u, pg_typeof(u) from (select b from t union all select a from t) as t(u);
 u | pg_typeof 
---+-----------
 1 | bigint
 1 | bigint
(2 rows)

I think it might be worth making us resemble Postgres here since we're touching the code.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! :lgtm:

Reviewed 4 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)


pkg/sql/opt/optbuilder/union_test.go, line 1 at r2 (raw file):

// Copyright 2020 The Cockroach Authors.

nit: s/2020/2021/.


pkg/sql/opt/optbuilder/union_test.go, line 54 at r2 (raw file):

		},
		{
			// At the same scale, we use the left type.

nit: seems like we have different scales here.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 4 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)


pkg/sql/opt/optbuilder/with.go, line 196 at r3 (raw file):

				panic(pgerror.Newf(
					pgcode.DatatypeMismatch,
					"UNION types %s and %s cannot be matched",

[nit] it's possible that the types would be allowed in a normal UNION, just not in a recursive CTE, right? Might be worth mentioning the WITH RECURSIVE part if so.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)


pkg/sql/opt/optbuilder/with.go, line 196 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it's possible that the types would be allowed in a normal UNION, just not in a recursive CTE, right? Might be worth mentioning the WITH RECURSIVE part if so.

Done, and added a test.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are `Equivalent()`, but not
`Identical()`. This leads to errors in vectorized execution, when we
e.g. try to union a INT8 with an INT4.

We now make the types on both sides `Identical()`, adding casts as
necessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).

Fixes cockroachdb#59148.

Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).
@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 18, 2021

Build succeeded:

@craig craig bot merged commit 40a35fe into cockroachdb:master Feb 18, 2021
@yuzefovich
Copy link
Member

Actually, would it be easy to add casts for inputs to a join operator (I'm interested in the merge joiner in particular, but I'm guessing this should also happen during the optbuild stage, before we know which kind of joiner we will be using) so that equality columns are of the same type (in the same cases with numeric types of different widths or families)? This would allow us to get rid of the casts that are planned for the merge joiner in the vectorized engine I mentioned above.

@RaduBerinde
Copy link
Member Author

It would be pretty different - merge joins are generated during exploration. I think probably the xform rule or the execbuilder would be the best place for that. Can you file an issue (and include a pointer to the exec code for mergejoins)?

@yuzefovich
Copy link
Member

I think we should backport this. We've seen a couple of sentry reports that could have been caused by this issue (e.g. #61023).

@RaduBerinde RaduBerinde deleted the union-casts branch February 24, 2021 23:46
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 30, 2021
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the
recursive CTE code, we don't allow adding casts to the initial
expression, so the change caused us to regress in terms of supported
queries.

This change fixes this by allowing casts to the initial expression.
Not sure why I didn't allow this from the get-go.

Release note (bug fix): fixed "types cannot be matched for WITH
RECURSIVE" error in cases where we can cast the type in the initial
expression.
craig bot pushed a commit that referenced this pull request Mar 30, 2021
62808: opt: allow casts in initial CTE expression r=RaduBerinde a=RaduBerinde

In #60560, we made the matching of types in UNIONs more strict. In the
recursive CTE code, we don't allow adding casts to the initial
expression, so the change caused us to regress in terms of supported
queries.

This change fixes this by allowing casts to the initial expression.
Not sure why I didn't allow this from the get-go.

Release note (bug fix): fixed "types cannot be matched for WITH
RECURSIVE" error in cases where we can cast the type in the initial
expression.

Co-authored-by: Radu Berinde <[email protected]>
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 30, 2021
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the
recursive CTE code, we don't allow adding casts to the initial
expression, so the change caused us to regress in terms of supported
queries.

This change fixes this by allowing casts to the initial expression.
Not sure why I didn't allow this from the get-go.

Release note (bug fix): fixed "types cannot be matched for WITH
RECURSIVE" error in cases where we can cast the type in the initial
expression.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 30, 2021
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the
recursive CTE code, we don't allow adding casts to the initial
expression, so the change caused us to regress in terms of supported
queries.

This change fixes this by allowing casts to the initial expression.
Not sure why I didn't allow this from the get-go.

Release note (bug fix): fixed "types cannot be matched for WITH
RECURSIVE" error in cases where we can cast the type in the initial
expression.
mgartner added a commit to mgartner/cockroach that referenced this pull request Jan 20, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see cockroachdb#60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
craig bot pushed a commit that referenced this pull request Jan 21, 2022
75219: opt: do not add invalid casts to match types in set operations r=mgartner a=mgartner

The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see #60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Fixes #70831

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.

Co-authored-by: Marcus Gartner <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jan 21, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see #60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
mgartner added a commit that referenced this pull request Jan 21, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see #60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
mgartner added a commit to mgartner/cockroach that referenced this pull request Jan 21, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see cockroachdb#60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
gtr pushed a commit to gtr/cockroach that referenced this pull request Jan 24, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see cockroachdb#60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
mgartner added a commit that referenced this pull request Jan 24, 2022
The optbuilder adds casts in order to match non-identical left/right
types of set operations like UNION (see #60560). This change prevents
the optbuilder from adding casts that are invalid, which caused internal
errors. Now, a user error is thrown if no such cast exists from the left
or right input type to the output type.

Release note (bug fix): A bug has been fixed that caused internal errors
in queries with set operations, like UNION, when corresponding columns
on either side of the set operation were not the same. This error only
occurred with a limited set of types. This bug is present in versions
20.2.6+, 21.1.0+, and 21.2.0+.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: cast UNION inputs to identical types
4 participants