-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: cast to identical types for set operations #60560
Conversation
72ea704
to
8795571
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/optbuilder/union.go, line 48 at r1 (raw file):
outScope = inScope.push() //// propagateTypesLeft/propagateTypesRight indicate whether we need to wrap
This seems like a leftover?
pkg/sql/opt/optbuilder/union.go, line 152 at r1 (raw file):
// Equivalent but not identical types. Use the type from the left and add // a cast on the right-hand side. // TODO(radu): perhaps we should do a best effort attempt to choose the
We're actually doing something similar here
cockroach/pkg/sql/colexec/mergejoiner.go
Lines 253 to 271 in fe919cc
// The types are different and both are numeric, so we need to plan | |
// a cast. There is a hierarchy of valid casts: | |
// INT2 -> INT4 -> INT8 -> FLOAT -> DECIMAL | |
// and the cast is valid if 'fromType' is mentioned before 'toType' | |
// in this chain. | |
castLeftToRight := false | |
switch leftType.Family() { | |
case types.IntFamily: | |
switch leftType.Width() { | |
case 16: | |
castLeftToRight = true | |
case 32: | |
castLeftToRight = !rightType.Identical(types.Int2) | |
default: | |
castLeftToRight = rightType.Family() != types.IntFamily | |
} | |
case types.FloatFamily: | |
castLeftToRight = rightType.Family() == types.DecimalFamily | |
} |
Possibly we could extract that code and reuse it here to make the determination.
Postgres does support UNIONing an INT with a DECIMAL:
yuzefovich=# create table t (i int, d decimal);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select i from t union all select d from t) as t(u);
u | pg_typeof
---+-----------
1 | numeric
1 | numeric
(2 rows)
Also, the postgres upcasts:
yuzefovich=# create table t (a int4, b int8);
CREATE TABLE
yuzefovich=# insert into t values (1, 1);
INSERT 0 1
yuzefovich=# select u, pg_typeof(u) from (select a from t union all select b from t) as t(u);
u | pg_typeof
---+-----------
1 | bigint
1 | bigint
(2 rows)
yuzefovich=# select u, pg_typeof(u) from (select b from t union all select a from t) as t(u);
u | pg_typeof
---+-----------
1 | bigint
1 | bigint
(2 rows)
I think it might be worth making us resemble Postgres here since we're touching the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have anything to add beyond the comments from @yuzefovich. I can take another look once you've addressed those. Otherwise looks good.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. We now use the larger (according to Width()
) type when possible, and support a few numeric upcasts.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/opt/optbuilder/union.go, line 48 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This seems like a leftover?
Done.
pkg/sql/opt/optbuilder/union.go, line 152 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We're actually doing something similar here
cockroach/pkg/sql/colexec/mergejoiner.go
Lines 253 to 271 in fe919cc
// The types are different and both are numeric, so we need to plan // a cast. There is a hierarchy of valid casts: // INT2 -> INT4 -> INT8 -> FLOAT -> DECIMAL // and the cast is valid if 'fromType' is mentioned before 'toType' // in this chain. castLeftToRight := false switch leftType.Family() { case types.IntFamily: switch leftType.Width() { case 16: castLeftToRight = true case 32: castLeftToRight = !rightType.Identical(types.Int2) default: castLeftToRight = rightType.Family() != types.IntFamily } case types.FloatFamily: castLeftToRight = rightType.Family() == types.DecimalFamily } Possibly we could extract that code and reuse it here to make the determination.
Postgres does support UNIONing an INT with a DECIMAL:
yuzefovich=# create table t (i int, d decimal); CREATE TABLE yuzefovich=# insert into t values (1, 1); INSERT 0 1 yuzefovich=# select u, pg_typeof(u) from (select i from t union all select d from t) as t(u); u | pg_typeof ---+----------- 1 | numeric 1 | numeric (2 rows)
Also, the postgres upcasts:
yuzefovich=# create table t (a int4, b int8); CREATE TABLE yuzefovich=# insert into t values (1, 1); INSERT 0 1 yuzefovich=# select u, pg_typeof(u) from (select a from t union all select b from t) as t(u); u | pg_typeof ---+----------- 1 | bigint 1 | bigint (2 rows) yuzefovich=# select u, pg_typeof(u) from (select b from t union all select a from t) as t(u); u | pg_typeof ---+----------- 1 | bigint 1 | bigint (2 rows)
I think it might be worth making us resemble Postgres here since we're touching the code.
Done.
8795571
to
fed07d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/sql/opt/optbuilder/union_test.go, line 1 at r2 (raw file):
// Copyright 2020 The Cockroach Authors.
nit: s/2020/2021/
.
pkg/sql/opt/optbuilder/union_test.go, line 54 at r2 (raw file):
}, { // At the same scale, we use the left type.
nit: seems like we have different scales here.
fed07d4
to
c130f6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)
pkg/sql/opt/optbuilder/with.go, line 196 at r3 (raw file):
panic(pgerror.Newf( pgcode.DatatypeMismatch, "UNION types %s and %s cannot be matched",
[nit] it's possible that the types would be allowed in a normal UNION, just not in a recursive CTE, right? Might be worth mentioning the WITH RECURSIVE
part if so.
c130f6a
to
6195a7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/optbuilder/with.go, line 196 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] it's possible that the types would be allowed in a normal UNION, just not in a recursive CTE, right? Might be worth mentioning the
WITH RECURSIVE
part if so.
Done, and added a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
6195a7f
to
5e0341c
Compare
This change makes the optbuilder more strict when building set operations. Previously, it could build expressions which have corresponding left/right types which are `Equivalent()`, but not `Identical()`. This leads to errors in vectorized execution, when we e.g. try to union a INT8 with an INT4. We now make the types on both sides `Identical()`, adding casts as necessary. We try to do a best-effort attempt to use the larger numeric type when possible (e.g. int4->int8, int->float, float->decimal). Fixes cockroachdb#59148. Release note (bug fix): fixed execution errors for some queries that use set operations (UNION / EXCEPT / INTERSECT) where a column has types of different widths on the two sides (e.g. INT4 vs INT8).
5e0341c
to
4037bd8
Compare
bors r+ |
Build succeeded: |
Actually, would it be easy to add casts for inputs to a join operator (I'm interested in the merge joiner in particular, but I'm guessing this should also happen during the optbuild stage, before we know which kind of joiner we will be using) so that equality columns are of the same type (in the same cases with numeric types of different widths or families)? This would allow us to get rid of the casts that are planned for the merge joiner in the vectorized engine I mentioned above. |
It would be pretty different - merge joins are generated during exploration. I think probably the xform rule or the execbuilder would be the best place for that. Can you file an issue (and include a pointer to the exec code for mergejoins)? |
I think we should backport this. We've seen a couple of sentry reports that could have been caused by this issue (e.g. #61023). |
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the recursive CTE code, we don't allow adding casts to the initial expression, so the change caused us to regress in terms of supported queries. This change fixes this by allowing casts to the initial expression. Not sure why I didn't allow this from the get-go. Release note (bug fix): fixed "types cannot be matched for WITH RECURSIVE" error in cases where we can cast the type in the initial expression.
62808: opt: allow casts in initial CTE expression r=RaduBerinde a=RaduBerinde In #60560, we made the matching of types in UNIONs more strict. In the recursive CTE code, we don't allow adding casts to the initial expression, so the change caused us to regress in terms of supported queries. This change fixes this by allowing casts to the initial expression. Not sure why I didn't allow this from the get-go. Release note (bug fix): fixed "types cannot be matched for WITH RECURSIVE" error in cases where we can cast the type in the initial expression. Co-authored-by: Radu Berinde <[email protected]>
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the recursive CTE code, we don't allow adding casts to the initial expression, so the change caused us to regress in terms of supported queries. This change fixes this by allowing casts to the initial expression. Not sure why I didn't allow this from the get-go. Release note (bug fix): fixed "types cannot be matched for WITH RECURSIVE" error in cases where we can cast the type in the initial expression.
In cockroachdb#60560, we made the matching of types in UNIONs more strict. In the recursive CTE code, we don't allow adding casts to the initial expression, so the change caused us to regress in terms of supported queries. This change fixes this by allowing casts to the initial expression. Not sure why I didn't allow this from the get-go. Release note (bug fix): fixed "types cannot be matched for WITH RECURSIVE" error in cases where we can cast the type in the initial expression.
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see cockroachdb#60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
75219: opt: do not add invalid casts to match types in set operations r=mgartner a=mgartner The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see #60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Fixes #70831 Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+. Co-authored-by: Marcus Gartner <[email protected]>
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see #60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see #60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see cockroachdb#60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see cockroachdb#60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
The optbuilder adds casts in order to match non-identical left/right types of set operations like UNION (see #60560). This change prevents the optbuilder from adding casts that are invalid, which caused internal errors. Now, a user error is thrown if no such cast exists from the left or right input type to the output type. Release note (bug fix): A bug has been fixed that caused internal errors in queries with set operations, like UNION, when corresponding columns on either side of the set operation were not the same. This error only occurred with a limited set of types. This bug is present in versions 20.2.6+, 21.1.0+, and 21.2.0+.
This change makes the optbuilder more strict when building set
operations. Previously, it could build expressions which have
corresponding left/right types which are
Equivalent()
, but notIdentical()
. This leads to errors in vectorized execution, when wee.g. try to union a INT8 with an INT4.
We now make the types on both sides
Identical()
, adding casts asnecessary. We try to do a best-effort attempt to use the larger
numeric type when possible (e.g. int4->int8, int->float, float->decimal).
Fixes #59148.
Release note (bug fix): fixed execution errors for some queries that
use set operations (UNION / EXCEPT / INTERSECT) where a column has
types of different widths on the two sides (e.g. INT4 vs INT8).