Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: computed column expressions incorrectly evaluated when dependent on certain types #69665

Closed
mgartner opened this issue Aug 31, 2021 · 8 comments · Fixed by #70722
Closed
Assignees
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Aug 31, 2021

A TLP failure was reported here. The following SQL is a reduced reproduction of the bug:

defaultdb> CREATE TABLE t (
  c CHAR,
  UNIQUE INDEX u (lower(c) ASC)
);
CREATE TABLE

defaultdb> INSERT INTO t (c) VALUES (' ':::STRING);
INSERT 1


defaultdb> DELETE FROM t WHERE true;
DELETE 1

-- This should return 0, not 1.
defaultdb> SELECT count(*) FROM t@u;
  count
---------
      1
(1 row)

If we look at the kvtrace of the INSERT and DELETE, the problem can be seen.

query T kvtrace
INSERT INTO t (c) VALUES (' ':::STRING)
----
CPut /Table/53/1/689335466071326721/0 -> /TUPLE/1:1:Bytes/
InitPut /Table/53/2/" "/0 -> /BYTES/0xfd099102f442428001

query T kvtrace
DELETE FROM t
----
Scan /Table/53/{1-2}
Del /Table/53/2/""/0
Del /Table/53/1/689335466071326721/0

The INSERT is incorrectly inserting the key " " in the unique secondary index and the DELETE is correctly deleting the key "".

This maybe be related to #69327.

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. labels Aug 31, 2021
@mgartner mgartner self-assigned this Aug 31, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 31, 2021
@mgartner
Copy link
Collaborator Author

mgartner commented Sep 1, 2021

I think this is the same issue as #69327.

Here's an optbuilder example that shows the problem in another way:

exec-ddl
CREATE TABLE t (
  c CHAR,
  v STRING AS (c) STORED
)
----

build
INSERT INTO t VALUES (' '::STRING)
----
insert t
 ├── columns: <none>
 ├── insert-mapping:
 │    ├── column1:6 => c:1
 │    ├── column1:6 => v:2
 │    └── rowid_default:7 => rowid:3
 └── project
      ├── columns: rowid_default:7 column1:6!null
      ├── values
      │    ├── columns: column1:6!null
      │    └── (' ',)
      └── projections
           └── unique_rowid() [as=rowid_default:7]

The ' ' value is used for both columns. Because CHAR is a space-padded type, the enforceLocalColumnConstraints function in the execution engine will truncate the value when inserting the value for c. But v will not be truncated because its type is STRING.

I'm not sure the best way to fix this. I think these cases are hinting that we should not be adjusting INSERT and UPDATE values at execution-time. I think we have to determine at planning time that the value of the types may change and automatically insert casts so that computed column values are correct.

cc @RaduBerinde @otan @rafiss any thoughts?

@mgartner mgartner changed the title sql: incorrect expression index keys deleted during DELETE sql: computed column expressions incorrectly evaluated when dependent on certain types Sep 1, 2021
@RaduBerinde
Copy link
Member

Feels like buildInputForInsert (and others) should add casts if necessary. And checkDatumTypeFitsColumnType should probably disallow this case. This is related to #67083: we have some special code related to rounding decimals but that feels like it could be equally well handled using casts.

It's also possible that the fixed-sized chars should be a different family than StringFamily because apparently you can't simply use a DString datum as a fixed-sized char.

@mgartner
Copy link
Collaborator Author

mgartner commented Sep 1, 2021

Feels like buildInputForInsert (and others) should add casts if necessary. And checkDatumTypeFitsColumnType should probably disallow this case. This is related to #67083: we have some special code related to rounding decimals but that feels like it could be equally well handled using casts.

I'll attempt to add casts and see if #67083 can be addressed too.

It's also possible that the fixed-sized chars should be a different family than StringFamily because apparently you can't simply use a DString datum as a fixed-sized char.

Based on this definition of type families, it sounds like fixed-sized chars should be in the StringFamily. But this is something I did think about. Maybe our definition needs to be updated?

// Family specifies a group of types that are compatible with one another. Types
// in the same family can be compared, assigned, etc., but may differ from one
// another in width, precision, locale, and other attributes. For example, it is
// always an error to insert an INT value into a FLOAT column, because they are
// not in the same family. However, values of different types within the same
// family are "insert-compatible" with one another. Insertion may still result
// in an error because of width overflow or other constraints, but it can at
// least be attempted.

@RaduBerinde
Copy link
Member

I checked what pg does. It seems that it allows implicit conversions, but only if characters aren't truncated. It's worth noting that it will truncate spaces though:

radu=> create table t (a char(2));
radu=> insert into t values ('abc'::text);
ERROR:  value too long for type character(2)
radu=> insert into t values ('ab'::char(4));
INSERT 0 1
radu=> insert into t values ('abc'::char(4));
ERROR:  value too long for type character(2)
radu=> insert into t values ('ab  '::char(4));
INSERT 0 1
radu=> insert into t values ('ab  '::text);
INSERT 0 1

We won't be able to achieve the same result if we just force casts - in that case, we will silently truncate.

Maybe we need a new flavor of cast that is more strict?

@rafiss
Copy link
Collaborator

rafiss commented Sep 1, 2021

It's worth noting that it will truncate spaces though

just to clarify:

The following is a snippet from a postgres CLI, and CRDB matches this behavior:

rafiss@127:postgres> create table chars(a char, b char(2), c "char");
CREATE TABLE

rafiss@127:postgres> insert into chars values ('c  ', 'ca  ', 'cat');
INSERT 0 1

rafiss@127:postgres> insert into chars values ('  d', '  do', 'dog');
value too long for type character(1)

rafiss@127:postgres> select * from chars;
+-----+-----+-----+
| a   | b   | c   |
|-----+-----+-----|
| c   | ca  | c   |
+-----+-----+-----+

rafiss@127:postgres> select 'cat'::char, 'cat'::char(2), 'cat'::"char";
+----------+----------+--------+
| bpchar   | bpchar   | char   |
|----------+----------+--------|
| c        | ca       | c      |
+----------+----------+--------+

@mgartner
Copy link
Collaborator Author

mgartner commented Sep 1, 2021

It seems like there's no methods to these semantics. Each type has different casting behavior when casting between types in the same family and with different widths, and the "casting" performed during INSERT/UPDATE is defined differently...

@RaduBerinde
Copy link
Member

Wow. What a laughable naming choice, "char".

@mgartner
Copy link
Collaborator Author

mgartner commented Sep 1, 2021

The scare quotes are quite fitting. 😆

mgartner added a commit to mgartner/cockroach that referenced this issue Sep 24, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Sep 24, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Sep 27, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Sep 28, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Sep 28, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 11, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 11, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 11, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 12, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 12, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 12, 2021
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
craig bot pushed a commit that referenced this issue Oct 13, 2021
70722: sql: add assignment casts for INSERTs r=mgartner a=mgartner

#### sql: rename pkg/sql/sem/tree/casts.go to cast.go

Release note: None

#### tree: add OID->OID cast map

This commit adds a new map that describes valid casts from OID to OID.
It introduces three cast contexts: explicit casts, assignment casts, and
implicit casts. See the comments for CastContext and castMap for
details.

This map will enable us to properly follow Postgres's casting behavior.
Most immediately, it will allow us to support assignment casts.

Future work includes moving volatility in castMap. In the longer term,
cast functions can be moved into castMap as well.

Release note: None

#### sql: set "char" type width to 1

The `"char"` type is a special single-character type. This commit
adds a `types.QChar` with a width one. It removes the `types.MakeQChar`
function so that it is impossible to create `"char"` types with any
other width.

Release note: None

#### sql: add assignment casts for INSERTs

Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes #69327
Fixes #69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.


70950: instancestorage: Add SQL test for sql_instances. r=knz a=rimadeodhar

This PR adds a unit test to verify that the sql_instances
table can be accessed through SQL API.

Release note: None

71412: sqlproxyccl: rework sqlproxy connection throttler r=JeffSwenson a=JeffSwenson

This change switches the sqlproxy connection throttling logic back to
exponential backoff. The tokenbucket approach was introduced by
PR #69041. There are a few behavior differences between this and the
original exponential backoff implementation.

1. The throttling logic is maintained per (ip, tenant) instead of per
   (ip). Some platform as a service provides share a single outbound ip
   address between multiple clients. These users would occasionaly see
   throttling caused by a second user sharing their IP.
2. The throttling logic was triggered before there was an authentication
   failure. It takes ~100ms-1000ms to authenticate with the tenant
   process.  Any requests that arrived after the first request, but
   before it was processed, would trigger the throttle. Now, we only
   trigger the throttle in response to an explict authorization error.

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: Jeff <[email protected]>
@craig craig bot closed this as completed in 17276b5 Oct 13, 2021
mgartner added a commit to mgartner/cockroach that referenced this issue Dec 22, 2021
Adjusting INSERT, UPDATE, and UPSERT values right before they are
written in the execution engine is no longer necessary with assignment
casts, and it is incorrect (see cockroachdb#69665). However, we will continue to
perform these adjustments to avoid regressions that assignment casts may
cause until we have more confidene in assignment casts. In test builds,
an assertion has been added to ensure that the adjustments have no
effect, which should help catch potential bugs in assignment casts.

Release note: None
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants