Skip to content

Commit

Permalink
sql: add assignment casts for INSERTs
Browse files Browse the repository at this point in the history
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

The optimizer is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`, the
optimizer will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

Some changes to prepared statement placeholder type inference were
required in order to better match Postgres's behavior (this is a
best-effort match thus far and there are still inconsistencies). Most
notably, widths and precision are no longer inferred for the types of
placeholders. The effect of this is that assignment casts will be
correctly added by the optimizer in order to make sure that values for
placeholders are correctly coerced to the target column type during
execution of a prepared insert.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
  • Loading branch information
mgartner committed Oct 11, 2021
1 parent 0c8ac1e commit df5b790
Show file tree
Hide file tree
Showing 43 changed files with 2,049 additions and 1,060 deletions.
2 changes: 2 additions & 0 deletions docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -2826,6 +2826,8 @@ may increase either contention or retry errors, or both.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.approximate_timestamp"></a><code>crdb_internal.approximate_timestamp(timestamp: <a href="decimal.html">decimal</a>) &rarr; <a href="timestamp.html">timestamp</a></code></td><td><span class="funcdesc"><p>Converts the crdb_internal_mvcc_timestamp column into an approximate timestamp.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.assignment_cast"></a><code>crdb_internal.assignment_cast(val: anyelement, type: anyelement) &rarr; anyelement</code></td><td><span class="funcdesc"><p>This function is used internally to perform assignment casts during mutations.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.check_consistency"></a><code>crdb_internal.check_consistency(stats_only: <a href="bool.html">bool</a>, start_key: <a href="bytes.html">bytes</a>, end_key: <a href="bytes.html">bytes</a>) &rarr; tuple{int AS range_id, bytes AS start_key, string AS start_key_pretty, string AS status, string AS detail}</code></td><td><span class="funcdesc"><p>Runs a consistency check on ranges touching the specified key range. an empty start or end key is treated as the minimum and maximum possible, respectively. stats_only should only be set to false when targeting a small number of ranges to avoid overloading the cluster. Each returned row contains the range ID, the status (a roachpb.CheckConsistencyResponse_Status), and verbose detail.</p>
<p>Example usage:
SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
Expand Down
7 changes: 6 additions & 1 deletion pkg/sql/colexec/builtin_funcs.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/execinfra"
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
"github.com/cockroachdb/cockroach/pkg/sql/types"
"github.com/cockroachdb/errors"
)

type defaultBuiltinFuncOperator struct {
Expand Down Expand Up @@ -118,7 +119,11 @@ func NewBuiltinFunctionOperator(
outputIdx int,
input colexecop.Operator,
) (colexecop.Operator, error) {
switch funcExpr.ResolvedOverload().SpecializedVecBuiltin {
overload := funcExpr.ResolvedOverload()
if overload.FnWithExprs != nil {
return nil, errors.New("builtins with FnWithExprs are not supported in the vectorized engine")
}
switch overload.SpecializedVecBuiltin {
case tree.SubstringStringIntInt:
input = colexecutils.NewVectorTypeEnforcer(allocator, input, types.String, outputIdx)
return newSubstringOperator(
Expand Down
8 changes: 2 additions & 6 deletions pkg/sql/execute.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
"github.com/cockroachdb/cockroach/pkg/sql/sem/tree"
"github.com/cockroachdb/cockroach/pkg/sql/sessiondata"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/cockroach/pkg/sql/types"
)

// fillInPlaceholder helps with the EXECUTE foo(args) SQL statement: it takes in
Expand All @@ -43,12 +43,8 @@ func fillInPlaceholders(
for i, e := range params {
idx := tree.PlaceholderIdx(i)

typ, ok := ps.ValueType(idx)
if !ok {
return nil, errors.AssertionFailedf("no type for placeholder %s", idx)
}
typedExpr, err := schemaexpr.SanitizeVarFreeExpr(
ctx, e, typ, "EXECUTE parameter" /* context */, &semaCtx, tree.VolatilityVolatile,
ctx, e, types.Any, "EXECUTE parameter" /* context */, &semaCtx, tree.VolatilityVolatile,
)
if err != nil {
return nil, pgerror.WithCandidateCode(err, pgcode.WrongObjectType)
Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/insert.go
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ func (r *insertRun) initRowContainer(params runParams, columns colinfo.ResultCol
// processSourceRow processes one row from the source for insertion and, if
// result rows are needed, saves it in the result row container.
func (r *insertRun) processSourceRow(params runParams, rowVals tree.Datums) error {
if err := enforceLocalColumnConstraints(rowVals, r.insertCols); err != nil {
if err := enforceLocalColumnConstraints(rowVals, r.insertCols, false /* isUpdate */); err != nil {
return err
}

Expand Down
3 changes: 0 additions & 3 deletions pkg/sql/logictest/testdata/logic_test/alter_column_type
Original file line number Diff line number Diff line change
Expand Up @@ -200,9 +200,6 @@ rowid INT8 false unique_rowid() · {primary} true
statement ok
INSERT INTO t2 VALUES ('5')

statement error pq: value type int doesn't match type string of column "id"
INSERT INTO t2 VALUES (6)

# Verify ALTER COLUMN TYPE from INT to STRING works correctly.
# Column order should stay the same.
statement ok
Expand Down
4 changes: 2 additions & 2 deletions pkg/sql/logictest/testdata/logic_test/array
Original file line number Diff line number Diff line change
Expand Up @@ -810,7 +810,7 @@ SELECT * FROM a
{hello}
{goodbye}

statement error value type collatedstring{fr}\[\] doesn't match type collatedstring{en}\[\] of column "b"
statement ok
INSERT INTO a VALUES (ARRAY['hello' COLLATE fr])

statement ok
Expand Down Expand Up @@ -841,7 +841,7 @@ CREATE TABLE a (b STRING[])
statement ok
INSERT INTO a VALUES (ARRAY['foo'])

statement error value type collatedstring{en}\[\] doesn't match type string\[\] of column "b"
statement ok
INSERT INTO a VALUES (ARRAY['foo' COLLATE en])

statement ok
Expand Down
175 changes: 175 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/cast
Original file line number Diff line number Diff line change
@@ -1,3 +1,178 @@
# Tests for assignment casts.
subtest assignment_casts

statement ok
CREATE TABLE assn_cast (
c CHAR,
vc VARCHAR(1),
qc "char",
b BIT,
i INT,
t timestamp,
d DECIMAL(10, 0),
s STRING
)

statement ok
INSERT INTO assn_cast(c) VALUES ('a')

statement ok
INSERT INTO assn_cast(c) VALUES (null), ('b')

statement error value too long for type CHAR
INSERT INTO assn_cast(c) VALUES ('abc')

query T
INSERT INTO assn_cast(c) VALUES (1) RETURNING c
----
1

statement error value too long for type CHAR
INSERT INTO assn_cast(c) VALUES (123)

statement ok
PREPARE insert_c AS INSERT INTO assn_cast(c) VALUES ($1)

statement error value too long for type CHAR
EXECUTE insert_c('foo')

statement error value too long for type CHAR
EXECUTE insert_c('foo'::STRING)

statement ok
DELETE FROM assn_cast

statement
EXECUTE insert_c(' ')

statement
EXECUTE insert_c(' '::STRING)

# Length should be 0 for both rows because CHAR is a blank-padded string type.
query I
SELECT length(c) FROM assn_cast
----
0
0

statement ok
INSERT INTO assn_cast(vc) VALUES ('a')

statement ok
INSERT INTO assn_cast(vc) VALUES (null), ('b')

statement error value too long for type VARCHAR\(1\)
INSERT INTO assn_cast(vc) VALUES ('abc')

query T
INSERT INTO assn_cast(vc) VALUES (1) RETURNING vc
----
1

statement error value too long for type VARCHAR\(1\)
INSERT INTO assn_cast(vc) VALUES (123)

statement ok
INSERT INTO assn_cast(qc) VALUES ('a')

statement ok
INSERT INTO assn_cast(qc) VALUES (null), ('b')

query T
INSERT INTO assn_cast(qc) VALUES ('abc') RETURNING qc
----
a

# Note: This statement fails in Postgres because the default integer type is an
# INT4, and the INT4 -> "char" cast is explicit. Our default integer type
# is an INT8 and INT8 -> "char" is an assignment cast.
query T
INSERT INTO assn_cast(qc) VALUES (123) RETURNING qc
----
{

statement error \"char\" out of range
INSERT INTO assn_cast(qc) VALUES (1234)

statement ok
PREPARE insert_qc AS INSERT INTO assn_cast(qc) VALUES ($1)

statement ok
DELETE FROM assn_cast

statement ok
EXECUTE insert_qc('foo')

statement ok
EXECUTE insert_qc('foo'::STRING)

query T
SELECT qc FROM assn_cast
----
f
f

statement ok
INSERT into assn_cast(b) VALUES ('1')

statement ok
INSERT INTO assn_cast(b) VALUES (null), ('1')

# TODO(mgartner): To match Postgres behavior, this statement should fail with
# the message "value too long for type BIT".
statement ok
INSERT into assn_cast(b) VALUES ('01')

statement error value type int doesn't match type bit of column \"b\"
INSERT into assn_cast(b) VALUES (1)

statement ok
INSERT INTO assn_cast(i) VALUES ('1')

statement ok
INSERT INTO assn_cast(i) VALUES (null), ('1')

statement ok
PREPARE insert_i AS INSERT INTO assn_cast(i) VALUES ($1)

statement ok
EXECUTE insert_i('1')

statement error value type string doesn't match type int of column \"i\"
INSERT INTO assn_cast(i) VALUES ('1'::STRING)

statement ok
INSERT INTO assn_cast(t) VALUES ('1970-01-01'::timestamptz)

statement ok
PREPARE insert_d AS INSERT INTO assn_cast(d) VALUES ($1)

statement ok
EXECUTE insert_d(123.45)

statement ok
PREPARE insert_d2 AS INSERT INTO assn_cast(d) SELECT * FROM (VALUES ($1::DECIMAL(10, 2)))

statement ok
EXECUTE insert_d2(67.89)

query F
SELECT d FROM assn_cast WHERE d IS NOT NULL
----
123
68

statement ok
INSERT INTO assn_cast(s) VALUES (1)

statement ok
PREPARE insert_s AS INSERT INTO assn_cast(s) VALUES ($1)

statement ok
EXECUTE insert_s(1)

subtest regressions

statement ok
CREATE TABLE t45837 AS SELECT 1.25::decimal AS d

Expand Down
10 changes: 8 additions & 2 deletions pkg/sql/logictest/testdata/logic_test/collatedstring
Original file line number Diff line number Diff line change
Expand Up @@ -231,8 +231,10 @@ INSERT INTO t VALUES
('x' COLLATE en),
('ü' COLLATE en)

statement error value type collatedstring{de} doesn't match type collatedstring{en} of column "a"
INSERT INTO t VALUES ('X' COLLATE de)
statement ok
INSERT INTO t VALUES
('X' COLLATE de),
('y' COLLATE de)

query T
SELECT a FROM t ORDER BY t.a
Expand All @@ -243,6 +245,8 @@ b
B
ü
x
X
y

query T
SELECT a FROM t ORDER BY t.a COLLATE da
Expand All @@ -252,6 +256,8 @@ A
b
B
x
X
y
ü

query T
Expand Down
40 changes: 39 additions & 1 deletion pkg/sql/logictest/testdata/logic_test/computed
Original file line number Diff line number Diff line change
Expand Up @@ -883,9 +883,17 @@ CREATE TABLE x (
b INT AS (a+1) STORED
)

query error value type decimal doesn't match type int of column "a"
statement ok
INSERT INTO x VALUES(1.4)

query II
SELECT * FROM x
----
1 2

query error value type date doesn't match type int of column "a"
INSERT INTO x VALUES('1970-01-01'::date)

# Regression test for #34901: verify that builtins can be used in computed
# column expressions without a "memory budget exceeded" error while backfilling
statement ok
Expand Down Expand Up @@ -1025,3 +1033,33 @@ SET experimental_computed_column_rewrites = "bad"

statement error invalid column rewrites expression
SET CLUSTER SETTING sql.defaults.experimental_computed_column_rewrites = "bad"

# Regression test for #69327. Computed columns should be evaluated after
# assignment casts have been performed.
statement ok
CREATE TABLE t69327 (
c "char",
v STRING AS (c) STORED
);
INSERT INTO t69327 VALUES ('foo'::STRING)

# Both columns should have a value of "f".
query TT
SELECT * FROM t69327
----
f f

# Regression test for #69665.Computed columns should be evaluated after
# assignment casts have been performed.
statement ok
CREATE TABLE t69665 (
c CHAR,
v STRING AS (c) STORED
);
INSERT INTO t69665 VALUES (' '::STRING)

# Both columns should be empty values.
query II
SELECT length(c), length(v) FROM t69665
----
0 0
Loading

0 comments on commit df5b790

Please sign in to comment.