Skip to content

Commit

Permalink
sql: add assignment casts for INSERTs
Browse files Browse the repository at this point in the history
Casts in Postgres are performed in one of three contexts [1]:

  1. An explicit context with `CAST(x AS T)` or `x::T`.
  2. An assignment context performed implicitly during an INSERT,
     UPSERT, or UPDATE.
  3. An implicit context during the evaluation of an expression. For
     example the DATE in `'2021-01-02'::DATE < now()` will be implicitly
     cast to a TIMESTAMPTZ so that the values can be compared.

Not all casts can be performed in all contexts. Postgres's pg_cast table
lists valid casts and specifies the maximum cast context in which each
can be performed. A cast with a max context of explicit can only be
performed in an explicit context. A cast with an assignment max context
can be performed in an explicit context or an assignment context. A cast
with an implicit max context can be performed in all contexts.

Much to my personal disappointment and frustration, there are valid
casts that are not listed in Postgres's pg_cast table. These casts are
called "automatic I/O conversions" and they allow casting most types to
and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2].
We cannot determine these casts' maximum cast context from the pg_cast
table, so we rely on the documentation which states that conversions to
string types are assignment casts and conversions from string types are
explicit casts [3].

--

This commit implements assignment casts for INSERTs. Follow up work will
implement assignment casts for UPSERTs and UPDATEs.

A cast performed in an assignment context has slightly different
behavior than the same cast performed in an explicit context. In an
assignment context, the cast will error if the width of the value is too
large for the given type. In an explicit context, the value will be
truncated to match the width. The one exception is assignment casts to
the special "char" type which do truncate values.

To support different cast behaviors for different contexts, a new
built-in, `crdb_internal.assignment_cast` has been introduced. This
function takes two arguments: a value and a type. Because SQL
does not have first-class types, a type cannot be passed directly to the
built-in. Instead, a `NULL` cast to a type is used as a workaround,
similar to the `json_populate_record` built-in. For example, an integer
can be assignment-cast to a string with:

    crdb_internal.assignment_cast(1::INT, NULL::STRING)

Optbuilder is responsible for wrapping INSERT columns with the
assignment cast built-in function. If an insert column type `T1` is not
identical to the table's corresponding target column type `T2`,
optbuilder will check if there is a valid cast from `T1` to `T2` with a
maximum context that allows an assignment cast. If there is a such a
cast, a projection will wrap the column in the assignment cast built-in
function. If there is no such cast, a user error will be produced.

The introduction of assignment casts fixes minor bugs and addresses some
inconsistencies with Postgres's behavior. In general, INSERTS now
successfully cast values to target table column types in more cases. As
one example, inserting a string into an integer column now succeeds:

    CREATE TABLE t (i INT)
    INSERT INTO t VALUES ('1'::STRING)

Prior to this commit there was logic that mimicked assignment casts, but
it was not correct. Bugs in the implementation caused incorrect behavior
when inserting into tables with computed columns. Most notably, a
computed column expression that referenced another column `c` was
evaluated with the value of `c` before the assignment cast was
performed. This resulted in incorrect values for computed columns in
some cases.

In addition, assignment casts make the special logic for rounding
decimal values in optbuilder obsolete. The builtin function
`crdb_internal.round_decimal_values` and related logic in optbuilder
will be removed once assignment casts are implemented for UPSERTs and
UPDATEs.

Fixes cockroachdb#69327
Fixes cockroachdb#69665

[1] https://www.postgresql.org/docs/current/typeconv.html
[2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST
[3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES

Release note (sql change): Implicit casts performed during INSERT
statements now more closely follow Postgres's behavior. Several minor
bugs related to these types of casts have been fixed.
  • Loading branch information
mgartner committed Sep 24, 2021
1 parent 0bf1138 commit bd658c0
Show file tree
Hide file tree
Showing 31 changed files with 1,597 additions and 1,104 deletions.
2 changes: 2 additions & 0 deletions docs/generated/sql/functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -2826,6 +2826,8 @@ may increase either contention or retry errors, or both.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.approximate_timestamp"></a><code>crdb_internal.approximate_timestamp(timestamp: <a href="decimal.html">decimal</a>) &rarr; <a href="timestamp.html">timestamp</a></code></td><td><span class="funcdesc"><p>Converts the crdb_internal_mvcc_timestamp column into an approximate timestamp.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.assignment_cast"></a><code>crdb_internal.assignment_cast(val: anyelement, type: anyelement) &rarr; anyelement</code></td><td><span class="funcdesc"><p>This function is used internally to perform assignment casts during mutations.</p>
</span></td></tr>
<tr><td><a name="crdb_internal.check_consistency"></a><code>crdb_internal.check_consistency(stats_only: <a href="bool.html">bool</a>, start_key: <a href="bytes.html">bytes</a>, end_key: <a href="bytes.html">bytes</a>) &rarr; tuple{int AS range_id, bytes AS start_key, string AS start_key_pretty, string AS status, string AS detail}</code></td><td><span class="funcdesc"><p>Runs a consistency check on ranges touching the specified key range. an empty start or end key is treated as the minimum and maximum possible, respectively. stats_only should only be set to false when targeting a small number of ranges to avoid overloading the cluster. Each returned row contains the range ID, the status (a roachpb.CheckConsistencyResponse_Status), and verbose detail.</p>
<p>Example usage:
SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
Expand Down
8 changes: 7 additions & 1 deletion pkg/sql/colexec/builtin_funcs.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@
package colexec

import (
"errors"

"github.com/cockroachdb/cockroach/pkg/col/coldata"
"github.com/cockroachdb/cockroach/pkg/sql/colconv"
"github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils"
Expand Down Expand Up @@ -118,7 +120,11 @@ func NewBuiltinFunctionOperator(
outputIdx int,
input colexecop.Operator,
) (colexecop.Operator, error) {
switch funcExpr.ResolvedOverload().SpecializedVecBuiltin {
overload := funcExpr.ResolvedOverload()
if overload.FnWithExprs != nil {
return nil, errors.New("builtins with FnWithExprs are not supported in the vectorized engine")
}
switch overload.SpecializedVecBuiltin {
case tree.SubstringStringIntInt:
input = colexecutils.NewVectorTypeEnforcer(allocator, input, types.String, outputIdx)
return newSubstringOperator(
Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/insert.go
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ func (r *insertRun) initRowContainer(params runParams, columns colinfo.ResultCol
// processSourceRow processes one row from the source for insertion and, if
// result rows are needed, saves it in the result row container.
func (r *insertRun) processSourceRow(params runParams, rowVals tree.Datums) error {
if err := enforceLocalColumnConstraints(rowVals, r.insertCols); err != nil {
if err := enforceLocalColumnConstraints(rowVals, r.insertCols, false /* isUpdate */); err != nil {
return err
}

Expand Down
85 changes: 85 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/cast
Original file line number Diff line number Diff line change
@@ -1,3 +1,88 @@
# Tests for assignment casts.
subtest assignment_casts

statement ok
CREATE TABLE assn_cast (
c CHAR,
vc VARCHAR(1),
qc "char",
b BIT,
i INT,
t timestamp
)

statement ok
INSERT INTO assn_cast(c) VALUES ('a')

statement error value too long for type CHAR
INSERT INTO assn_cast(c) VALUES ('abc')

query T
INSERT INTO assn_cast(c) VALUES (1) RETURNING c
----
1

statement error value too long for type CHAR
INSERT INTO assn_cast(c) VALUES (123)

statement ok
INSERT INTO assn_cast(vc) VALUES ('a')

statement error value too long for type VARCHAR\(1\)
INSERT INTO assn_cast(vc) VALUES ('abc')

query T
INSERT INTO assn_cast(c) VALUES (1) RETURNING c
----
1

statement error value too long for type CHAR
INSERT INTO assn_cast(c) VALUES (123)

statement ok
INSERT INTO assn_cast(qc) VALUES ('a')

query T
INSERT INTO assn_cast(qc) VALUES ('abc') RETURNING qc
----
a

# Note: This statement fails in Postgres because the default integer type is an
# INT4, and the INT4 -> "char" cast is explicit. Our default integer type
# is an INT8 and INT8 -> "char" is an assignment cast.
#
# TODO(mgartner): This should return '{'.
query T
INSERT INTO assn_cast(qc) VALUES (123) RETURNING qc
----
1

# TODO(mgartner): This should be an out of range error.
statement ok
INSERT INTO assn_cast(qc) VALUES (1234)

statement ok
INSERT into assn_cast(b) VALUES ('1')

# TODO(mgartner): To match Postgres behavior, this statement should fail with
# the message "value too long for type BIT".
statement ok
INSERT into assn_cast(b) VALUES ('01')

statement error value type int doesn't match type bit of column \"b\"
INSERT into assn_cast(b) VALUES (1)

statement ok
INSERT INTO assn_cast(i) VALUES ('1')

statement error value type string doesn't match type int of column \"i\"
INSERT INTO assn_cast(i) VALUES ('1'::STRING)

statement ok
INSERT INTO assn_cast(t) VALUES ('1970-01-01'::timestamptz)

subtest regressions

statement ok
CREATE TABLE t45837 AS SELECT 1.25::decimal AS d

Expand Down
40 changes: 39 additions & 1 deletion pkg/sql/logictest/testdata/logic_test/computed
Original file line number Diff line number Diff line change
Expand Up @@ -883,9 +883,17 @@ CREATE TABLE x (
b INT AS (a+1) STORED
)

query error value type decimal doesn't match type int of column "a"
statement ok
INSERT INTO x VALUES(1.4)

query II
SELECT * FROM x
----
1 2

query error value type date doesn't match type int of column "a"
INSERT INTO x VALUES('1970-01-01'::date)

# Regression test for #34901: verify that builtins can be used in computed
# column expressions without a "memory budget exceeded" error while backfilling
statement ok
Expand Down Expand Up @@ -1025,3 +1033,33 @@ SET experimental_computed_column_rewrites = "bad"

statement error invalid column rewrites expression
SET CLUSTER SETTING sql.defaults.experimental_computed_column_rewrites = "bad"

# Regression test for #69327. Computed columns should be evaluated after
# assignment casts have been performed.
statement ok
CREATE TABLE t69327 (
c "char",
v STRING AS (c) STORED
);
INSERT INTO t69327 VALUES ('foo'::STRING)

# Both columns should have a value of "f".
query TT
SELECT * FROM t69327
----
f f

# Regression test for #69665.Computed columns should be evaluated after
# assignment casts have been performed.
statement ok
CREATE TABLE t69665 (
c CHAR,
v STRING AS (c) STORED
);
INSERT INTO t69665 VALUES (' '::STRING)

# Both columns should be empty values.
query II
SELECT length(c), length(v) FROM t69665
----
0 0
21 changes: 13 additions & 8 deletions pkg/sql/logictest/testdata/logic_test/insert
Original file line number Diff line number Diff line change
Expand Up @@ -129,26 +129,31 @@ INSERT INTO kv4 (int, bool) VALUES (3, 'a')
statement ok
INSERT INTO kv4 (int, bool) VALUES (3, true)

statement error value type int doesn't match type char of column "char"
statement error value too long for type CHAR
INSERT INTO kv4 (int, char) VALUES (4, 11)

statement ok
INSERT INTO kv4 (int, char) VALUES (4, 1)

statement ok
INSERT INTO kv4 (int, char) VALUES (4, 'a')
INSERT INTO kv4 (int, char) VALUES (5, 'a')

statement error value type int doesn't match type float of column "float"
INSERT INTO kv4 (int, float) VALUES (5, 1::INT)
statement ok
INSERT INTO kv4 (int, float) VALUES (6, 1::INT)

statement ok
INSERT INTO kv4 (int, float) VALUES (5, 2.3)
INSERT INTO kv4 (int, float) VALUES (7, 2.3)

query ITBTR rowsort
SELECT * from kv4
----
1 NULL NULL NULL NULL
2 1 NULL NULL NULL
3 NULL true NULL NULL
4 NULL NULL a NULL
5 NULL NULL NULL 2.3
4 NULL NULL 1 NULL
5 NULL NULL a NULL
6 NULL NULL NULL 1
7 NULL NULL NULL 2.3

statement ok
CREATE TABLE kv5 (
Expand Down Expand Up @@ -446,7 +451,7 @@ INSERT INTO string_t VALUES ('str')
query error value type string doesn't match type bytes of column "b"
INSERT INTO bytes_t SELECT * FROM string_t

query error value type bytes doesn't match type string of column "s"
statement ok
INSERT INTO string_t SELECT * FROM bytes_t

subtest string_width_check
Expand Down
4 changes: 2 additions & 2 deletions pkg/sql/logictest/testdata/logic_test/typing
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ SELECT * FROM f
statement ok
CREATE TABLE i (x INT)

statement error value type decimal doesn't match type int of column "x"
INSERT INTO i(x) VALUES (4.5)
statement error value type timestamptz doesn't match type int of column "x"
INSERT INTO i(x) VALUES ('1970-01-01'::timestamptz)

statement ok
INSERT INTO i(x) VALUES (2.0)
Expand Down
6 changes: 3 additions & 3 deletions pkg/sql/opt/ops/scalar.opt
Original file line number Diff line number Diff line change
Expand Up @@ -267,7 +267,7 @@ define ZipItem {
Cols ColList
}

# And is the boolean conjunction operator that evalutes to true only if both of
# And is the boolean conjunction operator that evaluates to true only if both of
# its conditions evaluate to true.
[Scalar, Bool]
define And {
Expand Down Expand Up @@ -630,8 +630,8 @@ define UnaryCbrt {
Input ScalarExpr
}

# Cast converts the input expression into an expression of the target type.
# Note that the conversion may cause trunction based on the target types' width,
# Cast converts the input expression into an expression of the target type. Note
# that the conversion may cause truncation based on the target types' width,
# such as in this example:
#
# 'hello'::VARCHAR(2)
Expand Down
32 changes: 23 additions & 9 deletions pkg/sql/opt/optbuilder/insert.go
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,7 @@ func (b *Builder) buildInsert(ins *tree.Insert, inScope *scope) (outScope *scope
//
// INSERT INTO <table> DEFAULT VALUES
//
isUpsert := ins.OnConflict != nil && !ins.OnConflict.DoNothing
if !ins.DefaultValues() {
// Replace any DEFAULT expressions in the VALUES clause, if a VALUES clause
// exists:
Expand All @@ -263,15 +264,15 @@ func (b *Builder) buildInsert(ins *tree.Insert, inScope *scope) (outScope *scope
//
rows := mb.replaceDefaultExprs(ins.Rows)

mb.buildInputForInsert(inScope, rows)
mb.buildInputForInsert(inScope, rows, isUpsert)
} else {
mb.buildInputForInsert(inScope, nil /* rows */)
mb.buildInputForInsert(inScope, nil /* rows */, isUpsert)
}

// Add default columns that were not explicitly specified by name or
// implicitly targeted by input columns. Also add any computed columns. In
// both cases, include columns undergoing mutations in the write-only state.
mb.addSynthesizedColsForInsert()
mb.addSynthesizedColsForInsert(isUpsert)

var returning tree.ReturningExprs
if resultsNeeded(ins.Returning) {
Expand Down Expand Up @@ -553,7 +554,9 @@ func (mb *mutationBuilder) addTargetTableColsForInsert(maxCols int) {

// buildInputForInsert constructs the memo group for the input expression and
// constructs a new output scope containing that expression's output columns.
func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.Select) {
func (mb *mutationBuilder) buildInputForInsert(
inScope *scope, inputRows *tree.Select, isUpsert bool,
) {
// Handle DEFAULT VALUES case by creating a single empty row as input.
if inputRows == nil {
mb.outScope = inScope.push()
Expand Down Expand Up @@ -603,16 +606,23 @@ func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.S
mb.addTargetTableColsForInsert(len(mb.outScope.cols))
}

if !isUpsert {
mb.outScope = mb.addAssignmentCasts(mb.outScope, desiredTypes)
}

// Loop over input columns and:
// 1. Type check each column
// 2. Check if the INSERT violates a GENERATED ALWAYS AS IDENTITY column.
// 2. Assign name to each column
// 3. Add column ID to the insertColIDs list.
for i := range mb.outScope.cols {
inCol := &mb.outScope.cols[i]
ord := mb.tabID.ColumnOrdinal(mb.targetColList[i])

// Type check the input column against the corresponding table column.
checkDatumTypeFitsColumnType(mb.tab.Column(ord), inCol.typ)
if isUpsert {
// Type check the input column against the corresponding table column.
checkDatumTypeFitsColumnType(mb.tab.Column(ord), inCol.typ)
}

// Check if the input column is created with `GENERATED ALWAYS AS IDENTITY`
// syntax. If yes, and user does not specify the `OVERRIDING SYSTEM VALUE`
Expand All @@ -634,7 +644,7 @@ func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.S
// columns that are not yet part of the target column list. This includes all
// write-only mutation columns, since they must always have default or computed
// values.
func (mb *mutationBuilder) addSynthesizedColsForInsert() {
func (mb *mutationBuilder) addSynthesizedColsForInsert(isUpsert bool) {
// Start by adding non-computed columns that have not already been explicitly
// specified in the query. Do this before adding computed columns, since those
// may depend on non-computed columns.
Expand All @@ -646,13 +656,17 @@ func (mb *mutationBuilder) addSynthesizedColsForInsert() {

// Possibly round DECIMAL-related columns containing insertion values (whether
// synthesized or not).
mb.roundDecimalValues(mb.insertColIDs, false /* roundComputedCols */)
if isUpsert {
mb.roundDecimalValues(mb.insertColIDs, false /* roundComputedCols */)
}

// Now add all computed columns.
mb.addSynthesizedComputedCols(mb.insertColIDs, false /* restrict */)

// Possibly round DECIMAL-related computed columns.
mb.roundDecimalValues(mb.insertColIDs, true /* roundComputedCols */)
if isUpsert {
mb.roundDecimalValues(mb.insertColIDs, true /* roundComputedCols */)
}
}

// buildInsert constructs an Insert operator, possibly wrapped by a Project
Expand Down
Loading

0 comments on commit bd658c0

Please sign in to comment.