sql: add assignment casts for INSERTs

Casts in Postgres are performed in one of three contexts [1]: 1. An explicit context with `CAST(x AS T)` or `x::T`. 2. An assignment context performed implicitly during an INSERT, UPSERT, or UPDATE. 3. An implicit context during the evaluation of an expression. For example the DATE in `'2021-01-02'::DATE < now()` will be implicitly cast to a TIMESTAMPTZ so that the values can be compared. Not all casts can be performed in all contexts. Postgres's pg_cast table lists valid casts and specifies the maximum cast context in which each can be performed. A cast with a max context of explicit can only be performed in an explicit context. A cast with an assignment max context can be performed in an explicit context or an assignment context. A cast with an implicit max context can be performed in all contexts. Much to my personal disappointment and frustration, there are valid casts that are not listed in Postgres's pg_cast table. These casts are called "automatic I/O conversions" and they allow casting most types to and from the string types: TEXT, VARCHAR, CHAR, NAME, and "char" [2]. We cannot determine these casts' maximum cast context from the pg_cast table, so we rely on the documentation which states that conversions to string types are assignment casts and conversions from string types are explicit casts [3]. -- This commit implements assignment casts for INSERTs. Follow up work will implement assignment casts for UPSERTs and UPDATEs. A cast performed in an assignment context has slightly different behavior than the same cast performed in an explicit context. In an assignment context, the cast will error if the width of the value is too large for the given type. In an explicit context, the value will be truncated to match the width. The one exception is assignment casts to the special "char" type which do truncate values. To support different cast behaviors for different contexts, a new built-in, `crdb_internal.assignment_cast` has been introduced. This function takes two arguments: a value and a type. Because SQL does not have first-class types, a type cannot be passed directly to the built-in. Instead, a `NULL` cast to a type is used as a workaround, similar to the `json_populate_record` built-in. For example, an integer can be assignment-cast to a string with: crdb_internal.assignment_cast(1::INT, NULL::STRING) Optbuilder is responsible for wrapping INSERT columns with the assignment cast built-in function. If an insert column type `T1` is not identical to the table's corresponding target column type `T2`, optbuilder will check if there is a valid cast from `T1` to `T2` with a maximum context that allows an assignment cast. If there is a such a cast, a projection will wrap the column in the assignment cast built-in function. If there is no such cast, a user error will be produced. The introduction of assignment casts fixes minor bugs and addresses some inconsistencies with Postgres's behavior. In general, INSERTS now successfully cast values to target table column types in more cases. As one example, inserting a string into an integer column now succeeds: CREATE TABLE t (i INT) INSERT INTO t VALUES ('1'::STRING) Prior to this commit there was logic that mimicked assignment casts, but it was not correct. Bugs in the implementation caused incorrect behavior when inserting into tables with computed columns. Most notably, a computed column expression that referenced another column `c` was evaluated with the value of `c` before the assignment cast was performed. This resulted in incorrect values for computed columns in some cases. In addition, assignment casts make the special logic for rounding decimal values in optbuilder obsolete. The builtin function `crdb_internal.round_decimal_values` and related logic in optbuilder will be removed once assignment casts are implemented for UPSERTs and UPDATEs. Fixes cockroachdb#69327 Fixes cockroachdb#69665 [1] https://www.postgresql.org/docs/current/typeconv.html [2] https://www.postgresql.org/docs/13/catalog-pg-cast.html#CATALOG-PG-CAST [3] https://www.postgresql.org/docs/13/sql-createcast.html#SQL-CREATECAST-NOTES Release note (sql change): Implicit casts performed during INSERT statements now more closely follow Postgres's behavior. Several minor bugs related to these types of casts have been fixed.
mgartner · Sep 24, 2021 · bd658c0 · bd658c0
1 parent 0bf1138
commit bd658c0
Show file tree

Hide file tree

Showing 31 changed files with 1,597 additions and 1,104 deletions.
diff --git a/docs/generated/sql/functions.md b/docs/generated/sql/functions.md
@@ -2826,6 +2826,8 @@ may increase either contention or retry errors, or both.</p>
 </span></td></tr>
 <tr><td><a name="crdb_internal.approximate_timestamp"></a><code>crdb_internal.approximate_timestamp(timestamp: <a href="decimal.html">decimal</a>) &rarr; <a href="timestamp.html">timestamp</a></code></td><td><span class="funcdesc"><p>Converts the crdb_internal_mvcc_timestamp column into an approximate timestamp.</p>
 </span></td></tr>
+<tr><td><a name="crdb_internal.assignment_cast"></a><code>crdb_internal.assignment_cast(val: anyelement, type: anyelement) &rarr; anyelement</code></td><td><span class="funcdesc"><p>This function is used internally to perform assignment casts during mutations.</p>
+</span></td></tr>
 <tr><td><a name="crdb_internal.check_consistency"></a><code>crdb_internal.check_consistency(stats_only: <a href="bool.html">bool</a>, start_key: <a href="bytes.html">bytes</a>, end_key: <a href="bytes.html">bytes</a>) &rarr; tuple{int AS range_id, bytes AS start_key, string AS start_key_pretty, string AS status, string AS detail}</code></td><td><span class="funcdesc"><p>Runs a consistency check on ranges touching the specified key range. an empty start or end key is treated as the minimum and maximum possible, respectively. stats_only should only be set to false when targeting a small number of ranges to avoid overloading the cluster. Each returned row contains the range ID, the status (a roachpb.CheckConsistencyResponse_Status), and verbose detail.</p>
 <p>Example usage:
 SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>

diff --git a/pkg/sql/colexec/builtin_funcs.go b/pkg/sql/colexec/builtin_funcs.go
@@ -11,6 +11,8 @@
 package colexec
 
 import (
+	"errors"
+
 	"github.com/cockroachdb/cockroach/pkg/col/coldata"
 	"github.com/cockroachdb/cockroach/pkg/sql/colconv"
 	"github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils"
@@ -118,7 +120,11 @@ func NewBuiltinFunctionOperator(
 	outputIdx int,
 	input colexecop.Operator,
 ) (colexecop.Operator, error) {
-	switch funcExpr.ResolvedOverload().SpecializedVecBuiltin {
+	overload := funcExpr.ResolvedOverload()
+	if overload.FnWithExprs != nil {
+		return nil, errors.New("builtins with FnWithExprs are not supported in the vectorized engine")
+	}
+	switch overload.SpecializedVecBuiltin {
 	case tree.SubstringStringIntInt:
 		input = colexecutils.NewVectorTypeEnforcer(allocator, input, types.String, outputIdx)
 		return newSubstringOperator(

diff --git a/pkg/sql/insert.go b/pkg/sql/insert.go
@@ -123,7 +123,7 @@ func (r *insertRun) initRowContainer(params runParams, columns colinfo.ResultCol
 // processSourceRow processes one row from the source for insertion and, if
 // result rows are needed, saves it in the result row container.
 func (r *insertRun) processSourceRow(params runParams, rowVals tree.Datums) error {
-	if err := enforceLocalColumnConstraints(rowVals, r.insertCols); err != nil {
+	if err := enforceLocalColumnConstraints(rowVals, r.insertCols, false /* isUpdate */); err != nil {
 		return err
 	}
 

diff --git a/pkg/sql/logictest/testdata/logic_test/cast b/pkg/sql/logictest/testdata/logic_test/cast
@@ -1,3 +1,88 @@
+# Tests for assignment casts.
+subtest assignment_casts
+
+statement ok
+CREATE TABLE assn_cast (
+  c CHAR,
+  vc VARCHAR(1),
+  qc "char",
+  b BIT,
+  i INT,
+  t timestamp
+)
+
+statement ok
+INSERT INTO assn_cast(c) VALUES ('a')
+
+statement error value too long for type CHAR
+INSERT INTO assn_cast(c) VALUES ('abc')
+
+query T
+INSERT INTO assn_cast(c) VALUES (1) RETURNING c
+----
+1
+
+statement error value too long for type CHAR
+INSERT INTO assn_cast(c) VALUES (123)
+
+statement ok
+INSERT INTO assn_cast(vc) VALUES ('a')
+
+statement error value too long for type VARCHAR\(1\)
+INSERT INTO assn_cast(vc) VALUES ('abc')
+
+query T
+INSERT INTO assn_cast(c) VALUES (1) RETURNING c
+----
+1
+
+statement error value too long for type CHAR
+INSERT INTO assn_cast(c) VALUES (123)
+
+statement ok
+INSERT INTO assn_cast(qc) VALUES ('a')
+
+query T
+INSERT INTO assn_cast(qc) VALUES ('abc') RETURNING qc
+----
+a
+
+# Note: This statement fails in Postgres because the default integer type is an
+# INT4, and the INT4 -> "char" cast is explicit. Our default integer type
+# is an INT8 and INT8 -> "char" is an assignment cast.
+#
+# TODO(mgartner): This should return '{'.
+query T
+INSERT INTO assn_cast(qc) VALUES (123) RETURNING qc
+----
+1
+
+# TODO(mgartner): This should be an out of range error.
+statement ok
+INSERT INTO assn_cast(qc) VALUES (1234)
+
+statement ok
+INSERT into assn_cast(b) VALUES ('1')
+
+# TODO(mgartner): To match Postgres behavior, this statement should fail with
+# the message "value too long for type BIT".
+statement ok
+INSERT into assn_cast(b) VALUES ('01')
+
+statement error value type int doesn't match type bit of column \"b\"
+INSERT into assn_cast(b) VALUES (1)
+
+statement ok
+INSERT INTO assn_cast(i) VALUES ('1')
+
+statement error value type string doesn't match type int of column \"i\"
+INSERT INTO assn_cast(i) VALUES ('1'::STRING)
+
+statement ok
+INSERT INTO assn_cast(t) VALUES ('1970-01-01'::timestamptz)
+
+subtest regressions
+
 statement ok
 CREATE TABLE t45837 AS SELECT 1.25::decimal AS d
 

diff --git a/pkg/sql/logictest/testdata/logic_test/computed b/pkg/sql/logictest/testdata/logic_test/computed
@@ -883,9 +883,17 @@ CREATE TABLE x (
   b INT AS (a+1) STORED
 )
 
-query error value type decimal doesn't match type int of column "a"
+statement ok
 INSERT INTO x VALUES(1.4)
 
+query II
+SELECT * FROM x
+----
+1  2
+
+query error value type date doesn't match type int of column "a"
+INSERT INTO x VALUES('1970-01-01'::date)
+
 # Regression test for #34901: verify that builtins can be used in computed
 # column expressions without a "memory budget exceeded" error while backfilling
 statement ok
@@ -1025,3 +1033,33 @@ SET experimental_computed_column_rewrites = "bad"
 
 statement error invalid column rewrites expression
 SET CLUSTER SETTING sql.defaults.experimental_computed_column_rewrites = "bad"
+
+# Regression test for #69327. Computed columns should be evaluated after
+# assignment casts have been performed.
+statement ok
+CREATE TABLE t69327 (
+  c "char",
+  v STRING AS (c) STORED
+);
+INSERT INTO t69327 VALUES ('foo'::STRING)
+
+# Both columns should have a value of "f".
+query TT
+SELECT * FROM t69327
+----
+f  f
+
+# Regression test for #69665.Computed columns should be evaluated after
+# assignment casts have been performed.
+statement ok
+CREATE TABLE t69665 (
+  c CHAR,
+  v STRING AS (c) STORED
+);
+INSERT INTO t69665 VALUES (' '::STRING)
+
+# Both columns should be empty values.
+query II
+SELECT length(c), length(v) FROM t69665
+----
+0  0
diff --git a/pkg/sql/logictest/testdata/logic_test/insert b/pkg/sql/logictest/testdata/logic_test/insert
@@ -129,26 +129,31 @@ INSERT INTO kv4 (int, bool) VALUES (3, 'a')
 statement ok
 INSERT INTO kv4 (int, bool) VALUES (3, true)
 
-statement error value type int doesn't match type char of column "char"
+statement error value too long for type CHAR
+INSERT INTO kv4 (int, char) VALUES (4, 11)
+
+statement ok
 INSERT INTO kv4 (int, char) VALUES (4, 1)
 
 statement ok
-INSERT INTO kv4 (int, char) VALUES (4, 'a')
+INSERT INTO kv4 (int, char) VALUES (5, 'a')
 
-statement error value type int doesn't match type float of column "float"
-INSERT INTO kv4 (int, float) VALUES (5, 1::INT)
+statement ok
+INSERT INTO kv4 (int, float) VALUES (6, 1::INT)
 
 statement ok
-INSERT INTO kv4 (int, float) VALUES (5, 2.3)
+INSERT INTO kv4 (int, float) VALUES (7, 2.3)
 
 query ITBTR rowsort
 SELECT * from kv4
 ----
 1  NULL  NULL  NULL  NULL
 2  1     NULL  NULL  NULL
 3  NULL  true  NULL  NULL
-4  NULL  NULL  a     NULL
-5  NULL  NULL  NULL  2.3
+4  NULL  NULL  1     NULL
+5  NULL  NULL  a     NULL
+6  NULL  NULL  NULL  1
+7  NULL  NULL  NULL  2.3
 
 statement ok
 CREATE TABLE kv5 (
@@ -446,7 +451,7 @@ INSERT INTO string_t VALUES ('str')
 query error value type string doesn't match type bytes of column "b"
 INSERT INTO bytes_t SELECT * FROM string_t
 
-query error value type bytes doesn't match type string of column "s"
+statement ok
 INSERT INTO string_t SELECT * FROM bytes_t
 
 subtest string_width_check

diff --git a/pkg/sql/logictest/testdata/logic_test/typing b/pkg/sql/logictest/testdata/logic_test/typing
@@ -13,8 +13,8 @@ SELECT * FROM f
 statement ok
 CREATE TABLE i (x INT)
 
-statement error value type decimal doesn't match type int of column "x"
-INSERT INTO i(x) VALUES (4.5)
+statement error value type timestamptz doesn't match type int of column "x"
+INSERT INTO i(x) VALUES ('1970-01-01'::timestamptz)
 
 statement ok
 INSERT INTO i(x) VALUES (2.0)

diff --git a/pkg/sql/opt/ops/scalar.opt b/pkg/sql/opt/ops/scalar.opt
@@ -267,7 +267,7 @@ define ZipItem {
     Cols ColList
 }
 
-# And is the boolean conjunction operator that evalutes to true only if both of
+# And is the boolean conjunction operator that evaluates to true only if both of
 # its conditions evaluate to true.
 [Scalar, Bool]
 define And {
@@ -630,8 +630,8 @@ define UnaryCbrt {
     Input ScalarExpr
 }
 
-# Cast converts the input expression into an expression of the target type.
-# Note that the conversion may cause trunction based on the target types' width,
+# Cast converts the input expression into an expression of the target type. Note
+# that the conversion may cause truncation based on the target types' width,
 # such as in this example:
 #
 #   'hello'::VARCHAR(2)

diff --git a/pkg/sql/opt/optbuilder/insert.go b/pkg/sql/opt/optbuilder/insert.go
@@ -255,6 +255,7 @@ func (b *Builder) buildInsert(ins *tree.Insert, inScope *scope) (outScope *scope
 	//
 	//   INSERT INTO <table> DEFAULT VALUES
 	//
+	isUpsert := ins.OnConflict != nil && !ins.OnConflict.DoNothing
 	if !ins.DefaultValues() {
 		// Replace any DEFAULT expressions in the VALUES clause, if a VALUES clause
 		// exists:
@@ -263,15 +264,15 @@ func (b *Builder) buildInsert(ins *tree.Insert, inScope *scope) (outScope *scope
 		//
 		rows := mb.replaceDefaultExprs(ins.Rows)
 
-		mb.buildInputForInsert(inScope, rows)
+		mb.buildInputForInsert(inScope, rows, isUpsert)
 	} else {
-		mb.buildInputForInsert(inScope, nil /* rows */)
+		mb.buildInputForInsert(inScope, nil /* rows */, isUpsert)
 	}
 
 	// Add default columns that were not explicitly specified by name or
 	// implicitly targeted by input columns. Also add any computed columns. In
 	// both cases, include columns undergoing mutations in the write-only state.
-	mb.addSynthesizedColsForInsert()
+	mb.addSynthesizedColsForInsert(isUpsert)
 
 	var returning tree.ReturningExprs
 	if resultsNeeded(ins.Returning) {
@@ -553,7 +554,9 @@ func (mb *mutationBuilder) addTargetTableColsForInsert(maxCols int) {
 
 // buildInputForInsert constructs the memo group for the input expression and
 // constructs a new output scope containing that expression's output columns.
-func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.Select) {
+func (mb *mutationBuilder) buildInputForInsert(
+	inScope *scope, inputRows *tree.Select, isUpsert bool,
+) {
 	// Handle DEFAULT VALUES case by creating a single empty row as input.
 	if inputRows == nil {
 		mb.outScope = inScope.push()
@@ -603,16 +606,23 @@ func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.S
 		mb.addTargetTableColsForInsert(len(mb.outScope.cols))
 	}
 
+	if !isUpsert {
+		mb.outScope = mb.addAssignmentCasts(mb.outScope, desiredTypes)
+	}
+
 	// Loop over input columns and:
 	//   1. Type check each column
+	//   2. Check if the INSERT violates a GENERATED ALWAYS AS IDENTITY column.
 	//   2. Assign name to each column
 	//   3. Add column ID to the insertColIDs list.
 	for i := range mb.outScope.cols {
 		inCol := &mb.outScope.cols[i]
 		ord := mb.tabID.ColumnOrdinal(mb.targetColList[i])
 
-		// Type check the input column against the corresponding table column.
-		checkDatumTypeFitsColumnType(mb.tab.Column(ord), inCol.typ)
+		if isUpsert {
+			// Type check the input column against the corresponding table column.
+			checkDatumTypeFitsColumnType(mb.tab.Column(ord), inCol.typ)
+		}
 
 		// Check if the input column is created with `GENERATED ALWAYS AS IDENTITY`
 		// syntax. If yes, and user does not specify the `OVERRIDING SYSTEM VALUE`
@@ -634,7 +644,7 @@ func (mb *mutationBuilder) buildInputForInsert(inScope *scope, inputRows *tree.S
 // columns that are not yet part of the target column list. This includes all
 // write-only mutation columns, since they must always have default or computed
 // values.
-func (mb *mutationBuilder) addSynthesizedColsForInsert() {
+func (mb *mutationBuilder) addSynthesizedColsForInsert(isUpsert bool) {
 	// Start by adding non-computed columns that have not already been explicitly
 	// specified in the query. Do this before adding computed columns, since those
 	// may depend on non-computed columns.
@@ -646,13 +656,17 @@ func (mb *mutationBuilder) addSynthesizedColsForInsert() {
 
 	// Possibly round DECIMAL-related columns containing insertion values (whether
 	// synthesized or not).
-	mb.roundDecimalValues(mb.insertColIDs, false /* roundComputedCols */)
+	if isUpsert {
+		mb.roundDecimalValues(mb.insertColIDs, false /* roundComputedCols */)
+	}
 
 	// Now add all computed columns.
 	mb.addSynthesizedComputedCols(mb.insertColIDs, false /* restrict */)
 
 	// Possibly round DECIMAL-related computed columns.
-	mb.roundDecimalValues(mb.insertColIDs, true /* roundComputedCols */)
+	if isUpsert {
+		mb.roundDecimalValues(mb.insertColIDs, true /* roundComputedCols */)
+	}
 }
 
 // buildInsert constructs an Insert operator, possibly wrapped by a Project