Skip to content

Commit

Permalink
sql: propagate result types to inputs in set ops with nulls and tuples
Browse files Browse the repository at this point in the history
The physical planning for set operations is special because it allows
for the result types from its inputs to be of different type family in
some cases. Namely, it is ok if one input has `types.Unknown` whereas
the other input is of some "known" type. In order to handle such case
the execbuilder plans casts; however, currently it is not possible to
plan casts to `Tuple` type. As a result, the vectorized engine is not
able to execute the query because it expects the data coming from both
inputs to be of the same type.

This commit modifies the logic of reconciling the result types from both
inputs to additionally update unknown types on the inputs if the other
input is of a Tuple type. Such behavior is acceptable because in the
original plan we only expected to have NULL values, and the Tuple type
is able to handle such case too.

The problem can also be reproduced in a logic test with the row-by-row
engine in the `fakedist` setting. This bug was introduced in
a76ee31, and before that change we had
the following behavior: overwrite `plan.ResultTypes` to the merged types
and then plan a stage of distinct processors if we have UNION. This
resulted in the input spec for the input stream to distinct having
correctly set types. After the change, that was no longer the case.

Before that change the vectorized engine couldn't handle such queries
due to type mismatch during `SupportsVectorized` check, so we fell back
to the row-by-row engine. However, this commit makes it so that the
vectorized engine is able to execute such queries.

Release note (bug fix): CockroachDB previously could encounter an
internal error when performing UNION operation when the first input
resulted only in NULL values and consequent inputs produce tuples, and
this is now fixed. Only 21.1 alpha versions are affected.
  • Loading branch information
yuzefovich committed Feb 22, 2021
1 parent 3fc46fb commit 4208a25
Show file tree
Hide file tree
Showing 4 changed files with 84 additions and 14 deletions.
2 changes: 1 addition & 1 deletion pkg/sql/distsql_physical_planner.go
Original file line number Diff line number Diff line change
Expand Up @@ -3268,7 +3268,7 @@ func (dsp *DistSQLPlanner) createPlanForSetOp(
p.PlanToStreamColMap = planToStreamColMap

// Merge the plans' result types and merge ordering.
resultTypes, err := mergeResultTypes(leftPlan.GetResultTypes(), rightPlan.GetResultTypes())
resultTypes, err := mergeResultTypesForSetOp(leftPlan, rightPlan)
if err != nil {
return nil, err
}
Expand Down
40 changes: 29 additions & 11 deletions pkg/sql/distsql_plan_set_op.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,14 @@ import (
"github.com/cockroachdb/errors"
)

// mergeResultTypes reconciles the ResultTypes between two plans. It enforces
// that each pair of ColumnTypes must either match or be null, in which case the
// non-null type is used. This logic is necessary for cases like
// SELECT NULL UNION SELECT 1.
func mergeResultTypes(left, right []*types.T) ([]*types.T, error) {
// mergeResultTypesForSetOp reconciles the ResultTypes between two plans. It
// enforces that each pair of ColumnTypes must either match or be null, in which
// case the non-null type is used. This logic is necessary for cases like SELECT
// NULL UNION SELECT 1.
//
// This method is intended to be used only for planning of set operations.
func mergeResultTypesForSetOp(leftPlan, rightPlan *PhysicalPlan) ([]*types.T, error) {
left, right := leftPlan.GetResultTypes(), rightPlan.GetResultTypes()
if len(left) != len(right) {
return nil, errors.Errorf("ResultTypes length mismatch: %d and %d", len(left), len(right))
}
Expand All @@ -30,19 +33,34 @@ func mergeResultTypes(left, right []*types.T) ([]*types.T, error) {
merged[i] = leftType
} else if leftType.Family() == types.UnknownFamily {
merged[i] = rightType
} else if equivalentTypes(leftType, rightType) {
} else if leftType.Equivalent(rightType) {
// The types are equivalent for the purpose of UNION. Precision,
// Width, Oid, etc. do not affect the merging of values.
merged[i] = leftType
} else {
return nil, errors.Errorf(
"conflicting ColumnTypes: %s and %s", leftType.DebugString(), rightType.DebugString())
}
}
updateUnknownTypesForSetOp(leftPlan, merged)
updateUnknownTypesForSetOp(rightPlan, merged)
return merged, nil
}

// equivalentType checks whether a column type is equivalent to another for the
// purpose of UNION. Precision, Width, Oid, etc. do not affect the merging of
// values.
func equivalentTypes(c, other *types.T) bool {
return c.Equivalent(other)
// updateUnknownTypesForSetOp modifies plan's output types of the
// types.UnknownFamily type family to be of the corresponding Tuple type coming
// from the merged types. This is needed because at the moment the execbuilder
// is not able to plan casts to tuples.
//
// This method is intended to be used only for planning of set operations.
// TODO(yuzefovich): remove this once the execbuilder plans casts to tuples.
func updateUnknownTypesForSetOp(plan *PhysicalPlan, merged []*types.T) {
currentTypes := plan.GetResultTypes()
for i := range merged {
if merged[i].Family() == types.TupleFamily && currentTypes[i].Family() == types.UnknownFamily {
for _, procIdx := range plan.ResultRouters {
plan.Processors[procIdx].Spec.ResultTypes[i] = merged[i]
}
}
}
}
32 changes: 30 additions & 2 deletions pkg/sql/distsql_plan_set_op_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,18 +14,22 @@ import (
"reflect"
"testing"

"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/sql/physicalplan"
"github.com/cockroachdb/cockroach/pkg/sql/types"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/uuid"
)

func TestMergeResultTypes(t *testing.T) {
func TestMergeResultTypesForSetOp(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

empty := []*types.T{}
null := []*types.T{types.Unknown}
typeInt := []*types.T{types.Int}
typeTuple := []*types.T{types.MakeTuple(typeInt)}

testData := []struct {
name string
Expand All @@ -41,10 +45,32 @@ func TestMergeResultTypes(t *testing.T) {
{"left null", null, typeInt, &typeInt, false},
{"right null", typeInt, null, &typeInt, false},
{"both int", typeInt, typeInt, &typeInt, false},
{"left null, right tuple", null, typeTuple, &typeTuple, false},
{"right null, left tuple", typeTuple, null, &typeTuple, false},
}
checkUnknownTypesUpdate := func(plan PhysicalPlan, orig, merged []*types.T) {
for i, typ := range plan.GetResultTypes() {
if orig[i].Family() == types.UnknownFamily {
if typ.Family() == types.UnknownFamily && merged[i].Family() == types.TupleFamily {
t.Fatal("should have updated types NULL to tuple type on the original plan")
}
if typ.Family() != types.UnknownFamily && merged[i].Family() != types.TupleFamily {
t.Fatal("should have NOT updated types NULL to tuple type on the original plan")
}
}
}
}
infra := physicalplan.MakePhysicalInfrastructure(uuid.FastMakeV4(), roachpb.NodeID(1))
var leftPlan, rightPlan PhysicalPlan
leftPlan.PhysicalInfrastructure = &infra
rightPlan.PhysicalInfrastructure = &infra
leftPlan.ResultRouters = []physicalplan.ProcessorIdx{infra.AddProcessor(physicalplan.Processor{})}
rightPlan.ResultRouters = []physicalplan.ProcessorIdx{infra.AddProcessor(physicalplan.Processor{})}
for _, td := range testData {
t.Run(td.name, func(t *testing.T) {
result, err := mergeResultTypes(td.left, td.right)
leftPlan.Processors[0].Spec.ResultTypes = td.left
rightPlan.Processors[1].Spec.ResultTypes = td.right
result, err := mergeResultTypesForSetOp(&leftPlan, &rightPlan)
if td.err {
if err == nil {
t.Fatalf("expected error, got %+v", result)
Expand All @@ -57,6 +83,8 @@ func TestMergeResultTypes(t *testing.T) {
if !reflect.DeepEqual(*td.expected, result) {
t.Fatalf("expected %+v, got %+v", *td.expected, result)
}
checkUnknownTypesUpdate(leftPlan, td.left, result)
checkUnknownTypesUpdate(rightPlan, td.right, result)
})
}
}
24 changes: 24 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/union
Original file line number Diff line number Diff line change
Expand Up @@ -366,3 +366,27 @@ SELECT b FROM ab UNION SELECT a FROM ab
----
1
2

# Regression test for the vectorized engine not being able to handle a UNION
# between NULL and a tuple (#59611).
statement ok
CREATE TABLE t59611 (a INT);
INSERT INTO t59611 VALUES (1)

query T
WITH
cte (cte_col)
AS (
SELECT
*
FROM
(VALUES ((SELECT NULL FROM t59611 LIMIT 1:::INT8)))
UNION SELECT * FROM (VALUES ((1, 2)))
)
SELECT
NULL
FROM
cte
----
NULL
NULL

0 comments on commit 4208a25

Please sign in to comment.