colexec: fully support distinct spec
This commit adds the remaining support for the Distinct spec to the
vectorized engine. Previously, we couldn't support the cases in which
nulls should be treated as distinct or in which an error should be
returned when a duplicate is detected. These cases are needed for UPSERT
queries.
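The three behaviors can be sketched with a small hash-based distinct. This is a hypothetical illustration, not the vectorized engine's code: `Row`, `hasNull`, `key`, and `Distinct` are invented names, rows are slices of nullable integers, and a Go map stands in for the engine's hash table. With `nullsAreDistinct` set, a row whose distinct columns contain a NULL is never treated as a duplicate; with a non-empty `errorOnDup`, the first duplicate aborts with that message instead of being dropped.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Row is a hypothetical row of nullable integer distinct columns; nil is NULL.
type Row []*int

// hasNull reports whether any column of the row is NULL.
func hasNull(r Row) bool {
	for _, v := range r {
		if v == nil {
			return true
		}
	}
	return false
}

// key encodes a row as a map key. NULL is encoded as "NULL", so under the
// default semantics two NULLs in the same column compare equal.
func key(r Row) string {
	parts := make([]string, len(r))
	for i, v := range r {
		if v == nil {
			parts[i] = "NULL"
		} else {
			parts[i] = fmt.Sprint(*v)
		}
	}
	return strings.Join(parts, ",")
}

// Distinct keeps the first occurrence of each distinct row.
//   - nullsAreDistinct: rows containing a NULL always start a new group.
//   - errorOnDup: if non-empty, the first duplicate returns an error with
//     this message instead of being silently dropped.
func Distinct(rows []Row, nullsAreDistinct bool, errorOnDup string) ([]Row, error) {
	seen := make(map[string]struct{})
	var out []Row
	for _, r := range rows {
		if nullsAreDistinct && hasNull(r) {
			// A NULL never equals anything, so this row is its own group.
			out = append(out, r)
			continue
		}
		k := key(r)
		if _, dup := seen[k]; dup {
			if errorOnDup != "" {
				return nil, errors.New(errorOnDup)
			}
			continue
		}
		seen[k] = struct{}{}
		out = append(out, r)
	}
	return out, nil
}

func main() {
	one, two := 1, 2
	rows := []Row{{&one, nil}, {&one, nil}, {&two, &two}, {&two, &two}}
	for _, nullsAreDistinct := range []bool{false, true} {
		out, _ := Distinct(rows, nullsAreDistinct, "")
		fmt.Printf("nullsAreDistinct=%v: %d rows\n", nullsAreDistinct, len(out))
	}
	if _, err := Distinct(rows, false, "duplicate key value"); err != nil {
		fmt.Println("errorOnDup:", err)
	}
}
```

Note that the two options compose: in the UPSERT case, rows whose key columns contain NULL can never conflict, so they never trigger the duplicate error.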

This change required some modifications to treat nulls as distinct in
several components: the hash table used by the unordered distinct, the
sort partitioners (since they are used by the fallback strategy of the
external sort), and valuesDiffer (used by sort chunks). This commit also
introduces a helper struct to emit an error on duplicates, which in turn
required some changes to track whether duplicates were detected in a
batch.
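On the ordered path, the same ideas reduce to comparing each value with its predecessor in sorted order. The sketch below is hypothetical and single-column for brevity: this `valuesDiffer` is a simplified stand-in for the engine's function of the same name, and `orderedDistinct.next` is an invented helper that carries the last value across batch boundaries and reports whether the batch contained a duplicate, which is the signal needed to raise the `errorOnDup` error.

```go
package main

import (
	"errors"
	"fmt"
)

// valuesDiffer reports whether two adjacent values of a sorted column belong
// to different groups. With nullsAreDistinct, a NULL differs from every
// value, including another NULL.
func valuesDiffer(a, b *int, nullsAreDistinct bool) bool {
	if a == nil || b == nil {
		if nullsAreDistinct {
			return true
		}
		// Default semantics: NULL equals NULL but differs from any non-NULL.
		return a != nil || b != nil
	}
	return *a != *b
}

// orderedDistinct is a hypothetical streaming distinct over sorted batches.
type orderedDistinct struct {
	nullsAreDistinct bool
	errorOnDup       string
	last             *int // last value of the previous batch
	haveLast         bool
}

// next marks which values of a sorted batch start a new group and reports
// whether the batch contained a duplicate. If errorOnDup is set and a
// duplicate was seen, it returns an error with that message.
func (d *orderedDistinct) next(batch []*int) ([]bool, bool, error) {
	distinct := make([]bool, len(batch))
	sawDup := false
	for i, v := range batch {
		distinct[i] = !d.haveLast || valuesDiffer(d.last, v, d.nullsAreDistinct)
		if !distinct[i] {
			sawDup = true
		}
		d.last, d.haveLast = v, true
	}
	if sawDup && d.errorOnDup != "" {
		return nil, true, errors.New(d.errorOnDup)
	}
	return distinct, sawDup, nil
}

func main() {
	one, two := 1, 2
	d := &orderedDistinct{nullsAreDistinct: true}
	// Sorted input with NULLs first; the duplicate 1 spans the batch boundary.
	for _, batch := range [][]*int{{nil, nil, &one}, {&one, &two}} {
		distinct, sawDup, _ := d.next(batch)
		fmt.Println(distinct, sawDup)
	}
}
```

Carrying `last` across calls matters: without it, a duplicate that straddles two batches would go undetected.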

Release note (sql change): Previously, in some special cases (UPSERTs,
as documented by cockroachdb/docs#9922), distinct operations were not
supported by the vectorized engine. Support has now been added, and such
operations can spill to disk if necessary. However, there is a slight
complication if the distinct operator does, in fact, spill to disk:
which of the duplicate rows gets inserted can be non-deterministic. For
example, for a query like
`INSERT INTO t VALUES (1, 1), (1, 2), (1, 3) ON CONFLICT DO NOTHING`
where `t` has the schema `a INT PRIMARY KEY, b INT`, any one of the
three rows might be the one actually inserted. Postgres appears to
exhibit the same behavior.
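A toy model of why this happens: once deduplication keeps whichever duplicate it sees first, any change in the order rows arrive changes which row survives. This sketch does not reproduce how the engine spills; `row` and `insertOnConflictDoNothing` are invented names, and the "reordered" input simply stands in for whatever reordering a spilling operator might introduce.

```go
package main

import "fmt"

// row is a hypothetical (a, b) pair where a is the primary key.
type row struct{ a, b int }

// insertOnConflictDoNothing keeps the first row seen for each primary key
// and silently drops later rows with the same key.
func insertOnConflictDoNothing(rows []row) map[int]row {
	table := make(map[int]row)
	for _, r := range rows {
		if _, exists := table[r.a]; !exists {
			table[r.a] = r
		}
	}
	return table
}

func main() {
	in := []row{{1, 1}, {1, 2}, {1, 3}}
	// The same rows in a different arrival order.
	reordered := []row{{1, 3}, {1, 1}, {1, 2}}
	fmt.Println(insertOnConflictDoNothing(in)[1])
	fmt.Println(insertOnConflictDoNothing(reordered)[1])
}
```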
yuzefovich committed Jun 14, 2021
1 parent b7d8fd1 commit d1cbf20
Showing 26 changed files with 1,064 additions and 486 deletions.
pkg/sql/colexec/colbuilder/execplan.go: 12 changes (5 additions, 7 deletions)

```diff
@@ -189,12 +189,6 @@ func supportedNatively(spec *execinfrapb.ProcessorSpec) error {
 		return nil
 
 	case spec.Core.Distinct != nil:
-		if spec.Core.Distinct.NullsAreDistinct {
-			return errors.Newf("distinct with unique nulls not supported")
-		}
-		if spec.Core.Distinct.ErrorOnDup != "" {
-			return errors.Newf("distinct with error on duplicates not supported")
-		}
 		return nil
 
 	case spec.Core.Ordinality != nil:
@@ -954,7 +948,10 @@ func NewColOperator(
 			result.ColumnTypes = make([]*types.T, len(spec.Input[0].ColumnTypes))
 			copy(result.ColumnTypes, spec.Input[0].ColumnTypes)
 			if len(core.Distinct.OrderedColumns) == len(core.Distinct.DistinctColumns) {
-				result.Root, err = colexecbase.NewOrderedDistinct(inputs[0].Root, core.Distinct.OrderedColumns, result.ColumnTypes)
+				result.Root, err = colexecbase.NewOrderedDistinct(
+					inputs[0].Root, core.Distinct.OrderedColumns, result.ColumnTypes,
+					core.Distinct.NullsAreDistinct, core.Distinct.ErrorOnDup,
+				)
 			} else {
 				// We have separate unit tests that instantiate in-memory
 				// distinct operators, so we don't need to look at
@@ -970,6 +967,7 @@ func NewColOperator(
 				allocator := colmem.NewAllocator(ctx, distinctMemAccount, factory)
 				inMemoryUnorderedDistinct := colexec.NewUnorderedDistinct(
 					allocator, inputs[0].Root, core.Distinct.DistinctColumns, result.ColumnTypes,
+					core.Distinct.NullsAreDistinct, core.Distinct.ErrorOnDup,
 				)
 				edOpName := "external-distinct"
 				diskAccount := result.createDiskAccount(ctx, flowCtx, edOpName, spec.ProcessorID)
```
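The planning change above keeps the existing strategy choice and forwards the two new options to both operators. A hypothetical sketch of that decision: `DistinctSpec` only mirrors the field names visible in the diff, while `plan` and `planDistinct` are invented. Assuming the ordered columns are a subset of the distinct columns, equal lengths mean the input is ordered on every distinct column, so the streaming ordered distinct suffices; otherwise the hash-based unordered distinct is used, with the external distinct as its spilling fallback.

```go
package main

import "fmt"

// DistinctSpec is a hypothetical stand-in mirroring the Distinct spec fields
// referenced in the diff.
type DistinctSpec struct {
	DistinctColumns  []uint32
	OrderedColumns   []uint32 // assumed to be a subset of DistinctColumns
	NullsAreDistinct bool
	ErrorOnDup       string
}

// plan records the chosen strategy and the options forwarded to it.
type plan struct {
	strategy         string
	nullsAreDistinct bool
	errorOnDup       string
}

// planDistinct mirrors the branch in NewColOperator: ordered distinct when the
// input is ordered on all distinct columns, otherwise unordered distinct
// (which can spill via the external distinct). Both receive the new options.
func planDistinct(spec DistinctSpec) plan {
	p := plan{nullsAreDistinct: spec.NullsAreDistinct, errorOnDup: spec.ErrorOnDup}
	if len(spec.OrderedColumns) == len(spec.DistinctColumns) {
		p.strategy = "ordered"
	} else {
		p.strategy = "unordered"
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", planDistinct(DistinctSpec{
		DistinctColumns: []uint32{0, 1},
		OrderedColumns:  []uint32{1, 0},
		ErrorOnDup:      "duplicate key",
	}))
	fmt.Printf("%+v\n", planDistinct(DistinctSpec{
		DistinctColumns:  []uint32{0, 1},
		OrderedColumns:   []uint32{0},
		NullsAreDistinct: true,
	}))
}
```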

pkg/sql/colexec/colexecbase/BUILD.bazel: 2 changes (2 additions, 0 deletions)

```diff
@@ -21,6 +21,8 @@ go_library(
         "//pkg/sql/colexecerror",
         "//pkg/sql/colexecop",
         "//pkg/sql/colmem", # keep
+        "//pkg/sql/pgwire/pgcode",
+        "//pkg/sql/pgwire/pgerror",
         "//pkg/sql/sem/tree", # keep
         "//pkg/sql/types",
         "//pkg/util/duration", # keep
```