colexec: fully support distinct spec
This commit adds the remaining support for the Distinct spec to the vectorized engine. Previously, we couldn't support cases where nulls should be treated as distinct or where an error should be returned when a duplicate is detected. These cases are needed for UPSERT queries.

This change required some modifications to treat nulls as distinct in several components (the hash table used by the unordered distinct, the sort partitioners since they are used by the fallback strategy of the external sort, and valuesDiffer used by sort chunks).

This commit also introduces a helper struct to emit an error on duplicates, which in turn required some changes to track whether duplicates were detected in a batch.

Release note (sql change): Previously, in some special cases (UPSERTs, as documented by cockroachdb/docs#9922), support for distinct operations was missing in the vectorized engine. It has now been added, and such operations will be able to spill to disk if necessary. However, there is a slight complication if the distinct operator does, in fact, spill to disk: the order in which rows are inserted can be non-deterministic. For example, for a query like `INSERT INTO t VALUES (1, 1), (1, 2), (1, 3) ON CONFLICT DO NOTHING` with `t` having the schema `a INT PRIMARY KEY, b INT`, any one of the three rows may be the one that is actually inserted. It appears that Postgres has the same behavior.
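The two behaviors described above (nulls treated as distinct, and an error on duplicates) can be illustrated with a minimal, row-at-a-time sketch. This is not the vectorized implementation from this commit: the names `key`, `distinct`, `nullsAreDistinct`, `errorOnDup`, and `errDuplicate` are hypothetical and chosen only for illustration, and the real operators work on column batches rather than single rows.

```go
package main

import (
	"errors"
	"fmt"
)

// key is a hypothetical nullable single-column distinct key.
type key struct {
	val  int
	null bool
}

// errDuplicate is a hypothetical error returned when a duplicate is detected.
var errDuplicate = errors.New("duplicate key detected")

// distinct returns the rows of in that have not been seen before, in
// first-seen order. When nullsAreDistinct is true, every NULL key is
// emitted because NULLs never compare equal to each other. When
// errorOnDup is true, the first duplicate aborts with errDuplicate.
func distinct(in []key, nullsAreDistinct, errorOnDup bool) ([]key, error) {
	seen := make(map[key]struct{})
	var out []key
	for _, k := range in {
		if k.null {
			k.val = 0 // normalize so that all NULLs share one map entry
			if nullsAreDistinct {
				out = append(out, k)
				continue
			}
		}
		if _, ok := seen[k]; ok {
			if errorOnDup {
				return nil, errDuplicate
			}
			continue // silently drop the duplicate
		}
		seen[k] = struct{}{}
		out = append(out, k)
	}
	return out, nil
}

func main() {
	rows := []key{{val: 1}, {null: true}, {null: true}, {val: 1}}

	out, _ := distinct(rows, false, false)
	fmt.Println(len(out)) // NULLs deduplicated: 1, NULL

	out, _ = distinct(rows, true, false)
	fmt.Println(len(out)) // NULLs kept apart: 1, NULL, NULL

	_, err := distinct(rows, true, true)
	fmt.Println(err) // the second 1 is a duplicate
}
```

In this sketch the error path fires only for non-NULL duplicates when `nullsAreDistinct` is set, which mirrors why the two options are needed together for UPSERT-style conflict detection.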