Skip to content

Commit

Permalink
Merge #74825
Browse files Browse the repository at this point in the history
74825: colexechash: fix an internal error with distinct mode r=yuzefovich a=yuzefovich

**colexechash: fix an internal error with distinct mode**

This commit fixes a bug with the hash table when it is used by the
unordered distinct when NULLs are treated as different. This is the case
when UPSERT or INSERT ... ON CONFLICT queries have to perform
`upsert-distinct-on` operation.

The problem was that we were updating some internal state (`GroupID`
slice responsible for tracking what is the current duplicate candidate
for each row being probed) in more cases than necessary. The code path
in question is used for two purposes:
- first, when we're removing the duplicates from within the batch,
without looking at the state of the hash table at all. In this case we
do want the update mentioned above;
- next, when the batch only contains unique rows, we want to remove the
duplicates when comparing against the hash table. In this case we do not
want the update.

The bug is fixed by refactoring the code to not update the internal
state at all; instead, we now rely on the `distinct` flag for each row
to tell us that the row is distinct within the batch, and we then
correctly populate `HeadID` value for it (which was the ultimate goal
all the time, and previously we used `GroupID` value as an
intermediary).

This mistake would not result in incorrect results (because `distinct`
flag is still marked correctly) and could only result in an internal
error due to index out of bounds. In particular, for the error to occur
the last row in the vectorized batch must have a NULL value in any
column (except for the last one) used for the distinctness check.

Fixes: #74795.

Release note (bug fix): Previously, CockroachDB could encounter an
internal error when performing UPSERT or INSERT ... ON CONFLICT queries
in some cases when the new rows contained NULL values (either NULLS
explicitly specified or NULLs used since some columns were omitted).

**colexechash: cleanup the previous commit**

The idea behind this commit is to remove some dead code (and actually get
a minor performance improvement) by skipping the check of a conditional
that is always true as well as removing another conditional that is
always false.

We chose to not squash this commit into the previous one because the
latter will be backported, and we want to reduce the amount of code
changes for the backport.

Release note: None

**colexectestutils: increase test coverage by randomizing batch length**

We are already randomizing the max batch size used in a run of our unit
tests; however, previously it would be fixed for a single run. This
commit makes it so that on each batch produced by our test operators we
independently decide what length to use - with 50% chance we keep on
using the maximum while in the remaining cases we'll pick a random
length that doesn't exceed the maximum.

Release note: None

**colexechash: remove some dead code**

This commit takes advantage of the observation that `probeSel` argument
in some templated function calls is never `nil` and removes some code
that was present for the case when it is `nil`.

Release note: None

**colexechash: combine two conditionals into one in distinct mode**

Previously, we had the code of the form
```
if keyID != 0 {
...
}
if keyID == 0 {
...
}
```
being generated for the distinct mode. This commit adjusts the template
to generate
```
if keyID != 0 {
...
} else {
...
}
```
instead.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
  • Loading branch information
craig[bot] and yuzefovich committed Jan 14, 2022
2 parents 79bb083 + 2490224 commit 677ea44
Show file tree
Hide file tree
Showing 7 changed files with 6,177 additions and 9,239 deletions.
1 change: 1 addition & 0 deletions pkg/sql/colexec/colexechash/hashtable.go
Original file line number Diff line number Diff line change
Expand Up @@ -518,6 +518,7 @@ func (ht *HashTable) checkCols(probeVecs []coldata.Vec, nToCheck uint64, probeSe

// checkColsForDistinctTuples performs a column by column check to find distinct
// tuples in the probe table that are not present in the build table.
// NOTE: It assumes that probeSel has already been populated and it is not nil.
func (ht *HashTable) checkColsForDistinctTuples(
probeVecs []coldata.Vec, nToCheck uint64, probeSel []int,
) {
Expand Down
Loading

0 comments on commit 677ea44

Please sign in to comment.