release-21.2: colexechash: fix an internal error with distinct mode #74872
Backport 1/5 commits from #74825.
/cc @cockroachdb/release
colexechash: fix an internal error with distinct mode
This commit fixes a bug with the hash table when it is used by the
unordered distinct when NULLs are treated as different. This is the case
when UPSERT or INSERT ... ON CONFLICT queries have to perform the
`upsert-distinct-on` operation.

The problem was that we were updating some internal state (the `GroupID`
slice responsible for tracking the current duplicate candidate
for each row being probed) in more cases than necessary. The code path
in question is used for two purposes:
1. removing duplicates from within the batch itself, without looking at
   the state of the hash table at all. In this case we do want the update
   mentioned above;
2. removing duplicates when comparing against the hash table. In this
   case we do not want the update.
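The two uses of that code path can be pictured with a toy model. The sketch below is not the colexechash implementation: the `row` type and the `key`, `dedupWithinBatch`, and `dedupAgainstTable` functions are hypothetical names, and a plain Go map stands in for the vectorized hash table. It only illustrates the two phases and the "NULLs are treated as different" rule, under which a row containing a NULL never matches any other row.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical tuple of nullable int columns; nil models SQL NULL.
type row []*int

// key builds a map key for r. It returns ok=false when r contains a NULL:
// with NULLs treated as different, such a row can never equal another row.
func key(r row) (k string, ok bool) {
	var sb strings.Builder
	for _, v := range r {
		if v == nil {
			return "", false
		}
		fmt.Fprintf(&sb, "%d,", *v)
	}
	return sb.String(), true
}

// dedupWithinBatch models the first purpose: it flags the rows that are
// distinct within the batch, without consulting any hash table state.
func dedupWithinBatch(batch []row) []bool {
	distinct := make([]bool, len(batch))
	seen := make(map[string]struct{})
	for i, r := range batch {
		k, ok := key(r)
		if !ok {
			distinct[i] = true // a row with a NULL is always distinct
			continue
		}
		if _, dup := seen[k]; !dup {
			seen[k] = struct{}{}
			distinct[i] = true
		}
	}
	return distinct
}

// dedupAgainstTable models the second purpose: it clears the flag of rows
// that already exist in the table and leaves every other flag untouched.
func dedupAgainstTable(batch []row, distinct []bool, table map[string]struct{}) {
	for i, r := range batch {
		if !distinct[i] {
			continue
		}
		if k, ok := key(r); ok {
			if _, dup := table[k]; dup {
				distinct[i] = false
			}
		}
	}
}

func ptr(v int) *int { return &v }

func main() {
	batch := []row{
		{ptr(1), ptr(2)}, {ptr(1), ptr(2)},
		{nil, ptr(2)}, {nil, ptr(2)},
		{ptr(3), ptr(4)},
	}
	distinct := dedupWithinBatch(batch)
	fmt.Println(distinct) // [true false true true true]

	table := map[string]struct{}{"3,4,": {}}
	dedupAgainstTable(batch, distinct, table)
	fmt.Println(distinct) // [true false true true false]
}
```

Note that both `(NULL, 2)` rows stay flagged as distinct in either phase, which is the behavior the upsert-distinct-on operation needs.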
The bug is fixed by refactoring the code to not update the internal
state at all; instead, we now rely on the `distinct` flag for each row
to tell us that the row is distinct within the batch, and we then
correctly populate the `HeadID` value for it (which was the ultimate goal
all along; previously we used the `GroupID` value as an intermediary).
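As a rough illustration of the approach described above, the sketch below sets a per-row head ID directly from the `distinct` flag rather than routing it through an intermediate group ID slice. The `populateHeadID` function, the lowercase slice names, and the 1-based ID convention (with 0 meaning "no head assigned") are assumptions made for this example; they are not taken from the actual colexechash code.

```go
package main

import "fmt"

// populateHeadID is a hypothetical helper. Every row flagged as distinct
// within the batch becomes its own head; rows that are not flagged keep
// whatever value headID already held. IDs are assumed to be 1-based so that
// 0 can mean "no head assigned".
func populateHeadID(distinct []bool, headID []uint64) {
	for i, isDistinct := range distinct {
		if isDistinct {
			headID[i] = uint64(i) + 1
		}
	}
}

func main() {
	distinct := []bool{true, false, true}
	headID := make([]uint64, len(distinct))
	populateHeadID(distinct, headID)
	fmt.Println(headID) // [1 0 3]
}
```

Because the loop only reads the `distinct` flags and writes at the same index, it never indexes past the end of the batch, regardless of which rows contain NULLs.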
This mistake would not result in incorrect results (because the
`distinct` flag is still marked correctly) and could only result in an
internal error due to an index out of bounds. In particular, for the
error to occur, the last row in the vectorized batch must have a NULL
value in any column (except for the last one) used for the distinctness
check.
Fixes: #74795.
Release note (bug fix): Previously, CockroachDB could encounter an
internal error when performing UPSERT or INSERT ... ON CONFLICT queries
in some cases when the new rows contained NULL values (either NULLs
explicitly specified or NULLs used since some columns were omitted).