Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
74825: colexechash: fix an internal error with distinct mode r=yuzefovich a=yuzefovich **colexechash: fix an internal error with distinct mode** This commit fixes a bug with the hash table when it is used by the unordered distinct when NULLs are treated as different. This is the case when UPSERT or INSERT ... ON CONFLICT queries have to perform `upsert-distinct-on` operation. The problem was that we were updating some internal state (`GroupID` slice responsible for tracking what is the current duplicate candidate for each row being probed) in more cases than necessary. The code path in question is used for two purposes: - first, when we're removing the duplicates from within the batch, without looking at the state of the hash table at all. In this case we do want the update mentioned above; - next, when the batch only contains unique rows, we want to remove the duplicates when comparing against the hash table. In this case we do not want the update. The bug is fixed by refactoring the code to not update the internal state at all; instead, we now rely on the `distinct` flag for each row to tell us that the row is distinct within the batch, and we then correctly populate `HeadID` value for it (which was the ultimate goal all the time, and previously we used `GroupID` value as an intermediary). This mistake would not result in incorrect results (because `distinct` flag is still marked correctly) and could only result in an internal error due to index out of bounds. In particular, for the error to occur the last row in the vectorized batch must have a NULL value in any column (except for the last one) used for the distinctness check. Fixes: #74795. Release note (bug fix): Previously, CockroachDB could encounter an internal error when performing UPSERT or INSERT ... ON CONFLICT queries in some cases when the new rows contained NULL values (either NULLS explicitly specified or NULLs used since some columns were omitted). **colexechash: cleanup the previous commit** The idea behind this commit is to remove some dead code (and actually get a minor performance improvement) by skipping the check of a conditional that is always true as well as removing another conditional that is always false. We chose to not squash this commit into the previous one because the latter will be backported, and we want to reduce the amount of code changes for the backport. Release note: None **colexectestutils: increase test coverage by randomizing batch length** We are already randomizing the max batch size used in a run of our unit tests; however, previously it would be fixed for a single run. This commit makes it so that on each batch produced by our test operators we independently decide what length to use - with 50% chance we keep on using the maximum while in the remaining cases we'll pick a random length that doesn't exceed the maximum. Release note: None **colexechash: remove some dead code** This commit takes advantage of the observation that `probeSel` argument in some templated function calls is never `nil` and removes some code that was present for the case when it is `nil`. Release note: None **colexechash: combine two conditionals into one in distinct mode** Previously, we had the code of the form ``` if keyID != 0 { ... } if keyID == 0 { ... } ``` being generated for the distinct mode. This commit adjusts the template to generate ``` if keyID != 0 { ... } else { ... } ``` instead. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
- Loading branch information