-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: vectorized IN evaluation incorrectly assumes the RHS tuple contents are sorted #68979
Comments
I've managed to reduce the problem, but haven't started to debug yet. Leaving a logic test below that highlights the problem so I don't lose it:
|
This seems to be a regression from prior releases, so leaving release-blocker label. |
An even simpler reproduction:
I've bisected to #68289, though I don't believe that introduced the bug, it only revealed it. It looks like there's a false assumption (added in #50337) being made that values in a tuple are always normalized such that they are in order: cockroach/pkg/sql/colexec/select_in_tmpl.go Lines 179 to 180 in 62b3462
However, this is not the case if the tuple does not have all constant elements at optimization time (see this I'm not yet sure the best way to fix this. Some options I've thought of:
Maybe @yuzefovich has some ideas? |
Nice find! I think it'd be get the best of both options - keep the faster binary search when the tuple is constant (and, thus, has been normalized) and fall back to linear scan if it isn't. My understanding is that it is sufficient to simply iterate over all elements of the tuple and check whether each is |
I'm not sure it'd be so simple. When I was debugging, the value in the middle of the tuple was already filled in with a |
I guess this was the case only because of |
roachtest.tlp failed with artifacts on master @ 46cef2c6f0b36ba2f7d551c8ab017832c1b9d592:
Reproduce
See: roachtest README See: CI job to stress roachtests For the CI stress job, click the ellipsis (...) next to the Run button and fill in: * Changes / Build branch: master * Parameters / `env.TESTS`: `^tlp$` * Parameters / `env.COUNT`: <number of runs> |
This is the case without |
I've created an issue, #69327, for the last reported TLP bug in the comment above. It looks to be a separate bug. |
Yes, totally. We run all subqueries upfront, and we store the results in |
Leaving a note here for my future self:
|
roachtest.tlp failed with artifacts on master @ 967ed00f80981ce8848a5e8144ee6fbd29bc95bb:
|
roachtest.tlp failed with artifacts on master @ 0b57dc40deda1206d9a1c215ffdb219bbf182a39:
|
Removing the release-blocker label since the bug with IN is not a regression in 21.2. |
From the perspective of the specific reproduction found by TLP I think it is a regression. The repro does not result in incorrect results in 21.1. But the offending code was merged long ago, so there is a chance some other reproduction would tickle the bug. |
With a query like below, |
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
69203: storage: update min version with store cluster version r=sumeerbhola a=jbowens Whenever the store cluster version key is updated, update the minimum storage version too. Fix #69116. Release note: None 69641: server: add SQL STATS COMPACTION to automatic job list r=maryliag,arulajmani,ajwerner a=Azhng Previously, SQL STATS COMPACTION job was not marked as automatic job. This means SQL STATS COMPACTION job would show up in the Jobs Page in DB Console and in the output of SHOW JOBS. This commit adds SQL STATS COMPACTION to a list of automatic jobs (e.g. auto table stats). This means SQL STATS COMPACTION job information would only be present in the output of SHOW AUTOMATIC JOBS. Release justification: Bug fixes and low-risk updates to new functionality Release note (sql change): SQL STATS COMPACTION job now only shows up in the output of SHOW AUTOMATIC JOBS. 69651: colexec: fix IN operator with unsorted tuple r=mgartner a=mgartner The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes #68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`. 69666: bazel: bazel builder image doesn't run builds as root r=jlinder a=rickystewart Adopt the same `autouseradd` script that we use in `cockroachdb/builder` to avoid creating these files as `root`. Also step down the permissions on some files that `bazci` creates -- they don't need to be so permissive any more. Closes #66162 Release justification: Non-production code change Release note: None 69681: sql: apply zone configs for copied INDEX for ALTER PK on RBR tables r=ajwerner a=otan Release justification: bug fix for existing feature Release note (bug fix): Previously, when using ALTER PRIMARY KEY on a REGIONAL BY ROW table, the copied unique index from the old PRIMARY KEY would not have the correct zone configurations applied. This commit fixes that. Users who encountered this bug should re-create the index. Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Azhng <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
roachtest.tlp failed with artifacts on master @ 04a41e7915f4a89dcc1d0dbd92466c6adf79ec9f:
Reproduce
See: roachtest README
See: CI job to stress roachtests
For the CI stress job, click the ellipsis (...) next to the Run button and fill in: * Changes / Build branch: master * Parameters / `env.TESTS`: `^tlp$` * Parameters / `env.COUNT`: <number of runs>
This test on roachdash | Improve this report!
The text was updated successfully, but these errors were encountered: