-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: fix IN operator with unsorted tuple #69651
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/colexec/select_in_tmpl.go, line 232 at r1 (raw file):
// Sort si.filterRow once. We perform the sort here instead of in // fillDatumRow_TYPE because the compare overload requires the eval // context of a coldata.DatumVec target column.
nit: I see why we have to do it here, but it seems a bit ugly. Please leave a TODO(yuzefovich) to clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/colexec/select_in_tmpl.go, line 165 at r1 (raw file):
func fillDatumRow_TYPE(t *types.T, datumTuple *tree.DTuple) ([]_GOTYPE, bool) { conv := colconv.GetDatumToPhysicalFn(t)
I guess an alternative would be to call datumTuple.Normalize
here, right? (We can plumb the eval context into the constructor.) Curious why you decided to go with an explicit sort instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/colexec/select_in_tmpl.go, line 165 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I guess an alternative would be to call
datumTuple.Normalize
here, right? (We can plumb the eval context into the constructor.) Curious why you decided to go with an explicit sort instead.
Maybe because of the performance reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/select_in_tmpl.go, line 165 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Maybe because of the performance reasons?
I found that the DTuple.sorted
field was not reliably set. So I've modified DTuple.sort
to only sort if !sorted
and the contents of the tuple are not already sorted. This is simpler now. Thanks for the suggestion!
pkg/sql/colexec/select_in_tmpl.go, line 232 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I see why we have to do it here, but it seems a bit ugly. Please leave a TODO(yuzefovich) to clean this up.
N/A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @mgartner)
pkg/sql/colexec/select_in_tmpl.go, line 163 at r3 (raw file):
hasNulls bool negate bool sorted bool
nit: no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/select_in_tmpl.go, line 163 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: no longer needed.
Done.
Should we backport this, and, if so, how far? I've been unable to reproduce in earlier versions before #68289 was merged, but I can't entirely rule out the possibility that some reproduction is lurking in 21.1 or 20.2. |
I think we should backport this to 20.2 and 21.1. |
The vectorized implementation of an `element IN tuple` expression assumes that the contents of `tuple` are sorted by the optimizer. Based on this assumption, it performs a binary search instead of a linear search. However, the assumption that the optimizer sorts all tuples is incorrect. For example, there are cases where the contents of a tuple are not known at planning-time, so the tuple cannot be sorted. Performing a binary search with an unsorted tuple causes incorrect query results. Now, the vectorized engine sorts tuple contents if they are not already sorted. Fixes cockroachdb#68979 Release justification: This commit fixes a bug with the IN operator that causes incorrect results. Release note (bug fix): A bug has been fixed which caused incorrect evaluation of the `IN` operator when the tuple on the right-hand-side of the operator included a subquery, like `a IN ('foo', (SELECT s FROM t), 'bar')`.
TFTR! bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 9b36827 to blathers/backport-release-20.2-69651: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 20.2.x failed. See errors above. error creating merge commit from 9b36827 to blathers/backport-release-21.1-69651: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
The vectorized implementation of an
element IN tuple
expressionassumes that the contents of
tuple
are sorted by the optimizer. Basedon this assumption, it performs a binary search instead of a linear
search.
However, the assumption that the optimizer sorts all tuples is
incorrect. For example, there are cases where the contents of a tuple
are not known at planning-time, so the tuple cannot be sorted.
Performing a binary search with an unsorted tuple causes incorrect query
results.
Now, the vectorized engine sorts tuple contents if they are not already
sorted.
Fixes #68979
Release justification: This commit fixes a bug with the IN operator that
causes incorrect results.
Release note (bug fix): A bug has been fixed which caused incorrect
evaluation of the
IN
operator when the tuple on the right-hand-sideof the operator included a subquery, like
a IN ('foo', (SELECT s FROM t), 'bar')
.