Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: colexec: fix IN operator with unsorted tuple #69979

Closed

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Sep 9, 2021

Backport 1/1 commits from #69651.

/cc @cockroachdb/release

Release justifcation: This commit fixes a critical correctness bug in
the vectorized execution engine.


The vectorized implementation of an element IN tuple expression
assumes that the contents of tuple are sorted by the optimizer. Based
on this assumption, it performs a binary search instead of a linear
search.

However, the assumption that the optimizer sorts all tuples is
incorrect. For example, there are cases where the contents of a tuple
are not known at planning-time, so the tuple cannot be sorted.
Performing a binary search with an unsorted tuple causes incorrect query
results.

Now, the vectorized engine sorts tuple contents if they are not already
sorted.

Fixes #68979

Release justification: This commit fixes a bug with the IN operator that
causes incorrect results.

Release note (bug fix): A bug has been fixed which caused incorrect
evaluation of the IN operator when the tuple on the right-hand-side
of the operator included a subquery, like
a IN ('foo', (SELECT s FROM t), 'bar').

The vectorized implementation of an `element IN tuple` expression
assumes that the contents of `tuple` are sorted by the optimizer. Based
on this assumption, it performs a binary search instead of a linear
search.

However, the assumption that the optimizer sorts all tuples is
incorrect. For example, there are cases where the contents of a tuple
are not known at planning-time, so the tuple cannot be sorted.
Performing a binary search with an unsorted tuple causes incorrect query
results.

Now, the vectorized engine sorts tuple contents if they are not already
sorted.

Fixes cockroachdb#68979

Release justification: This commit fixes a bug with the IN operator that
causes incorrect results.

Release note (bug fix): A bug has been fixed which caused incorrect
evaluation of the `IN` operator when the tuple on the right-hand-side
of the operator included a subquery, like
`a IN ('foo', (SELECT s FROM t), 'bar')`.
@mgartner mgartner requested a review from yuzefovich September 9, 2021 18:24
@blathers-crl
Copy link

blathers-crl bot commented Sep 9, 2021

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

@mgartner
Copy link
Collaborator Author

mgartner commented Oct 6, 2021

@yuzefovich The build is failing with:

../../sql/logictest/testdata/logic_test/vectorize:1218: SELECT 'b' IN ('b', (SELECT NULL FROM t68979), 'a') FROM t68979
expected success, but found
(XXUUU) unable to vectorize execution plan: unable to columnarize render expression "'b' IN ('b', NULL, 'a')": couldn't find overload for string IN record
proj_const_left_ops.eg.go:30685: in GetProjectionLConstOperator()
E210909 19:02:40.586062 3151702 sql/logictest/logic.go:3538  [-] 101
E210909 19:02:40.586062 3151702 sql/logictest/logic.go:3538  [-] 101 +pq: unable to vectorize execution plan: unable to columnarize render expression "'b' IN ('b', NULL, 'a')": couldn't find overload for string IN record

I think this means that the bug is not reproducible on 21.1. In that case, do we need to backport this fix?

@yuzefovich
Copy link
Member

Hm, I'm not sure - it does sound like the bug is not reproducible (i.e. we will fallback to the row-by-row engine in all cases where the DTuple might not be sorted already). Since we haven't had a user complain, I guess I would lean on the safer side and not backport then.

@mgartner
Copy link
Collaborator Author

mgartner commented Oct 7, 2021

SGTM. Closing.

@mgartner mgartner closed this Oct 7, 2021
@mgartner mgartner deleted the backport21.1-69651 branch October 7, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants