Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt, invertedexpr: inverted join may return incorrect results with additional filter #55648

Closed
rytaft opened this issue Oct 16, 2020 · 5 comments · Fixed by #55649
Closed

opt, invertedexpr: inverted join may return incorrect results with additional filter #55648

rytaft opened this issue Oct 16, 2020 · 5 comments · Fixed by #55649
Assignees
Labels
A-spatial Spatial work that is *not* related to builtins. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently.

Comments

@rytaft
Copy link
Collaborator

rytaft commented Oct 16, 2020

There is a bug in the inverted join logic which may cause some rows to be incorrectly omitted from the output if there is an additional scalar filter that is part of the join condition. For example:

statement ok
CREATE TABLE g (
  k INT PRIMARY KEY,
  geom GEOMETRY
)

statement ok
CREATE INVERTED INDEX foo_inv ON g(geom)

statement ok
INSERT INTO g VALUES
  (1, ST_MakePolygon('LINESTRING(0 0, 0 15, 15 15, 15 0, 0 0)'::geometry)),
  (2, ST_MakePolygon('LINESTRING(0 0, 0 2, 2 2, 2 0, 0 0)'::geometry))

# This query performs an inverted join with an additional filter.
query TTT
EXPLAIN SELECT g1.k, g2.k FROM g@foo_inv AS g1, g@primary AS g2
WHERE ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 5 0, 0 0)'::geometry))
  AND g2.k < 20
ORDER BY g1.k, g2.k
----
·                        distribution           local
·                        vectorized             true
sort                     ·                      ·
 │                       order                  +k,+k
 └── lookup join         ·                      ·
      │                  table                  g@primary
      │                  equality               (k) = (k)
      │                  equality cols are key  ·
      │                  pred                   st_contains(geom, geom) AND st_contains(geom, '010300000001000000050000000000000000000000000000000000000000000000000000000000000000001440000000000000144000000000000014400000000000001440000000000000000000000000000000000000000000000000')
      └── inverted join  ·                      ·
           │             table                  g@foo_inv
           └── scan      ·                      ·
·                        missing stats          ·
·                        table                  g@primary
·                        spans                  [ - /19]

# This query performs a cross join followed by a filter.
query TTT
EXPLAIN SELECT g1.k, g2.k FROM g@primary AS g1, g@primary AS g2
WHERE ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 5 0, 0 0)'::geometry))
  AND g2.k < 20
ORDER BY g1.k, g2.k
----
·                    distribution   local
·                    vectorized     true
sort                 ·              ·
 │                   order          +k,+k
 └── cross join      ·              ·
      │              pred           st_contains(geom, geom)
      ├── scan       ·              ·
      │              missing stats  ·
      │              table          g@primary
      │              spans          [ - /19]
      └── filter     ·              ·
           │         filter         st_contains(geom, '010300000001000000050000000000000000000000000000000000000000000000000000000000000000001440000000000000144000000000000014400000000000001440000000000000000000000000000000000000000000000000')
           └── scan  ·              ·
·                    missing stats  ·
·                    table          g@primary
·                    spans          FULL SCAN

# This query performs an inverted join with an additional filter.
query II
SELECT g1.k, g2.k FROM g@foo_inv AS g1, g@primary AS g2
WHERE ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 5 0, 0 0)'::geometry))
  AND g2.k < 20
ORDER BY g1.k, g2.k
----
1  1

# This query performs a cross join followed by a filter.
query II
SELECT g1.k, g2.k FROM g@primary AS g1, g@primary AS g2
WHERE ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 5 0, 0 0)'::geometry))
  AND g2.k < 20
ORDER BY g1.k, g2.k
----
1  1
1  2

The two above queries should return exactly the same results, since the only difference is that they use different indexes. However, clearly the results are different.

@rytaft rytaft added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. A-spatial Spatial work that is *not* related to builtins. labels Oct 16, 2020
@rytaft rytaft self-assigned this Oct 16, 2020
@blathers-crl
Copy link

blathers-crl bot commented Oct 16, 2020

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

4 similar comments
@blathers-crl
Copy link

blathers-crl bot commented Oct 16, 2020

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Oct 16, 2020

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Oct 16, 2020

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Oct 16, 2020

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

rytaft added a commit to rytaft/cockroach that referenced this issue Oct 16, 2020
Prior to this commit, it was possible for an inverted join to return
incorrect results if it had an additional scalar filter as part of the
join condition. For example, a join condition such as

  ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 0 0)'))

could be susceptible to this issue, because we pre-compute the SpanExpression
for the scalar filter and store it for re-use. The problem was that the
pre-computed SpanExpression was being modified and then reused, leading to
memory corruption.

This commit fixes the problem by making a copy of the pre-computed
SpanExpression before using it and modifying it.

Fixes cockroachdb#55648

Release note (bug fix): Fixed a bug that could occur for spatial queries
involving a join between two spatial columns, when there was an additional
filter on one of the spatial columns, and that column also had an inverted
index defined. This bug could cause incorrect results to be returned, in
which some rows were omitted from the output that should have been included.
craig bot pushed a commit that referenced this issue Oct 18, 2020
55649: opt,invertedexpr: fix memory corruption issue in inverted join r=rytaft a=rytaft

Prior to this commit, it was possible for an inverted join to return
incorrect results if it had an additional scalar filter as part of the
join condition. For example, a join condition such as
```
  ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 0 0)'))
```
could be susceptible to this issue, because we pre-compute the `SpanExpression`
for the scalar filter and store it for re-use. The problem was that the
pre-computed `SpanExpression` was being modified and then reused, leading to
memory corruption.

This commit fixes the problem by making a copy of the pre-computed
`SpanExpression` before using it and modifying it.

Fixes #55648

Release note (bug fix): Fixed a bug that could occur for spatial queries
involving a join between two spatial columns, when there was an additional
filter on one of the spatial columns, and that column also had an inverted
index defined. This bug could cause incorrect results to be returned, in
which some rows were omitted from the output that should have been included.

Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 327aa58 Oct 19, 2020
rytaft added a commit to rytaft/cockroach that referenced this issue Oct 19, 2020
Prior to this commit, it was possible for an inverted join to return
incorrect results if it had an additional scalar filter as part of the
join condition. For example, a join condition such as

  ST_Contains(g1.geom, g2.geom)
  AND ST_Contains(g1.geom, ST_MakePolygon('LINESTRING(0 0, 0 5, 5 5, 0 0)'))

could be susceptible to this issue, because we pre-compute the SpanExpression
for the scalar filter and store it for re-use. The problem was that the
pre-computed SpanExpression was being modified and then reused, leading to
memory corruption.

This commit fixes the problem by making a copy of the pre-computed
SpanExpression before using it and modifying it.

Fixes cockroachdb#55648

Release note (bug fix): Fixed a bug that could occur for spatial queries
involving a join between two spatial columns, when there was an additional
filter on one of the spatial columns, and that column also had an inverted
index defined. This bug could cause incorrect results to be returned, in
which some rows were omitted from the output that should have been included.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-spatial Spatial work that is *not* related to builtins. A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant