Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: out of range routingIndex is produced in inverted filterer #63180

Closed
angelazxu opened this issue Apr 6, 2021 · 9 comments · Fixed by #63574
Closed

sql: out of range routingIndex is produced in inverted filterer #63180

angelazxu opened this issue Apr 6, 2021 · 9 comments · Fixed by #63574
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker

Comments

@angelazxu
Copy link
Contributor

Describe the problem

While running some logic tests for inverted index support for fetch/contains expressions (#63048), the following query leads to an index out of range error:

SELECT j FROM f@i WHERE j->'a' @> '"b"' AND j->'a' <@ '["b", "c", "d", "e"]' ORDER BY k

This is due to the b.routingIndex in prepareAddIndexRow being set to -1, (line 521, inverted_expr_evaluator.go). This is a quick fix, however I'm not sure if this issue could be reproduced on the current master and the fix would need to be backported.

@angelazxu angelazxu added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-execution Relating to SQL execution. labels Apr 6, 2021
@angelazxu
Copy link
Contributor Author

@sumeerbhola I'm not too familiar with how the execution engine works, do you think this would need to be backported / if this bug could happen in other queries?

@yuzefovich
Copy link
Member

Angela and I debugged this a little, and we think that the problem might be here:

if _, err = ifr.invertedEval.prepareAddIndexRow(enc, nil /* encFull */); err != nil {

prepareAddIndexRow as the first parameter returns a boolean indicating whether addIndexRow should be called, but in the link above we're ignoring it. I think this is very suspicious.

@yuzefovich
Copy link
Member

I opened up #63196 that prototypes the fix as I understand it, but it is missing a regression test (I couldn't repro on master what Angela saw on her branch; possibly the issue is only present because of the improvements by Angela recently). I'd appreciate some help with coming up with the test.

@yuzefovich
Copy link
Member

@angelazxu could you please find a reproduction of this problem on master (i.e. without the changes in your PR)? I believe the bug can be exposed pretty easily, but I don't have the relevant context. Quite possibly modifying some existing query/data in inverted_index logic test file should be sufficient.

@yuzefovich
Copy link
Member

yuzefovich commented Apr 9, 2021

@sumeerbhola do you think this issue should be a GA-blocker? I believe the impact is that if the flow is vectorized, an internal error might be encountered, if the flow is row-based, then the node might crash; but I don't know the likelihood of hitting this problem. cc @rytaft

@yuzefovich
Copy link
Member

The current hypothesis proposed by Becca in #63048 (review) is that invalid JSON encoding could have been the root cause for the panic, and it aligns with what Sumeer wrote in #63196 (review). So we'll treat this issue as a non-blocker for now.

@rytaft
Copy link
Collaborator

rytaft commented Apr 13, 2021

Looks like my hypothesis that this was due to invalid JSON was wrong. I also managed to find a simple reproduction that doesn't require any of @angelazxu's new stuff, indicating that this bug exists in prior releases:

CREATE TABLE f (
  j JSON,
  INVERTED INDEX i (j)
);

INSERT INTO f VALUES ('{"a": "b", "x": "y"}}');

SELECT j FROM f@i WHERE j @> '{"a": "c"}' AND (j @> '{"a": "b"}' OR j @> '{"a": "c"}');

I think this can be explained by the fact that there are more spans scanned from the inverted index than are needed by the inverted filterer, as shown here:

> EXPLAIN (opt) SELECT j FROM f@i WHERE j @> '{"a": "c"}' AND (j @> '{"a": "b"}' OR j @> '{"a": "c"}');
                                         info
--------------------------------------------------------------------------------------
  index-join f
   └── inverted-filter
        ├── inverted expression: /5
        │    ├── tight: true, unique: false
        │    └── union spans: ["7a\x00\x01\x12c\x00\x01", "7a\x00\x01\x12c\x00\x01"]
        └── scan f@i
             ├── inverted constraint: /5/1
             │    └── spans
             │         ├── ["7a\x00\x01\x12b\x00\x01", "7a\x00\x01\x12b\x00\x01"]
             │         └── ["7a\x00\x01\x12c\x00\x01", "7a\x00\x01\x12c\x00\x01"]
             └── flags: force-index=i

I'm guessing the fix should actually be in sql/inverted/expression.go to ensure that the extra span is removed from SpansToRead. I'll work on a fix.

@rytaft
Copy link
Collaborator

rytaft commented Apr 13, 2021

I'm going to add the GA-Blocker label since I think 21.1 is the first release that we are using invertedFilterer and invertedJoiner for JSON and Array expressions.

@blathers-crl
Copy link

blathers-crl bot commented Apr 13, 2021

Hi @rytaft, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

rytaft added a commit to rytaft/cockroach that referenced this issue Apr 13, 2021
Prior to this commit, it was possible that the SpansToRead in
an inverted.SpanExpression were a superset of the spans that were
needed to evaluate the span expression. This could result in an index
out of range error in the invertedFilterer when a span from the input
didn't exist in its list of needed spans. This commit fixes the
problem by ensuring that SpansToRead does not contain spans that are
not needed to evaluate the span expression.

Fixes cockroachdb#63180

Release note (bug fix): Fixed an internal error that could occur
when executing queries using an inverted index. The error was an index
out of range error, and could occur in rare cases when a filter or
join predicate contained at least two JSON, Array, Geometry or
Geography expressions that were combined with AND. This has now been
fixed.
craig bot pushed a commit that referenced this issue Apr 17, 2021
63574: sql: fix index out of range error in inverted filterer r=rytaft a=rytaft

Prior to this commit, it was possible that the `SpansToRead` in
an `inverted.SpanExpression` were a superset of the spans that were
needed to evaluate the span expression. This could result in an index
out of range error in the `invertedFilterer` when a span from the input
didn't exist in its list of needed spans. This commit fixes the
problem by ensuring that `SpansToRead` does not contain spans that are
not needed to evaluate the span expression.

Fixes #63180

Release note (bug fix): Fixed an internal error that could occur
when executing queries using an inverted index. The error was an index
out of range error, and could occur in rare cases when a filter or
join predicate contained at least two JSON, Array, Geometry or
Geography expressions that were combined with AND. This has now been
fixed.

Co-authored-by: Rebecca Taft <[email protected]>
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 17, 2021
Prior to this commit, it was possible that the SpansToRead in
an inverted.SpanExpression were a superset of the spans that were
needed to evaluate the span expression. This could result in an index
out of range error in the invertedFilterer when a span from the input
didn't exist in its list of needed spans. This commit fixes the
problem by ensuring that SpansToRead does not contain spans that are
not needed to evaluate the span expression.

Fixes cockroachdb#63180

Release note (bug fix): Fixed an internal error that could occur
when executing queries using an inverted index. The error was an index
out of range error, and could occur in rare cases when a filter or
join predicate contained at least two JSON, Array, Geometry or
Geography expressions that were combined with AND. This has now been
fixed.
@craig craig bot closed this as completed in 7b985b6 Apr 17, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Apr 18, 2021
Prior to this commit, it was possible that the SpansToRead in
an inverted.SpanExpression were a superset of the spans that were
needed to evaluate the span expression. This could result in an index
out of range error in the invertedFilterer when a span from the input
didn't exist in its list of needed spans. This commit fixes the
problem by ensuring that SpansToRead does not contain spans that are
not needed to evaluate the span expression.

Fixes cockroachdb#63180

Release note (bug fix): Fixed an internal error that could occur
when executing queries using an inverted index. The error was an index
out of range error, and could occur in rare cases when a filter or
join predicate contained at least two JSON, Array, Geometry or
Geography expressions that were combined with AND. This has now been
fixed.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants