NPE using bool filter with span_multi fuzzy query #52894

Closed
Sharptsa opened this issue Feb 27, 2020 · 6 comments · Fixed by #53231
Labels: >bug, :Search/Search (Search-related issues that do not fall into other categories)

@Sharptsa

Elasticsearch version (bin/elasticsearch --version): 6.8.5

Plugins installed: []

JVM version (java -version): openjdk version "13.0.1" 2019-10-15

OS version (uname -a if on a Unix-like system): Linux 1f006eaddcb8 4.15.0-88-generic #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linu

Issue:

The following query throws a NullPointerException:

{
  "query": {
    "bool": {
      "filter": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "documents.content": {
                  "value": "cardiau"
                }
              }
            }
          }
        }
      ]
    }
  }
}

When I use must instead of filter, the exception no longer occurs.

There is also no exception when using a wildcard or prefix query instead of the fuzzy query.

Logs:

[2020-02-27T16:44:45,866][WARN ][r.suppressed             ] [elasticsearch-3] path: /stay/_search, params: {pretty=, index=stay}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
  at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:133) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:100) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:48) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:220) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:174) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase.access$000(InitialSearchPhase.java:48) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.InitialSearchPhase$2.onFailure(InitialSearchPhase.java:220) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:73) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:463) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1114) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1226) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1200) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:60) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:56) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService$2.onFailure(SearchService.java:367) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:361) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:355) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1107) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.5.jar:6.8.5]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.ElasticsearchException$1
  at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:657) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:131) [elasticsearch-6.8.5.jar:6.8.5]
  ... 26 more
Caused by: java.lang.NullPointerException
  at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:119) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause$1.collectTerms(SpanBooleanQueryRewriteWithMaxClause.java:95) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause$1.rewrite(SpanBooleanQueryRewriteWithMaxClause.java:75) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause.rewrite(SpanBooleanQueryRewriteWithMaxClause.java:117) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.apache.lucene.search.spans.SpanMultiTermQueryWrapper.rewrite(SpanMultiTermQueryWrapper.java:121) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:50) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.BoostQuery.rewrite(BoostQuery.java:81) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:686) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
  at org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:91) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService.createContext(SearchService.java:649) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:596) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:387) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService.access$100(SearchService.java:126) ~[elasticsearch-6.8.5.jar:6.8.5]
  at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:359) ~[elasticsearch-6.8.5.jar:6.8.5]
  ... 9 more

@dliappis dliappis added the :Search/Search Search-related issues that do not fall into other categories label Feb 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jimczi
Contributor

jimczi commented Mar 5, 2020

@cbuescher could you take a look at this one?

@cbuescher
Member

cbuescher commented Mar 5, 2020

I took a look at the examples in #53118 and in this issue, and I can reproduce the NPE locally on 6.8.7 with an index that contains at least one document with the field queried by the fuzzy query, e.g.:

DELETE test_index

POST /test_index/_doc?refresh
{"content":"foobarbaz"}

GET /test_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "content": {
                  "value": "foobarbiz"
                }
              }
            }
          }
        }
      ]
    }
  }
}

This fails on 6.8.7 with the following stack trace. It is almost identical to the one above, except that the org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:246) rewrite step from the original report is missing in my reproduction.

org.elasticsearch.transport.RemoteTransportException: [ZGllhvq][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.NullPointerException
	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:119) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause$1.collectTerms(SpanBooleanQueryRewriteWithMaxClause.java:95) ~[elasticsearch-6.8.7.jar:6.8.7]
	at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause$1.rewrite(SpanBooleanQueryRewriteWithMaxClause.java:75) ~[elasticsearch-6.8.7.jar:6.8.7]
	at org.elasticsearch.common.lucene.search.SpanBooleanQueryRewriteWithMaxClause.rewrite(SpanBooleanQueryRewriteWithMaxClause.java:117) ~[elasticsearch-6.8.7.jar:6.8.7]
	at org.apache.lucene.search.spans.SpanMultiTermQueryWrapper.rewrite(SpanMultiTermQueryWrapper.java:121) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:50) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.apache.lucene.search.BoostQuery.rewrite(BoostQuery.java:81) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:686) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
	at org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106) ~[elasticsearch-6.8.7.jar:6.8.7]
	at org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:263) ~[elasticsearch-6.8.7.jar:6.8.7]

The outer boolean query surrounding the span query seems to be relevant: removing it doesn't seem to trigger the same rewrite path or the NPE. However, the same example works on 7.0.0, returning one document as expected, so I believe the reported problem is restricted to 6.8 (and possibly earlier). I haven't been able to check for differences in the code on the call path yet, but here is the code that triggers the NPE on 6.8.7:

The "atts" AttributeSource argument here seems to be null:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.2/lucene/core/src/java/org/apache/lucene/search/FuzzyTermsEnum.java#L119

Which gets passed down as a "null" value from
https://github.com/elastic/elasticsearch/blob/v6.8.7/server/src/main/java/org/elasticsearch/common/lucene/search/SpanBooleanQueryRewriteWithMaxClause.java#L95

I'm not sure yet why this isn't triggered when the span query sits in, e.g., a "must" clause of the outer boolean query, nor why this is no longer a problem in 7.x, since the code in SpanBooleanQueryRewriteWithMaxClause hasn't changed. My guess is that the rewriting that takes place is different in that version.

@cbuescher
Member

On 7.0 the recursive rewrite in the IndexSearcher doesn't seem to use SpanBooleanQueryRewriteWithMaxClause but Lucene's TopTermsRewrite, as far as I can see. Not sure why, though.

@cbuescher
Member

This is the call stack on 7.0. The AttributeSource that's null in the 6.8 case gets set here using the TermCollector's attributes: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.0.0/lucene/core/src/java/org/apache/lucene/search/TermCollectingRewrite.java#L58

[Image: Screenshot 2020-03-05 at 16 35 01]

@cbuescher
Member

I think I now know why this fails in 6.8 inside a boolean filter. The FuzzyQueryBuilder used to set the rewrite method in filter mode to "constant_score", which was later removed in 7.0 (https://github.com/elastic/elasticsearch/pull/35354/files#diff-82ebdac5b91f2d2f368f55eb1e738a39L329). The surrounding SpanMultiTermQueryBuilder#doToQuery then sees that in https://github.com/elastic/elasticsearch/blob/v6.8.7/server/src/main/java/org/elasticsearch/index/query/SpanMultiTermQueryBuilder.java#L169 and uses SpanBooleanQueryRewriteWithMaxClause in the wrapper.

I was able to reproduce the same NPE on 7.x and master by forcing the fuzzy query rewrite method to "constant_score" like this:

GET /test_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "content": {
                  "value": "foobarbiz",
                  "rewrite" : "constant_score"
                }
              }
            }
          }
        }
      ]
    }
  }
}
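
The explanation above can be condensed into a small decision model. The following Java sketch only illustrates the reasoning in this thread; it is not the Elasticsearch or Lucene code. All class, enum and method names are hypothetical, and the assumption that the fuzzy query otherwise ends up with a top-terms rewrite is taken from the 7.0 observation earlier in the thread.

```java
/** Simplified model of the rewrite selection described in this thread; all names are hypothetical. */
class SpanRewriteSelectionSketch {

    /** Rewrite method carried by the inner fuzzy query (hypothetical labels). */
    enum InnerRewrite { TOP_TERMS, CONSTANT_SCORE }

    /** Rewrite method the span_multi wrapper ends up using (hypothetical labels). */
    enum WrapperRewrite { TOP_TERMS_BASED, SPAN_BOOLEAN_WITH_MAX_CLAUSE }

    /** Assumption: before 7.0 the fuzzy builder forced "constant_score" in filter context. */
    static InnerRewrite fuzzyRewrite(boolean before70, boolean filterContext, String explicitRewrite) {
        if ("constant_score".equals(explicitRewrite)) {
            return InnerRewrite.CONSTANT_SCORE; // forced by the request, as in the 7.x reproduction
        }
        if (before70 && filterContext) {
            return InnerRewrite.CONSTANT_SCORE; // implicit behaviour reported for 6.8
        }
        return InnerRewrite.TOP_TERMS;
    }

    /** Assumption: anything that is not a top-terms rewrite falls back to the max-clause span rewrite. */
    static WrapperRewrite wrapperRewrite(InnerRewrite inner) {
        return inner == InnerRewrite.TOP_TERMS
                ? WrapperRewrite.TOP_TERMS_BASED
                : WrapperRewrite.SPAN_BOOLEAN_WITH_MAX_CLAUSE;
    }

    /** True when the query would take the path that passed a null AttributeSource (before the fix). */
    static boolean hitsNullAttributeSourcePath(boolean before70, boolean filterContext, String explicitRewrite) {
        InnerRewrite inner = fuzzyRewrite(before70, filterContext, explicitRewrite);
        return wrapperRewrite(inner) == WrapperRewrite.SPAN_BOOLEAN_WITH_MAX_CLAUSE;
    }

    public static void main(String[] args) {
        System.out.println("6.8, filter: " + hitsNullAttributeSourcePath(true, true, null));
        System.out.println("6.8, must: " + hitsNullAttributeSourcePath(true, false, null));
        System.out.println("7.0, filter: " + hitsNullAttributeSourcePath(false, true, null));
        System.out.println("7.x, filter, rewrite=constant_score: "
                + hitsNullAttributeSourcePath(false, true, "constant_score"));
    }
}
```

Under these assumptions the model agrees with the four observations reported in this issue: only the 6.8 filter case and the 7.x case with an explicit "constant_score" rewrite reach the failing path.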

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 6, 2020
Under certain circumstances SpanMultiTermQueryWrapper uses
SpanBooleanQueryRewriteWithMaxClause as its rewrite method, which in turn tries
to get a TermsEnum from the wrapped MultiTermQuery currently using a `null`
AttributeSource. While queries like TermsQuery or subclasses of AutomatonQuery ignore
this argument, FuzzyQuery uses it to create a FuzzyTermsEnum which triggers an
NPE when the AttributeSource is not provided. This PR fixes this by supplying an
empty AttributeSource instead of a `null` value.

Closes elastic#52894
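
The failure mode and the fix described in the commit message above can be illustrated with a self-contained Java sketch. It does not use the real Lucene classes: `Attrs` is a hypothetical stand-in for AttributeSource, the two static methods are hypothetical stand-ins for terms enums that ignore or use the argument, and the attribute name is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in classes only; none of these are the real Lucene or Elasticsearch types. */
class NullAttributeSourceSketch {

    /** Hypothetical stand-in for an attribute source: a holder for per-enum state. */
    static final class Attrs {
        private final List<String> attributes = new ArrayList<>();

        void addAttribute(String name) {
            attributes.add(name);
        }

        int size() {
            return attributes.size();
        }
    }

    /** Models a terms enum that ignores the argument, so a null value goes unnoticed. */
    static String argumentIgnoringTermsEnum(Attrs atts) {
        return "enum-without-attributes";
    }

    /** Models a fuzzy-style terms enum that registers an attribute while being constructed. */
    static String fuzzyLikeTermsEnum(Attrs atts) {
        atts.addAttribute("exampleAttribute"); // throws NullPointerException when atts is null
        return "fuzzy-enum";
    }

    public static void main(String[] args) {
        // A null argument is harmless for enums that never touch it.
        System.out.println(argumentIgnoringTermsEnum(null));

        // The fuzzy-style enum dereferences the argument and fails.
        try {
            fuzzyLikeTermsEnum(null);
        } catch (NullPointerException e) {
            System.out.println("NPE with a null attribute source");
        }

        // The approach from the commit message: pass an empty instance instead of null.
        Attrs empty = new Attrs();
        System.out.println(fuzzyLikeTermsEnum(empty) + ", attributes registered: " + empty.size());
    }
}
```

Passing an empty instance keeps the call site unchanged for enums that ignore the argument while giving the fuzzy-style enum something to register its attributes on.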