dfs_query_then_fetch can cause serialization errors with 6.x nodes #75349

Closed
jtibshirani opened this issue Jul 14, 2021 · 6 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@jtibshirani
Contributor

jtibshirani commented Jul 14, 2021

In a mixed 6.x and 7.x cluster, a search that uses dfs_query_then_fetch can cause a transport serialization error:

[2021-07-14T14:48:54,223][DEBUG][o.e.a.s.TransportSearchAction] [v6.8.11-2] [16] Failed to execute query phase
 org.elasticsearch.transport.RemoteTransportException: [v6.8.11-0][127.0.0.1:50319][indices:data/read/search[phase/query/id]]
Caused by: java.lang.IllegalArgumentException: totalTermFreq must be positive, totalTermFreq: -1
      at org.apache.lucene.search.TermStatistics.<init>(TermStatistics.java:70) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
      at org.elasticsearch.search.dfs.AggregatedDfs.<init>(AggregatedDfs.java:37) ~[elasticsearch-6.8.11.jar:6.8.11]
      at org.elasticsearch.search.query.QuerySearchRequest.<init>(QuerySearchRequest.java:48) ~[elasticsearch-6.8.11.jar:6.8.11]
      at org.elasticsearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:48) ~[elasticsearch-6.8.

This seems related to https://issues.apache.org/jira/browse/LUCENE-8007, which was introduced in Lucene 8 and adds stricter checks to TermStatistics.
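
To illustrate why a value of -1 trips this kind of check, here is a minimal, self-contained sketch. `StrictTermStats` and `tryBuild` are hypothetical names, not Lucene or Elasticsearch classes; the sketch only assumes a constructor that rejects non-positive statistics, modeled on the message in the trace above.

```java
public class TermStatsCheckSketch {

    /** Hypothetical stand-in for a term statistics holder with strict validation. */
    static final class StrictTermStats {
        final long docFreq;
        final long totalTermFreq;

        StrictTermStats(long docFreq, long totalTermFreq) {
            // Assumption: both statistics must be strictly positive.
            if (docFreq <= 0) {
                throw new IllegalArgumentException(
                    "docFreq must be positive, docFreq: " + docFreq);
            }
            if (totalTermFreq <= 0) {
                throw new IllegalArgumentException(
                    "totalTermFreq must be positive, totalTermFreq: " + totalTermFreq);
            }
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Returns the rejection message, or null if the values are accepted. */
    static String tryBuild(long docFreq, long totalTermFreq) {
        try {
            new StrictTermStats(docFreq, totalTermFreq);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Positive statistics pass the check.
        System.out.println(tryBuild(2, 2));
        // The -1 value from the trace is rejected.
        System.out.println(tryBuild(2, -1));
    }
}
```

If one node sends -1 for a statistic and the receiving node applies a check like this while reading the request, the request fails before the query phase runs, which is consistent with the trace failing inside the `QuerySearchRequest` constructor.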

I was able to reproduce this with our rolling upgrade tests. A rough example:

  1. In the old 6.8 cluster, create a couple documents with a keyword value like "field": "some_value".
  2. In a mixed 6.8 and 7.x cluster, perform a search on the field:
    GET /index/_search?search_type=dfs_query_then_fetch
    {
      "query": {
        "match": {
          "field": {
            "query": "some_value"
          }
        }
      }
    }
    

This search can then fail with the same serialization error.
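
For scripting the search step outside of a console, a rough Java sketch follows. It only builds the request and does not send it. The host `localhost:9200` and the class and method names are assumptions, while the index name, field, and query body come from the example above. It uses POST, which the search API accepts in place of GET with a body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DfsSearchRequestSketch {

    // Query body from the reproduction steps: a match query on "field".
    static final String QUERY_BODY =
        "{\"query\":{\"match\":{\"field\":{\"query\":\"some_value\"}}}}";

    /** Builds (but does not send) the dfs_query_then_fetch search request. */
    static HttpRequest buildSearch(String baseUrl, String index) {
        URI uri = URI.create(
            baseUrl + "/" + index + "/_search?search_type=dfs_query_then_fetch");
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(QUERY_BODY))
            .build();
    }

    public static void main(String[] args) {
        // Assumed local node address; adjust for the cluster under test.
        HttpRequest request = buildSearch("http://localhost:9200", "index");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `java.net.http.HttpClient` against a node of the mixed cluster would be the next step; that part is left out here because it depends on the test setup.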

@jtibshirani jtibshirani added >bug :Search/Search Search-related issues that do not fall into other categories labels Jul 14, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jul 14, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jpountz
Contributor

jpountz commented Jul 15, 2021

Oh good catch and sorry for not thinking about this when merging this Lucene change!

@163satish

Waiting for this fix

ywelsch added a commit that referenced this issue Jul 28, 2021
In a mixed 6.x and 7.x cluster, a search that uses dfs_query_then_fetch can cause transport serialization errors.

This is related to two changes introduced in Lucene 8: https://issues.apache.org/jira/browse/LUCENE-8007, which adds stricter checks to TermStatistics and CollectionStatistics, and https://issues.apache.org/jira/browse/LUCENE-8020, which avoids bogus term stats (e.g. docfreq=0).

Co-authored-by: Julie Tibshirani

Closes #75349
@ywelsch
Contributor

ywelsch commented Jul 28, 2021

Fixed by #75735

@ywelsch ywelsch closed this as completed Jul 28, 2021
@nathandh22
Contributor

nathandh22 commented Aug 4, 2021

FYI, @ywelsch. *** raised concerns that the bug and PR tickets were not mentioned in the Elasticsearch 7.14.0 release notes. That seems to be true, but I have pointed out that the code changes do appear in the 7.14.0 codebase.

@ywelsch
Contributor

ywelsch commented Aug 5, 2021

@nathandh22 I've redacted your post, as you shouldn't call out specific customers in public issues.

As the fix (#75735) got merged very late in the 7.14.0 release cycle, I suspect that it wasn't properly picked up by the doc release for 7.14.0 in #75873. @probakowski can you check what went wrong there?
