Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcut counts on exists queries #37475

Closed
jpountz opened this issue Jan 15, 2019 · 5 comments · Fixed by #39570
Closed

Shortcut counts on exists queries #37475

jpountz opened this issue Jan 15, 2019 · 5 comments · Fixed by #39570
Assignees
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories

Comments

@jpountz
Copy link
Contributor

jpountz commented Jan 15, 2019

TopDocsCollectorContext is already able to shortcut hit counts on match_all and term queries when there are no deletions. It would be nice to also shortcut exists queries, especially because running a match_all query on an index that has nested documents runs a DocValueFieldExists query under the hood on the _seq_no field.

This would only work on indices without deletions and fields that are indexed: fields indexed with terms can use Terms#getDocCount and fields indexed with points can use PointValues#getDocCount.

@jpountz jpountz added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Jan 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@cbuescher cbuescher self-assigned this Feb 1, 2019
@cbuescher
Copy link
Member

cbuescher commented Mar 1, 2019

@jpountz working on this, while adding tests to QueryPhaseTests, I discovered the following section in Lucenes IndexSearchers#count method: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L339
This covers the MatchAllDocsQuery and TermQuery in much the same way as TopDocsCollectorContext#shortcutTotalHitCount but could also benefit from adding a section handling the DocValuesFieldExistsQuery, wdyt?

@jpountz
Copy link
Contributor Author

jpountz commented Mar 1, 2019

Unfortunately this optimization is unsafe for Lucene because Lucene doesn't enforce that you are indexing the same data in an indexed and a doc-value field that share the same name. This is something that Elasticsearch enforces however: if a field is indexed and doc-valued, then indexed documents are exactly the same as doc-valued documents.

@cbuescher
Copy link
Member

Thanks, thats unfortunate. Do we wrap or overwrite Lucenes IndexSearch in ES in a consistent way somewhere so we could add this optimization only on our side?

@jpountz
Copy link
Contributor Author

jpountz commented Mar 1, 2019

@cbuescher We never use IndexSearcher#count to my knowlegde (maybe we should make it a forbidden API in a separate change?). Counting only occurs in TopDocsCollectorContext#shortcutTotalHitCount.

cbuescher pushed a commit that referenced this issue Mar 4, 2019
`TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. 
This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed.

Closes #37475
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 4, 2019
`TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. 
This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed.

Closes elastic#37475
cbuescher pushed a commit that referenced this issue Mar 4, 2019
`TopDocsCollectorContext` can already shortcut hit counts on `match_all` and `term` queries when there are no deletions. 
This change adds this ability for `exists` queries if the index doesn't have deletions and fields are indexed.

Closes #37475
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants