Use index sort range query when possible. #56657

Merged (5 commits) on May 13, 2020

@jtibshirani jtibshirani commented May 13, 2020

This PR proposes to use IndexSortSortedNumericDocValuesRangeQuery when possible to speed up certain range queries. Points-based queries are already very efficient, so this query only makes a difference when the range matches a large number of documents.

Some notes:

  • The optimization is only applied for fields of type date, integer, and long. I found that the query implementation isn't yet suited for double or float types (I will follow up with a Lucene issue).
  • Before applying the query, we check that the index is sorted on the query field. This isn't strictly necessary, since the query itself checks this as part of its execution. But it seemed nice to avoid wrapping the query unnecessarily -- it makes debugging easier, like when reading search profile results.
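
To illustrate why index sorting helps here, the sketch below models a segment as a plain array with one long value per document (doc ID = array index). It is a simplified model and not Lucene's implementation: the class and method names are hypothetical, the `sortedOnField` flag stands in for the "is the index sorted on the query field" check described above, and the linear scan is only a stand-in for the regular query path (the real points-based query is far more efficient than a scan). The point is that when documents are sorted on the field, the matches form one contiguous doc ID range that two binary searches can locate.

```java
/** Simplified model of a segment: one long value per document, doc ID = array index. */
final class SortedRangeSketch {

    /** Fallback path: visit every document (hypothetical stand-in for the regular query). */
    static int countByScan(long[] values, long lower, long upper) {
        int count = 0;
        for (long v : values) {
            if (v >= lower && v <= upper) {
                count++;
            }
        }
        return count;
    }

    /** Returns the first index whose value is >= target; values must be sorted ascending. */
    static int lowerBound(long[] sortedValues, long target) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Sorted path: all matches for [lower, upper] sit in one contiguous range [from, to). */
    static int countBySortedRange(long[] sortedValues, long lower, long upper) {
        if (lower > upper) {
            return 0;
        }
        int from = lowerBound(sortedValues, lower);
        // Guard against overflow of upper + 1 when the range is open-ended.
        int to = (upper == Long.MAX_VALUE)
            ? sortedValues.length
            : lowerBound(sortedValues, upper + 1);
        return to - from;
    }

    /** Takes the sorted path only when the caller knows the data is sorted on this field. */
    static int count(long[] values, boolean sortedOnField, long lower, long upper) {
        return sortedOnField
            ? countBySortedRange(values, lower, upper)
            : countByScan(values, lower, upper);
    }
}
```

In this model the sorted path costs two binary searches regardless of how many documents match, which is consistent with the observation that the optimization matters most for ranges matching many documents.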

Below are benchmark results on the http-logs dataset. The following ranges were run against the logs-241998 index:

range-small (897633930, 897655999]: ~2M docs
range-medium (897623930, 897655999]: ~5M docs
range-large (897259801, 897503930]: ~21M docs

| Metric                       |         Task |    Baseline |   Contender |     Diff |   Unit |
|-----------------------------:|-------------:|------------:|------------:|---------:|-------:|
| 50th percentile service time |  range-small |     11.0228 |     8.19478 | -2.82797 |     ms |
| 95th percentile service time |  range-small |     11.8153 |     9.06257 | -2.75274 |     ms |
| 50th percentile service time | range-medium |     22.8912 |     7.23264 | -15.6585 |     ms |
| 95th percentile service time | range-medium |     25.0957 |     7.93246 | -17.1632 |     ms |
| 50th percentile service time |  range-large |     39.7224 |     6.34589 | -33.3765 |     ms |
| 95th percentile service time |  range-large |     43.9104 |     7.06604 | -36.8444 |     ms |
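
The table reports absolute differences; a ratio can make the trend across range sizes easier to compare. The helper below is a hypothetical convenience, not part of the benchmark tooling, and simply divides the baseline service time by the contender service time taken from the table rows above.

```java
/** Hypothetical helper for reading the benchmark table as speedup ratios. */
final class SpeedupSketch {

    /** Returns how many times faster the contender is than the baseline. */
    static double speedup(double baselineMs, double contenderMs) {
        return baselineMs / contenderMs;
    }

    public static void main(String[] args) {
        // 50th percentile service times (ms) copied from the table.
        System.out.printf("range-small:  %.2fx%n", speedup(11.0228, 8.19478));
        System.out.printf("range-medium: %.2fx%n", speedup(22.8912, 7.23264));
        System.out.printf("range-large:  %.2fx%n", speedup(39.7224, 6.34589));
    }
}
```

For the median service time this works out to roughly 1.3x for range-small, 3.2x for range-medium, and 6.3x for range-large, so the gain grows with the number of matching documents.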

Relates to #48665.

@jtibshirani jtibshirani added >enhancement :Search/Search Search-related issues that do not fall into other categories labels May 13, 2020
@elasticmachine

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 13, 2020
@jimczi jimczi left a comment

LGTM

@jtibshirani

Thanks @jimczi for the review.

@jtibshirani jtibshirani merged commit 7b34e22 into elastic:master May 13, 2020
@jtibshirani jtibshirani deleted the index-sort-range-query branch May 13, 2020 18:34
jimczi added a commit that referenced this pull request Sep 1, 2020
This change reverts the optimization added in #56657.
We found a bug in `IndexSortSortedNumericDocValuesRangeQuery` that can fail the entire shard search request, so this commit removes the optimization and restores the old behavior (range query on points) in this release branch.

Relates #61766
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.9.0 v8.0.0-alpha1