Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize lone single bucket date_histogram #71180

Merged
merged 16 commits into from
May 12, 2021

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Apr 1, 2021

This optimizes the date_histogram agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.

nik9000 added 3 commits March 25, 2021 17:00
This restores SQL's test for fetching `half_floats` after we backported
the precision change in that fetch (elastic#70653)
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
@nik9000
Copy link
Member Author

nik9000 commented Apr 1, 2021

|                Min Throughput |     5.0 |     5.00816 |  0.00372 |  ops/s |
|               Mean Throughput |     5.0 |     5.01744 |  0.00769 |  ops/s |
|             Median Throughput |     5.0 |     5.01392 |  0.00616 |  ops/s |
|                Max Throughput |     5.0 |     5.04775 |  0.02114 |  ops/s |
|       50th percentile latency |    92.8 |     7.15122 | -85.7464 |     ms |
|       90th percentile latency |    99.5 |     8.68909 | -90.8784 |     ms |
|       99th percentile latency |   111.8 |     10.6733 | -101.137 |     ms |
|      100th percentile latency |   132.6 |     20.7481 | -111.862 |     ms |
|  50th percentile service time |    91.7 |     5.96916 | -85.8158 |     ms |
|  90th percentile service time |    98.3 |     7.49015 | -90.8677 |     ms |
|  99th percentile service time |   110.8 |     9.28727 | -101.609 |     ms |
| 100th percentile service time |   131.1 |     19.7961 | -111.374 |     ms |
|                    error rate |      0  |           0 |        0 |      % |

@nik9000 nik9000 marked this pull request as ready for review April 14, 2021 20:25
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented Apr 15, 2021

run elasticsearch-ci/1

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, I think future us will appreciate a little more documentation, but otherwise this looks fine. +1

@@ -52,6 +55,15 @@
if (query instanceof TermQuery) {
return new TermQueryToFilterAdapter(searcher, key, (TermQuery) query);
}
if (query instanceof ConstantScoreQuery) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an obvious question "Why don't we check for a wrapped TermsQuery or MatchAllDocsQuery?" Would be good to have a comment to answer that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I've not seen it come up. But, looking at it with fresh eyes now, I think the safest thing is to always unwrap.

@@ -386,4 +398,50 @@ void collectDebugInfo(BiConsumer<String, Object> add) {
add.accept("results_from_metadata", resultsFromMetadata);
}
}

private static class DocValuesFieldExistsAdapter extends QueryToFilterAdapter<DocValuesFieldExistsQuery> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, keeping all the implementations as static inner classes is going to get unwieldy. Do you think we should refactor QueryToFilterAdapeter into its own package and make these static inners top level package private classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.... I'll see about breaking them out in a mechanical follow up PR.

@nik9000 nik9000 merged commit 0cf63fc into elastic:master May 12, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 12, 2021
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
nik9000 added a commit that referenced this pull request May 12, 2021
…2989)

This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants