Optimize lone single bucket `date_histogram` #71180

nik9000 · 2021-04-01T13:00:51Z

This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377, this optimization will disable itself if you have
document-level security enabled or are querying a rollup index. Also like
#69377, it won't do anything if there is a top-level query.
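The conditions above can be sketched as a toy model. This is only an illustration under stated assumptions, not the code in this PR: `Segment`, `hiddenDocs`, and `count` are hypothetical names, and the array length stands in for whatever per-segment doc count the index metadata stores. The `resultsFromMetadata` counter loosely mirrors the `results_from_metadata` debug entry visible in the diff.

```java
import java.util.List;

/** Toy model of the optimization; none of these types are Elasticsearch or Lucene classes. */
final class SingleBucketCountSketch {
    /** Hypothetical stand-in for an index segment holding one timestamp per document. */
    static final class Segment {
        final long[] timestamps;   // per-document values, only read on the slow path
        final long min;            // metadata: smallest timestamp in the segment
        final long max;            // metadata: largest timestamp in the segment
        final boolean hiddenDocs;  // true if deletes or document-level security hide some docs

        Segment(long[] timestamps, boolean hiddenDocs) {
            this.timestamps = timestamps;
            this.hiddenDocs = hiddenDocs;
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (long t : timestamps) {
                lo = Math.min(lo, t);
                hi = Math.max(hi, t);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    /** Number of segments answered from metadata alone. */
    int resultsFromMetadata = 0;

    /** Counts documents in the bucket [from, to), using metadata whenever that is safe. */
    long count(List<Segment> segments, long from, long to, boolean topLevelQuery, boolean subAggs) {
        long total = 0;
        for (Segment s : segments) {
            // The bucket must contain every value in the segment for the stored count to be the answer.
            boolean bucketCoversSegment = s.timestamps.length == 0 || (from <= s.min && s.max < to);
            if (!topLevelQuery && !subAggs && !s.hiddenDocs && bucketCoversSegment) {
                total += s.timestamps.length; // fast path: stands in for a stored doc count
                resultsFromMetadata++;
            } else {
                // Slow path: visit each value. A real implementation would also skip hidden docs here.
                for (long t : s.timestamps) {
                    if (t >= from && t < to) {
                        total++;
                    }
                }
            }
        }
        return total;
    }
}
```

With a "daily" segment whose values all fall inside one monthly bucket, `count` returns the segment size without reading any values. Narrowing the bucket or setting `topLevelQuery` forces the slow path.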


nik9000 · 2021-04-01T22:29:50Z

|                        Metric | Baseline |   Contender |     Diff |   Unit |
|------------------------------:|---------:|------------:|---------:|-------:|
|                Min Throughput |     5.0 |     5.00816 |  0.00372 |  ops/s |
|               Mean Throughput |     5.0 |     5.01744 |  0.00769 |  ops/s |
|             Median Throughput |     5.0 |     5.01392 |  0.00616 |  ops/s |
|                Max Throughput |     5.0 |     5.04775 |  0.02114 |  ops/s |
|       50th percentile latency |    92.8 |     7.15122 | -85.7464 |     ms |
|       90th percentile latency |    99.5 |     8.68909 | -90.8784 |     ms |
|       99th percentile latency |   111.8 |     10.6733 | -101.137 |     ms |
|      100th percentile latency |   132.6 |     20.7481 | -111.862 |     ms |
|  50th percentile service time |    91.7 |     5.96916 | -85.8158 |     ms |
|  90th percentile service time |    98.3 |     7.49015 | -90.8677 |     ms |
|  99th percentile service time |   110.8 |     9.28727 | -101.609 |     ms |
| 100th percentile service time |   131.1 |     19.7961 | -111.374 |     ms |
|                    error rate |       0 |           0 |        0 |      % |
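As a quick check on the numbers above, the speedup for a row can be computed from its last two numeric columns. This is a small sketch, assuming the third column is the change relative to the first (the first column appears truncated to one decimal, so the before value is recovered as after minus diff). `SpeedupSketch` and its method names are made up for illustration.

```java
/** Arithmetic helper for the benchmark rows; the class and method names are illustrative only. */
final class SpeedupSketch {
    /** Recovers the "before" measurement, given that diff = after - before. */
    static double before(double after, double diff) {
        return after - diff;
    }

    /** How many times faster the "after" measurement is than the "before" one. */
    static double speedup(double after, double diff) {
        return before(after, diff) / after;
    }
}
```

For example, `SpeedupSketch.speedup(7.15122, -85.7464)` gives the ratio for the 50th percentile latency row, and `SpeedupSketch.speedup(7.49015, -90.8677)` gives it for the 90th percentile service time row.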

elasticmachine · 2021-04-14T20:25:33Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 · 2021-04-15T19:21:08Z

run elasticsearch-ci/1

not-napoleon

As noted, I think future us will appreciate a little more documentation, but otherwise this looks fine. +1

not-napoleon · 2021-05-04T14:11:59Z

.../src/main/java/org/elasticsearch/search/aggregations/bucket/filter/QueryToFilterAdapter.java

@@ -52,6 +55,15 @@
        if (query instanceof TermQuery) {
            return new TermQueryToFilterAdapter(searcher, key, (TermQuery) query);
        }
+        if (query instanceof ConstantScoreQuery) {


It seems like an obvious question "Why don't we check for a wrapped TermsQuery or MatchAllDocsQuery?" Would be good to have a comment to answer that.

nik9000 · 2021-05-05

Because I've not seen it come up. But, looking at it with fresh eyes now, I think the safest thing is to always unwrap.
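A minimal sketch of what "always unwrap" could look like, assuming the adapter is chosen only after every `ConstantScoreQuery` layer has been peeled off. The query types below are tiny stand-ins defined for this example, not the Lucene classes, and `adapterFor` returns a label rather than a real adapter. Unwrapping is safe for counting because a constant-score wrapper changes scores, not which documents match.

```java
/** Toy illustration of unwrapping before dispatch; these are stand-in types, not Lucene's. */
final class UnwrapSketch {
    interface Query {}

    static final class TermQuery implements Query {
        final String field;
        final String value;

        TermQuery(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    static final class MatchAllDocsQuery implements Query {}

    /** Wrapper that only affects scoring, so it matches the same documents as the wrapped query. */
    static final class ConstantScoreQuery implements Query {
        private final Query inner;

        ConstantScoreQuery(Query inner) {
            this.inner = inner;
        }

        Query getQuery() {
            return inner;
        }
    }

    /** Strips every layer of constant-score wrapping, however deeply nested. */
    static Query unwrap(Query query) {
        while (query instanceof ConstantScoreQuery) {
            query = ((ConstantScoreQuery) query).getQuery();
        }
        return query;
    }

    /** Picks a (hypothetical) adapter name after unwrapping, so wrapped queries get the fast paths too. */
    static String adapterFor(Query query) {
        Query unwrapped = unwrap(query);
        if (unwrapped instanceof TermQuery) {
            return "term";
        }
        if (unwrapped instanceof MatchAllDocsQuery) {
            return "match_all";
        }
        return "generic";
    }
}
```

Unwrapping once at the top means each specialized check does not need its own wrapped variant, which answers the review question for the term and match-all cases alike.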

not-napoleon · 2021-05-04T14:28:51Z

.../src/main/java/org/elasticsearch/search/aggregations/bucket/filter/QueryToFilterAdapter.java

@@ -386,4 +398,50 @@ void collectDebugInfo(BiConsumer<String, Object> add) {
            add.accept("results_from_metadata", resultsFromMetadata);
        }
    }
+
+    private static class DocValuesFieldExistsAdapter extends QueryToFilterAdapter<DocValuesFieldExistsQuery> {


At some point, keeping all the implementations as static inner classes is going to get unwieldy. Do you think we should refactor QueryToFilterAdapter into its own package and make these static inners top-level package-private classes?

nik9000 · 2021-05-05

Yeah... I'll see about breaking them out in a mechanical follow-up PR.


nik9000 added 3 commits March 25, 2021 17:00

Remove test skip after backport

6056afa

This restores SQL's test for fetching `half_floats` after we backported the precision change in that fetch (elastic#70653)

More tests

6676f0d

nik9000 mentioned this pull request Apr 1, 2021

Benchmark for date_histo with one bucket elastic/rally-tracks#165

Merged

nik9000 added 3 commits April 1, 2021 09:02

Merge branch 'master' into field_exists_speedy

acbe5ad

Unique name

13797f7

Compile plz

ce26ba3

nik9000 added 2 commits April 12, 2021 13:22

Merge branch 'master' into field_exists_speedy

144e0f2

Merge branch 'master' into field_exists_speedy

1a270e4

nik9000 requested a review from not-napoleon April 14, 2021 20:24

nik9000 added the :Analytics/Aggregations, >enhancement, v7.13.0, and v8.0.0 labels Apr 14, 2021

Use new tools

d10b042

nik9000 marked this pull request as ready for review April 14, 2021 20:25

elasticmachine added the Team:Analytics label Apr 14, 2021

nik9000 added 4 commits April 14, 2021 16:29

Line length

1d89bdc

Merge branch 'master' into field_exists_speedy

7a3cf4c

tests

9b48cff

Merge branch 'master' into field_exists_speedy

d81d3e1

pugnascotia added v7.14.0 and removed v7.13.0 labels Apr 21, 2021

Merge branch 'master' into field_exists_speedy

2f76f22

not-napoleon approved these changes May 5, 2021


nik9000 added 2 commits May 5, 2021 12:16

Merge branch 'master' into field_exists_speedy

7139c71

Always unwrap

db298be

nik9000 merged commit 0cf63fc into elastic:master May 12, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
