Disable optimization if we aren't sure it's faster #74260
Conversation
This disables the filter-by-filter aggregation optimization used by `terms`, `range`, `date_histogram`, and `date_range` aggregations unless we're *sure* that it's faster than the "native" implementation. Mostly this is when the top level query is empty or we can merge it into the filter generated by the agg rewrite process.

Now that we have hard and fast rules we can drop the cost estimation framework without too much fear. So we remove it in this change. It stomps a bunch of complexity. Sadly, without the cost estimation stuff we have to add a separate mechanism for blocking the optimization against runtime fields, for which it'd be kind of garbage. For that I added another rule preventing the filter-by-filter aggregation from running against the queries made by runtime fields. It's not fool-proof, but we have control over what queries we pass as a filter so it's not wide open.

I spent a lot of time working on an alternative to this that preserved that fancy filter-by-filter collection mechanism and was much more kind to the query cache. It detected cases where going full filter-by-filter was bad and grouped those filters together to collect in one pass with a funny ORing collector. It *worked*. And, if we were super concerned with the performance of the `filters` aggregation, it'd be the way to go. But it was very complex, and it was actually slower than using the native aggregation for things like `terms` and `date_histogram`. It was glorious. But it was wrong for us. Too complex, and it optimized the wrong things.

So here we are. Hopefully this is a fairly simple solution to a sneaky problem.
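The hard-and-fast rule described above can be sketched roughly like this. This is a minimal illustration with made-up names (`FilterByFilterDecision`, `TopLevelQuery`), not the actual Elasticsearch API:

```java
// Hypothetical sketch of the decision rule: filter-by-filter only runs when
// we are *sure* it beats the native implementation - i.e. the top level query
// is match_all (empty) or can be merged into the per-bucket filters - and no
// filter is backed by a runtime field. All names here are illustrative.
public class FilterByFilterDecision {
    public enum TopLevelQuery { MATCH_ALL, MERGEABLE_INTO_FILTERS, OTHER }

    /** true only when we're confident filter-by-filter is faster. */
    public static boolean useFilterByFilter(TopLevelQuery topLevel, boolean anyRuntimeFieldFilter) {
        if (anyRuntimeFieldFilter) {
            return false; // runtime-field queries make filter-by-filter garbage
        }
        return topLevel == TopLevelQuery.MATCH_ALL || topLevel == TopLevelQuery.MERGEABLE_INTO_FILTERS;
    }
}
```

With any other top level query the aggregation just falls back to the "native" collection path.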
Go jenkins go! Run all those tests.

Pinging @elastic/es-analytics-geo (Team:Analytics)
```diff
@@ -623,30 +622,23 @@ public void testFilterByFilter() throws InterruptedException, IOException {
 assertThat(breakdown.get(COLLECT), equalTo(0L));
 assertThat(breakdown.get(BUILD_AGGREGATION).longValue(), greaterThan(0L));
 assertThat(breakdown.get(REDUCE), equalTo(0L));
 Map<String, Object> debug = histoAggResult.getDebugInfo();
```
There are a lot of test changes in this PR. The tests started failing because we no longer calculate the cost of the agg, so I took the opportunity to convert them to `debugTestCase` and `MapMatcher`.
```diff
@@ -53,7 +53,7 @@ public KeywordScriptFieldType(String name) {
     this(name, StringFieldScript.PARSE_FROM_SOURCE, null, Collections.emptyMap(), (builder, params) -> builder);
 }

-KeywordScriptFieldType(
+public KeywordScriptFieldType(
```
I'd have to go through a ton of mocking without this, and I'm explicitly trying to test runtime fields, so I think it's probably worth it to just make it public. There is a private builder factory thing, but it goes through a lot of parsing infrastructure we don't have in the tests.
This may be a dumb question, but can you not just use the public constructor above? Particularly as it looks like your use of this is just doing a source lookup, which is what the name-only constructor gives you.
No, it's a good question. I could use it if I built the _source, but all the rest of those tests operate on doc values so I don't really have the infrastructure in them to build the source. Maybe that's simpler than I thought and I should just do it. I'll have a look.
I gave this a shot and it didn't work. I think those public ctors don't currently make useful fields. That's worth fixing, I think, but probably not in this PR.
```diff
@@ -30,7 +30,7 @@
 public AdaptingAggregator(
     Aggregator parent,
     AggregatorFactories subAggregators,
-    CheckedFunction<AggregatorFactories, Aggregator, IOException> delegate
+    CheckedFunction<AggregatorFactories, ? extends Aggregator, IOException> delegate
```
This delegate thing is required so we can "pin" the parent references in the sub aggregators. I wish we didn't have to do that stuff, but here we are.
```diff
@@ -43,14 +42,6 @@ long count(LeafReaderContext ctx, FiltersAggregator.Counter counter, Bits live)
     return super.count(ctx, counter, live);
 }

-@Override
-long estimateCountCost(LeafReaderContext ctx, CheckedSupplier<Boolean, IOException> canUseMetadata) throws IOException {
```
No more estimate*Cost methods because we use heuristics on query type.
The problem with the estimate*Cost methods was that they couldn't detect when a query was slow to prepare, because you had to prepare the query to get the cost. And now that we disable the optimization when the top level query is not match_all, there aren't really any complex heuristics to do. We have a lot more control over which queries come in on the filters, at least in the "optimize" route.
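In other words, the check moves from measuring cost to pattern-matching on the query's type. A toy sketch of that idea, using stand-in classes rather than the real Lucene or Elasticsearch query types:

```java
// Toy model of the type-based heuristic: instead of preparing a weight to
// estimate cost, reject known-slow query types up front. These classes are
// stand-ins, not the real Lucene/Elasticsearch classes.
abstract class Query {}

class TermQuery extends Query {}

// Stands in for AbstractScriptFieldQuery: runtime fields run a script per doc,
// so they're never fast to query.
class ScriptFieldQuery extends Query {}

final class FilterHeuristics {
    private FilterHeuristics() {}

    /** Bail on runtime-field queries without ever preparing them. */
    static boolean canUseFilterByFilter(Query filter) {
        return (filter instanceof ScriptFieldQuery) == false;
    }
}
```

The real check is the `instanceof AbstractScriptFieldQuery` test shown further down in this thread; the point is that the decision is made before any weight is built.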
```java
 * {@link Compatible} but doesn't support when there is a parent aggregator
 * or any child aggregators.
 */
public class FilterByFilterAggregator extends FiltersAggregator {
```
I yanked this out to the top level because it was complex. And because it has a builder.....
I appreciate that.
```java
 * stop counting once we've passed {@code maxEstimatedCost} if we aren't profiling.
 */
@SuppressWarnings("resource") // We're not in charge of anything Closeable
public long estimateCost(long maxCost) throws IOException {
```
This method is entirely gone.
```java
 * {@link Compatible} but doesn't support when there is a parent aggregator
 * or any child aggregators.
 */
public static class FilterByFilter extends FiltersAggregator {
```
Most of this class is just moved into its own file. The cost estimation is entirely gone.
```java
 * Use {@link #bulkScorer(LeafReaderContext, Runnable)} to build the scorer
 * when needed.
 */
private BulkScorer[] bulkScorers;
```
No need to cache scorers because we use them once. We still want to cache the weight because we use it for every segment.
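A tiny model of that caching decision, with hypothetical names rather than Lucene's actual API: the weight is built once and reused for every segment, while each bulk scorer is created, consumed once, and dropped:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of "cache the weight, not the scorers". Names are illustrative;
// the real code uses Lucene's Weight and BulkScorer.
final class WeightCachingSketch {
    static final AtomicInteger weightsBuilt = new AtomicInteger();

    /** Stand-in for a Lucene Weight: built once, reused for every segment. */
    static final class Weight {
        Weight() {
            weightsBuilt.incrementAndGet();
        }

        /** Stand-in for building and draining a one-shot BulkScorer. */
        int scoreSegment(int segmentDocCount) {
            return segmentDocCount;
        }
    }

    /** Count matches across segments with one weight and one-shot scorers. */
    static int countAcrossSegments(int[] segmentDocCounts) {
        Weight weight = new Weight(); // the only thing worth caching
        int total = 0;
        for (int docs : segmentDocCounts) {
            total += weight.scoreSegment(docs); // scorer used once, then discarded
        }
        return total;
    }
}
```

However many segments there are, the weight is built exactly once; each scorer only ever sees one segment, so holding onto an array of them buys nothing.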
```java
public boolean isInefficientUnion() {
    return true;
}
};
```
I'm mostly building the boolean query here out of spite - we don't use it, but it's the query we would have used if we hadn't disabled the optimization.
```diff
@@ -291,23 +290,8 @@ public static Aggregator build(
     cardinality,
     metadata
 );
 Map<String, Object> filtersDebug = null;
```
No more need to capture the filters debug - if we built the filters agg we kept it, and if we didn't there isn't anything to keep. I suppose we could add a debug callback to the build process, but we already capture the queries it tries to build, and those are all we use to decide whether or not to use filter-by-filter.
Let me see about resolving those test conflicts....
LGTM. It's magic, but it's well documented magic with a lot of tests, and that's really all I can ask for.
```java
 * {@link Compatible} but doesn't support when there is a parent aggregator
 * or any child aggregators.
 */
public class FilterByFilterAggregator extends FiltersAggregator {
```
I appreciate that.
```java
 * "native" aggregation implementation rather than go filter by filter.
 */
public boolean isInefficientUnion() {
    return false;
```
so the default is that the union is good, and we override specific cases to return false for known bad unions?
The default is that filters are not "inefficient unions". In the case where we're fairly sure it is an inefficient union we override to true and bail as fast as we can.
The trick here is that when we rewrite we're fairly sure we're making efficient queries. Well, we check for runtime fields queries, but otherwise we expect rewriting to make fairly efficient queries.

When you are using the `filters` aggregator you very well could provide a terrible query. But it's going to be the same speed to run it filter by filter as it would be to run it in the compatible way. There are likely some cases where it'd have been faster to rewrite it. Or, better yet, to use the "disjunction collector" idea that I showed you to run all of the filters in parallel with the top level query. But this change should conservatively fall back to the old performance. Which wasn't great, but at least it wasn't terrible.
Co-authored-by: Mark Tozzi <[email protected]>
```java
if (query instanceof AbstractScriptFieldQuery) {
    /*
     * We know that runtime fields aren't not fast to query at all
     * but we expect all other sorts of queries are at least as
```
nit: `aren't not` -> `aren't`
LGTM. Indeed this approach is easy to follow.
Thanks folks! I'll fix up the spelling and merge when jenkins likes it.
In #74260 we disabled the "filter by filter" aggregation optimizations when we couldn't be 100% sure that they were both faster and *safe* for the cluster. Before that, mostly in the 7.13 line, we would aggressively enable the optimization whenever we could, and there were lots of cases where the well-meaning optimization was way, way slower than the original implementation. There were also cases where it'd fill up memory and knock over the cluster! Disaster. With #74260 we're safe. The optimization still kicks in when it's faster - like when the top level query is empty - but otherwise it gets out of the way and lets all the normal aggs run.
Closes #74480