Disable optimization if we aren't sure its faster (backport of #74260) #74563
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This disables the filter-by-filter aggregation optimization used by
terms
,range
,date_histogram
, anddate_range
aggregations unlesswe're sure that its faster than the "native" implementation. Mostly this
is when the top level query is empty or we can merge it into the filter
generated by the agg rewrite process.
Now that we have hard and fast rules we can drop the cost estimation
framework without too much fear. So we remove it in this change. It
stomps a bunch of complexity. Sadly, without the cost estimation stuff
we have to add a separate mechanism for blocking the optimization
against runtime fields for which it'd be kind of garbage. For that I
added another rule preventing the filter-by-filter aggregation from
running against the queries made by runtime fields. Its not fool-proof,
but we have control over what queries we pass as a filter so its not
wide open.
I spent a lot of time working on an alternative to this that preserved
that fancy filter-by-filter collection mechanism and was much more kind
to the query cache. It detected cases where going full filter-by-filter
was bad and grouped those filters together to collect in one pass with a
funny ORing collector. It worked. And, if we were super concerned with
the performance of the
filters
aggregation it'd be the way to go. Butit was very complex and it was actually slower than using the native
aggregation for things like
terms
anddate_histogram
. It wasglorious. But it was wrong for us. Too complex and optimized the wrong
things.
So here we are. Hopefully this is a fairly simple solution to a sneaky
problem.