Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable optimization if we aren't sure its faster (backport of #74260) #74563

Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jun 24, 2021

This disables the filter-by-filter aggregation optimization used by
terms, range, date_histogram, and date_range aggregations unless
we're sure that its faster than the "native" implementation. Mostly this
is when the top level query is empty or we can merge it into the filter
generated by the agg rewrite process.

Now that we have hard and fast rules we can drop the cost estimation
framework without too much fear. So we remove it in this change. It
stomps a bunch of complexity. Sadly, without the cost estimation stuff
we have to add a separate mechanism for blocking the optimization
against runtime fields for which it'd be kind of garbage. For that I
added another rule preventing the filter-by-filter aggregation from
running against the queries made by runtime fields. Its not fool-proof,
but we have control over what queries we pass as a filter so its not
wide open.

I spent a lot of time working on an alternative to this that preserved
that fancy filter-by-filter collection mechanism and was much more kind
to the query cache. It detected cases where going full filter-by-filter
was bad and grouped those filters together to collect in one pass with a
funny ORing collector. It worked. And, if we were super concerned with
the performance of the filters aggregation it'd be the way to go. But
it was very complex and it was actually slower than using the native
aggregation for things like terms and date_histogram. It was
glorious. But it was wrong for us. Too complex and optimized the wrong
things.

So here we are. Hopefully this is a fairly simple solution to a sneaky
problem.

…c#74260)

This disables the filter-by-filter aggregation optimization used by
`terms`, `range`, `date_histogram`, and `date_range` aggregations unless
we're *sure* that its faster than the "native" implementation. Mostly this
is when the top level query is empty or we can merge it into the filter
generated by the agg rewrite process.

Now that we have hard and fast rules we can drop the cost estimation
framework without too much fear. So we remove it in this change. It
stomps a bunch of complexity. Sadly, without the cost estimation stuff
we have to add a separate mechanism for blocking the optimization
against runtime fields for which it'd be kind of garbage. For that I
added another rule preventing the filter-by-filter aggregation from
running against the queries made by runtime fields. Its not fool-proof,
but we have control over what queries we pass as a filter so its not
wide open.

I spent a lot of time working on an alternative to this that preserved
that fancy filter-by-filter collection mechanism and was much more kind
to the query cache. It detected cases where going full filter-by-filter
was bad and grouped those filters together to collect in one pass with a
funny ORing collector. It *worked*. And, if we were super concerned with
the performance of the `filters` aggregation it'd be the way to go. But
it was very complex and it was actually slower than using the native
aggregation for things like `terms` and `date_histogram`. It was
glorious. But it was wrong for us. Too complex and optimized the wrong
things.

So here we are. Hopefully this is a fairly simple solution to a sneaky
problem.
@nik9000 nik9000 merged commit 2d5982b into elastic:7.x Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant