Optimize request fetching for filters and filter aggs #136796
Pinging @elastic/kibana-app-services (Team:AppServicesSv)
Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)
Relates to elastic/elasticsearch#88660
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)
Thanks for linking to that issue @nik9000. Would this sort of optimization make more sense at the ES level rather than in Kibana?
Hey! Since I filed that issue in ES we've mostly shifted work from aggs to ESQL - so if you desperately need aggs optimizations, that's going to be harder to get to. But ESQL is picking up many of the optimizations from aggs as we go anyway. Back to aggs - the issue that I linked to talks about optimizing the ESQL plans to grow actual syntax for most of this. We're already talking about building blocks for it here: elastic/elasticsearch#106152.
Thanks Nik! OK, it feels like something we should freeze for now, as it is going to work better in ES|QL. There are plans to move Lens to work with _query in the background, but we want more feature parity before doing so. I think it is OK to wait until then, though. cc @timductive
Filters specified in the filter or filters agg are evaluated with a linear scan through all documents matching the top-level query of the request. Depending on how many documents match this top-level query, this can be dramatically slower than sending separate requests to fetch those parts individually.
A common example, tested with ~3M documents:
Date histogram of a ratio of sums of some field (fetching the sum agg twice, with a filter and without a filter):
- Date histogram without a filter set on the query level: 12s
- Date histogram with the filter set in the top-level query: depends on how many documents are matched by the query - can span from milliseconds to ~10s
- Date histogram with two sum aggs, using a nested filter agg (no matter how many documents are matched by the filter): 21s
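To make the slow case above concrete, here is a schematic Elasticsearch request body for a date histogram whose metrics include a filtered sum. The field names (`timestamp`, `bytes`, `status`) and the filter condition are illustrative, not taken from the issue; only the shape of the request matters.

```typescript
// Schematic request for the nested-filter-agg case: the filter agg is
// evaluated per document inside each bucket, so it cannot benefit from
// the index-level optimizations a top-level query would get.
const nestedFilterAggRequest = {
  size: 0,
  query: { match_all: {} }, // broad top-level query -> large linear scan
  aggs: {
    over_time: {
      date_histogram: { field: "timestamp", calendar_interval: "1d" },
      aggs: {
        total: { sum: { field: "bytes" } },
        errors_only: {
          // this filter forces a scan over every matched document
          filter: { term: { status: "error" } },
          aggs: { total: { sum: { field: "bytes" } } },
        },
      },
    },
  },
};
```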
If the filter matches a lot of data, doing it in a single request is just as fast as, or slightly faster than, doing two separate requests. However, if the filter matches little data, the separate requests can be much faster. The same applies if the ratio is computed with two separate non-overlapping filters, each matching a small number of documents.
This problem gets worse if many different filter aggs are used in the same query - effectively, every filtered metric is as expensive as a whole separate search requiring a linear scan (like running a metric agg such as sum, or a date histogram, without a top-level query). It affects the filters agg and multiple filter aggs in the same way (the only performant option is to narrow down documents in the top-level query).
In the best case, doing a single request is roughly as fast as doing multiple requests (minus some static overhead per request, which doesn't matter much for requests hitting large amounts of data). In the worst case, it's orders of magnitude slower, because the top-level query benefits from optimizations that are not available to in-agg filters - optimizations that make certain requests feasible in the first place (like just getting the top-level count instead of running aggs that require a scan).
Because Elasticsearch parallelizes a lot of work across multiple shards, on a healthy cluster this often does not result in wildly longer response times for the inefficient query itself, but it unnecessarily increases the load on the cluster, which can turn into problems for busy clusters (queued searches, CPU throttling).
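As a sketch of the proposed alternative, the single request with a nested filter agg can be split into two requests whose filters are hoisted into the top-level query, where Elasticsearch can apply query-level optimizations. The field names and the `buildRequest` helper below are illustrative assumptions, not Kibana code.

```typescript
// Shape shared by both requests: same bucket agg, same metric, only the
// top-level query differs.
interface DateHistogramRequest {
  size: number;
  query: object;
  aggs: object;
}

function buildRequest(topLevelQuery: object): DateHistogramRequest {
  return {
    size: 0,
    query: topLevelQuery,
    aggs: {
      over_time: {
        date_histogram: { field: "timestamp", calendar_interval: "1d" },
        aggs: { total: { sum: { field: "bytes" } } },
      },
    },
  };
}

// One request per metric variant: the filter moves into the query,
// so it can use top-level query optimizations.
const unfiltered = buildRequest({ match_all: {} });
const filtered = buildRequest({ term: { status: "error" } });
```

Since both responses share the same `date_histogram` buckets, they can be merged client-side by bucket key to compute the ratio.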
Possible optimizations
On Lens level
- In the to_expression function of the datasource, if filtered metrics or the filter agg is used, create separate esaggs calls, adding the same filter to the top-level query - each of those has all the bucket aggs plus the metrics with the same filter. Simplification for a POC: have one request per metric.
- Pass a disabled filters agg so the datatable metadata is set up correctly.
- Define a merge_tables expression function which merges all rows along the bucket columns (for the filters agg there should be an argument for the filters column id and the label to set it to).

Upside: leverages all top-level query optimizations, with dramatic performance gains in some special cases and negligible to no impact on query runtime in other cases.
Downside: the bucket structure has to be mapped and tabified multiple times and merged later on, which is more taxing on the client.
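The merge_tables step above could look roughly like the following. The datatable shape loosely follows Kibana's expression datatables, but the function name and exact semantics here are assumptions for illustration, not the actual implementation.

```typescript
// Minimal datatable shape: columns with ids, rows keyed by column id.
interface Datatable {
  columns: { id: string; name: string }[];
  rows: Record<string, unknown>[];
}

// Join the rows of several datatables on their shared bucket column(s):
// rows with the same bucket values are merged into one row carrying the
// metric columns of every input table.
function mergeTables(bucketColumnIds: string[], tables: Datatable[]): Datatable {
  const keyOf = (row: Record<string, unknown>) =>
    bucketColumnIds.map((id) => String(row[id])).join("|");

  const merged = new Map<string, Record<string, unknown>>();
  for (const table of tables) {
    for (const row of table.rows) {
      const key = keyOf(row);
      merged.set(key, { ...(merged.get(key) ?? {}), ...row });
    }
  }

  // Deduplicate columns by id, keeping the first occurrence.
  const columns: Datatable["columns"] = [];
  const seen = new Set<string>();
  for (const table of tables) {
    for (const col of table.columns) {
      if (!seen.has(col.id)) {
        seen.add(col.id);
        columns.push(col);
      }
    }
  }

  return { columns, rows: Array.from(merged.values()) };
}
```

For the filters-agg case, the extra arguments mentioned above (the filters column id and its label) would additionally rewrite that column's values; that part is omitted here for brevity.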
On AggConfig level
The same thing as above, but on the AggConfig level: not operating on the table, but merging responses before tabification, similar to how timeshift is implemented.
Another option, which does not leverage as much potential but is theoretically easier to implement, is to collect all filters for filters and filter aggs from the agg tree and add all of them as OR clauses to the top-level query. This limits the number of documents a linear scan has to be performed on in some cases, without catching all possible optimizations - e.g. in the example above, with a ratio of a filtered metric vs an unfiltered metric, it wouldn't be possible to prevent another linear scan for the filtered metric.
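A minimal sketch of that simpler option, assuming the agg filters have already been collected from the agg tree (the function name and shapes are illustrative, not Kibana's actual API): the filters are attached as a bool `should` with `minimum_should_match: 1`, i.e. OR clauses, alongside the original query.

```typescript
// Wrap the original top-level query so that only documents matching at
// least one of the collected agg filters survive, narrowing the set of
// documents the per-agg linear scans run over.
function addAggFiltersToQuery(
  topLevelQuery: object,
  aggFilters: object[]
): object {
  if (aggFilters.length === 0) return topLevelQuery;
  return {
    bool: {
      must: [topLevelQuery],
      should: aggFilters,
      minimum_should_match: 1,
    },
  };
}
```

Note that, as the issue points out, this narrowing is only valid when every metric in the request is filtered: an unfiltered metric needs all documents matched by the original query, so in the ratio example above the OR clauses could not be added at all.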