Speed up terms agg when alone #69377
Conversation
This speeds up the `terms` agg in a very specific case:

1. It has no child aggregations
2. It has no parent aggregations
3. There are no deleted documents
4. You are not using document level security
5. There is no top level query
6. The field has global ordinals
7. There are less than one thousand distinct terms

That is a lot of restrictions! But the speed up is pretty substantial because in those cases we can serve the entire aggregation using metadata that lucene precomputes while it builds the index. In a real rally track we have, we get a 92% speed improvement, but the index isn't *that* big:
```
| 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms |
```
In a rally track with a larger index I ran some tests by hand and the aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into play enough to matter. Restriction 6 just means you are aggregating on a `keyword` field. Or an `ip`. And it's fairly common for `keyword`s to have less than a thousand distinct values. Certainly not everywhere, but some places.

I expect "cold tier" indices are very very likely not to have deleted documents at all. And the optimization works segment by segment - so it'll save some time on each segment without deleted documents. But more time if the entire index doesn't have any.

The optimization builds on elastic#68871 which translates `terms` aggregations against low cardinality fields with global ordinals into a `filters` aggregation. This teaches the `filters` aggregation to recognize when it can get its results from the index metadata. More precisely, it creates the infrastructure to make that fairly simple and applies it to the queries generated by the terms aggregation.
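The heart of the trick is roughly the sketch below: when a segment has no deleted documents (and nothing else can skew the count), the per-term count is exactly the term's `docFreq`, which Lucene stores in the terms dictionary while building the index. This is a hand-written illustration of the idea, not code from this PR; the class and method names are made up.

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;

class MetadataTermCount {
    /**
     * Count the documents matching {@code term} in one segment, or return -1
     * when the metadata can't be trusted and we must fall back to collecting.
     */
    static long countFromMetadata(LeafReaderContext ctx, Term term) throws IOException {
        if (ctx.reader().numDeletedDocs() > 0) {
            // docFreq counts deleted documents too, so it would overcount here
            return -1;
        }
        // Precomputed while Lucene built the index - no documents are visited
        return ctx.reader().docFreq(term);
    }
}
```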
Pinging @elastic/es-analytics-geo (Team:Analytics)
Check out the profile output:
$ curl -s -HContent-Type:application/json -uelastic:password localhost:9200/_search?pretty -d'{
"size": 0,
"profile": true,
"aggs": {
"t": {
"terms": {
"field": "rate_code_id"
}
}
}
}'
{
"took" : 17,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"t" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 160957580
},
{
"key" : "2",
"doc_count" : 3141764
},
{
"key" : "5",
"doc_count" : 904379
},
{
"key" : "3",
"doc_count" : 269558
},
{
"key" : "4",
"doc_count" : 65655
},
{
"key" : "99",
"doc_count" : 5661
},
{
"key" : "6",
"doc_count" : 2095
}
]
}
},
"profile" : {
"shards" : [
{
"id" : "[tG5FPmcfQ6aI9KXq9V9cTg][nyc_taxis][0]",
"searches" : [
{
"query" : [
{
"type" : "MatchAllDocsQuery",
"description" : "*:*",
"time_in_nanos" : 4265,
"breakdown" : {
"set_min_competitive_score_count" : 0,
"match_count" : 0,
"shallow_advance_count" : 0,
"set_min_competitive_score" : 0,
"next_doc" : 0,
"match" : 0,
"next_doc_count" : 0,
"score_count" : 0,
"compute_max_score_count" : 0,
"compute_max_score" : 0,
"advance" : 0,
"advance_count" : 0,
"score" : 0,
"build_scorer_count" : 0,
"create_weight" : 4265,
"shallow_advance" : 0,
"create_weight_count" : 1,
"build_scorer" : 0
}
}
],
"rewrite_time" : 14563,
"collector" : [
{
"name" : "MultiCollector",
"reason" : "search_multi",
"time_in_nanos" : 9553931,
"children" : [
{
"name" : "EarlyTerminatingCollector",
"reason" : "search_count",
"time_in_nanos" : 170074
},
{
"name" : "ProfilingAggregator: [org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregatorFromFilters@3bdd7fd1]",
"reason" : "aggregation",
"time_in_nanos" : 9155907
}
]
}
]
}
],
"aggregations" : [
{
"type" : "StringTermsAggregatorFromFilters",
"description" : "t",
"time_in_nanos" : 9181873,
"breakdown" : {
"reduce" : 0,
"post_collection_count" : 1,
"build_leaf_collector" : 9060609,
"build_aggregation" : 114667,
"build_aggregation_count" : 1,
"build_leaf_collector_count" : 29,
"post_collection" : 2672,
"initialize" : 3925,
"initialize_count" : 1,
"reduce_count" : 0,
"collect" : 0,
"collect_count" : 0
},
"debug" : {
"delegate" : "FiltersAggregator.FilterByFilter",
"delegate_debug" : {
"segments_with_doc_count_field" : 0,
"segments_with_deleted_docs" : 0,
"filters" : [
{
"results_from_metadata" : 29,
"query" : "rate_code_id:1",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:2",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:3",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:4",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:5",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:6",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
},
{
"results_from_metadata" : 29,
"query" : "rate_code_id:99",
"scorers_prepared_while_estimating_cost" : 0,
"type" : "term"
}
]
}
}
}
]
}
]
}
}
this.profiling = context.profiling();
}

/**
 * Estimate the number of documents that this aggregation must visit. We'll
 * stop counting once we've passed {@code maxEstimatedCost} if we aren't profiling.
 */
@SuppressWarnings("resource") // We're not in charge of anything Closeable
While I was porting this to use the new `QueryToFilterAdapter` I noticed a problem: previously this would abort estimating the cost if the cost went over the max, which is probably good. And when you profile it wouldn't abort the estimating, so that you could get back what the "real total" estimate would be. Also probably good. But we timed how long the estimate took. And previously the timer included how long it took to estimate without the early abort. That felt wrong. Now we stop the timer as soon as we pass the limit. Or when we return.
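In spirit it's something like this sketch (made-up names, not the methods from the PR): the loop keeps estimating past the limit only when profiling, but the timer stops the moment the limit is crossed, or when the method returns.

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;

class CostEstimateSketch {
    private final IndexSearcher searcher;
    private final boolean profiling;
    long estimateTimeNanos; // what a profiler would report

    CostEstimateSketch(IndexSearcher searcher, boolean profiling) {
        this.searcher = searcher;
        this.profiling = profiling;
    }

    long estimateCost(long maxEstimatedCost) throws IOException {
        long start = System.nanoTime();
        boolean timing = true;
        long estimate = 0;
        try {
            for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
                estimate += ctx.reader().maxDoc(); // stand-in for the real per-segment estimate
                if (estimate > maxEstimatedCost) {
                    if (timing) {
                        // Stop the clock here so the "keep counting for profiling"
                        // pass isn't charged to the estimate timer.
                        estimateTimeNanos += System.nanoTime() - start;
                        timing = false;
                    }
                    if (false == profiling) {
                        return estimate; // early abort when nobody needs the real total
                    }
                }
            }
            return estimate;
        } finally {
            if (timing) {
                estimateTimeNanos += System.nanoTime() - start;
            }
        }
    }
}
```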
/**
 * Special case when the filter can't match anything.
 */
private static class MatchNoneQueryToFilterAdapter extends QueryToFilterAdapter {
We don't strictly need this for the PR but it feels like a nice example of how you'd implement this for a filter that can come purely from metadata.
/**
 * Filter that matches every document.
 */
private static class MatchAllQueryToFilterAdapter extends QueryToFilterAdapter {
And you don't strictly need this one either because folks won't usually write `match_all` in their `filters` agg. But, again, it's a fairly compact way of looking at how you'd read from metadata.
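For both of those adapters, the count boils down to something like the sketch below (hedged, with made-up names; the match-all count carries the same "no deleted documents" caveat as the rest of the PR, and the real code also has to worry about the `_doc_count` field).

```java
import org.apache.lucene.index.LeafReaderContext;

class TrivialFilterCounts {
    /** A filter that can't match anything never needs to touch the segment. */
    static long countMatchNone(LeafReaderContext ctx) {
        return 0;
    }

    /**
     * A filter that matches everything can read the answer straight out of the
     * segment header, so long as nothing was deleted.
     */
    static long countMatchAll(LeafReaderContext ctx) {
        if (ctx.reader().numDeletedDocs() > 0) {
            return -1; // signal: fall back to the normal per-document counting path
        }
        return ctx.reader().maxDoc();
    }
}
```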
long count(LeafReaderContext ctx, FiltersAggregator.Counter counter, Bits live) throws IOException {
    if (countCanUseMetadata(counter, live)) {
        resultsFromMetadata++;
        return ctx.reader().docFreq(query().getTerm());
This right here is the magic.

We certainly could have done this without adding this whole abstraction. But the abstraction really helps me reason about what we want from queries, and having a subclass per query that has a fancy optimization helps me to identify which fancy optimizations we have. And it helps us track how they are being applied when you profile it.
 */
public void testMatchAllOnFilteredIndex() throws IOException {
    AggregationBuilder builder = new FiltersAggregationBuilder("test", new KeyedFilter("q1", new MatchAllQueryBuilder()));
    try (Directory directory = newDirectory()) {
This one is kind of a mess because it doesn't fit into how the agg tests usually work. But this is a fairly rare thing.
That failure looks real to me!

All better now.
I like the idea behind `QueryToFilterAdapter` as an abstraction in general, but I think it's too abstract right now. I don't think adding the second layer `CommonQueryToFilterAdapter` just to avoid having an unused query parameter on the match all & match none cases is worth it, and I'd rather see one or zero abstract classes. Willing to discuss more.
this.otherBucketKey = otherBucketKey;
}

List<QueryToFilterAdapter> filters() {
    return filters;
Is this mutable on purpose? Does it have to be?
Nevermind, I see now that we use `List.copyOf` in the ctor to make this immutable before we get to this point.
/**
 * Adapts a Lucene {@link Query} to the behaviors used be the
 * {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query}
 * when we don't have
I think you forgot to finish this comment, and I'm not sure what you meant to say
Yeah. I do that.
 * {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query}
 * when we don't have
 */
public abstract class QueryToFilterAdapter {
I thought we'd talked about making the base class concrete and using it for the default implementation. What changed your mind?
Two things bothered me about it:

- The base class would need a `<>` operator which is kind of a pain because it's really just an implementation detail.
- MatchNone doesn't really want any of the "default" stuff. It'd fully override it. MatchAll is pretty similar. But those are the only two like that.

I'd certainly be willing to try and do all the merging and see what it looks like.
/**
 * Is it safe to use index metadata like
 * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
-     * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
+     * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count the
/**
 * Is it safe to use index metadata like
 * {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
 * number of matching documents.
I think this would benefit from explaining what "safe" means in this context. In particular, I don't think it's obvious what `Bits live` is or why it should be null here.
👍
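For what it's worth, "safe" boils down to roughly the two checks in this sketch (made-up signature, grounded in the debug counters above like `segments_with_deleted_docs` and `segments_with_doc_count_field`): a non-null `Bits live` means the segment has deleted documents, and the `_doc_count` field means one Lucene document can stand for many.

```java
import org.apache.lucene.util.Bits;

class MetadataSafetySketch {
    /**
     * Metadata-based counts are only valid when nothing can make them diverge
     * from the true per-document count.
     */
    static boolean countCanUseMetadata(Bits live, boolean docCountAlwaysOne) {
        if (live != null) {
            // A live-docs bitset means this segment has deletions; docFreq and
            // maxDoc would count the deleted documents too.
            return false;
        }
        // The special _doc_count field (rollup-style indices) makes one Lucene
        // document represent many, so metadata counts would be wrong there too.
        return docCountAlwaysOne;
    }
}
```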
 */
abstract QueryToFilterAdapter union(Query extraQuery) throws IOException;

abstract IntPredicate matchingDocIds(LeafReaderContext ctx) throws IOException;
This should have some javadoc to indicate how to implement it, as should `count`.
👍
/**
 * Abstract superclass of filters that delegates everything to the query.
 */
private abstract static class CommonQueryToFilterAdapter<Q extends Query> extends QueryToFilterAdapter {
Two layers of abstraction seems excessive here. Is there ever a case where we'd want something to accept a `CommonQueryToFilterAdapter` but not a `QueryToFilterAdapter`? It looks to me like this just exists for code reuse, in which case I'd rather it wasn't abstract, just make this the default implementation.
Yeah - it's just for code reuse. I'll see what making it the default implementation looks like.
Thanks! If it looks worse, we can go with this, but I find the two abstract base classes confusing, personally.
 * Note: This method rewrites the query against the {@link IndexSearcher}.
 */
QueryToFilterAdapter<?> union(Query extraQuery) throws IOException {
    extraQuery = searcher().rewrite(extraQuery);
This method seems like a limited duplication of the logic of boolean query rewrite. What's the main reason for not just wrapping this into a boolean with 2 must clauses and allowing it to figure out the optimization?
`BooleanQuery` doesn't know how to merge point range queries, and that merging speeds up the case where we rewrite `date_histogram` into `filters` by a factor of about 2. It's something @iverase and I've talked about some. And I've talked about it with @romseygeek. I think at some point Lucene'll be able to do more of this sort of thing, but until then, here we are.
Makes sense. I think a TODO here with a link to the issue would help future readers.
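For reference, the "just wrap it in a boolean" alternative being discussed would look something like this sketch (hedged; the PR's `union` keeps its own hook so specific adapters can do things Lucene's rewrite currently won't, like merging point range queries):

```java
import java.io.IOException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

class UnionSketch {
    /** Combine a filter's query with an extra query as two MUST clauses. */
    static Query union(IndexSearcher searcher, Query filterQuery, Query extraQuery) throws IOException {
        Query rewritten = searcher.rewrite(extraQuery);
        return new BooleanQuery.Builder()
            .add(filterQuery, BooleanClause.Occur.MUST)
            .add(rewritten, BooleanClause.Occur.MUST)
            .build();
    }
}
```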
 */
public static QueryToFilterAdapter<?> build(IndexSearcher searcher, String key, Query query) throws IOException {
    query = searcher.rewrite(query);
    if (query instanceof TermQuery) {
IDK, I cannot quite put my finger on it, but that smells like we are compensating for some functionality or optimization that is missing in lucene. I would love somebody closer to lucene to take a look at it.
We are indeed compensating. Maybe prototyping? I think it'd be super reasonable for lucene to have some of these methods in some form. It has a kind of limited count optimization already. I was hoping to build what we need and cherry pick it into lucene when they want it.

I'll ping some lucene folk to review too.
 * Build a predicate that the "compatible" implementation of the
 * {@link FiltersAggregator} will use to figure out if the filter matches.
 * <p>
 * Consumers of this method will always call it with non-negative,
-     * Consumers of this method will always call it with non-negative,
+     * Consumers of the returned predicate will always call it with non-negative,
I found the pronoun ambiguous in this context.
I like the less abstract `QueryToFilterAdapter` considerably more, thank you for changing it. +1 to merge.
I talked privately with a Lucene folk who thought my strategy of "make it here and hope we can cherry pick some of it to Lucene" made sense. He didn't believe we'd be able to cherry pick all of it. He didn't believe cherry picking any of it would be a fast process either.
There is a whole class of optimizations that we can make in ES because we have
Now that we've backported #69377 to 7.x we can run backwards compatibility tests against it.
This optimizes the `date_histogram` agg when there is a single bucket and no sub-aggregations. We expect this to happen from time to time when the buckets are larger than a day because folks often use "daily" indices. This was already fairly fast, but using the metadata makes it 10x faster. Something like 98ms becomes 7.5ms. Nice if you can get it! Like #69377 this optimization will disable itself if you have document level security enabled or are querying a rollup index. Also like #69377 it won't do anything if there is a top level query.