Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up terms agg when alone #69377

Merged
merged 11 commits into from
Feb 25, 2021
Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Feb 22, 2021

This speeds up the terms agg in a very specific case:

  1. It has no child aggregations
  2. It has no parent aggregations
  3. There are no deleted documents
  4. You are not using document level security
  5. There is no top level query
  6. The field has global ordinals
  7. There are less than one thousand distinct terms

That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't that big:

| 90th percentile service time | keyword-terms-low-cardinality |     446.031 |     36.7677 | -409.263 |     ms |

In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a keyword field. Or an ip. And its fairly common for keywords to
have less than a thousand distinct values. Certainly not everywhere, but
some places.

I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.

The optimization builds on #68871 which translates terms aggregations
against low cardinality fields with global ordinals into a filters
aggregation. This teaches the filters aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.

This speeds up the `terms` agg in a very specific case:
1. It has no child aggregations
2. It has no parent aggregations
3. There are no deleted documents
4. You are not using document level security
5. There is no top level query
6. The field has global ordinals
7. There are less than one thousand distinct terms

That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't *that* big:

```
| 90th percentile service time | keyword-terms-low-cardinality |     446.031 |     36.7677 | -409.263 |     ms |
```

In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to
have less than a thousand distinct values. Certainly not everywhere, but
some places.

I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.

The optimization builds on elastic#68871 which translates `terms` aggregations
against low cardinality fields with global ordinals into a `filters`
aggregation. This teaches the `filters` aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Feb 22, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Member Author

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out the profile output:

$ curl -s -HContent-Type:application/json -uelastic:password localhost:9200/_search?pretty -d'{
  "size": 0,
  "profile": true,
  "aggs": {
    "t": {
      "terms": {
        "field": "rate_code_id"
      }
    }
  }
}'
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "t" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 160957580
        },
        {
          "key" : "2",
          "doc_count" : 3141764
        },
        {
          "key" : "5",
          "doc_count" : 904379
        },
        {
          "key" : "3",
          "doc_count" : 269558
        },
        {
          "key" : "4",
          "doc_count" : 65655
        },
        {
          "key" : "99",
          "doc_count" : 5661
        },
        {
          "key" : "6",
          "doc_count" : 2095
        }
      ]
    }
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[tG5FPmcfQ6aI9KXq9V9cTg][nyc_taxis][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchAllDocsQuery",
                "description" : "*:*",
                "time_in_nanos" : 4265,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 0,
                  "create_weight" : 4265,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 0
                }
              }
            ],
            "rewrite_time" : 14563,
            "collector" : [
              {
                "name" : "MultiCollector",
                "reason" : "search_multi",
                "time_in_nanos" : 9553931,
                "children" : [
                  {
                    "name" : "EarlyTerminatingCollector",
                    "reason" : "search_count",
                    "time_in_nanos" : 170074
                  },
                  {
                    "name" : "ProfilingAggregator: [org.elasticsearch.search.aggregations.bucket.terms.StringTermsAggregatorFromFilters@3bdd7fd1]",
                    "reason" : "aggregation",
                    "time_in_nanos" : 9155907
                  }
                ]
              }
            ]
          }
        ],
        "aggregations" : [
          {
            "type" : "StringTermsAggregatorFromFilters",
            "description" : "t",
            "time_in_nanos" : 9181873,
            "breakdown" : {
              "reduce" : 0,
              "post_collection_count" : 1,
              "build_leaf_collector" : 9060609,
              "build_aggregation" : 114667,
              "build_aggregation_count" : 1,
              "build_leaf_collector_count" : 29,
              "post_collection" : 2672,
              "initialize" : 3925,
              "initialize_count" : 1,
              "reduce_count" : 0,
              "collect" : 0,
              "collect_count" : 0
            },
            "debug" : {
              "delegate" : "FiltersAggregator.FilterByFilter",
              "delegate_debug" : {
                "segments_with_doc_count_field" : 0,
                "segments_with_deleted_docs" : 0,
                "filters" : [
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:1",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:2",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:3",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:4",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:5",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:6",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  },
                  {
                    "results_from_metadata" : 29,
                    "query" : "rate_code_id:99",
                    "scorers_prepared_while_estimating_cost" : 0,
                    "type" : "term"
                  }
                ]
              }
            }
          }
        ]
      }
    ]
  }
}

this.profiling = context.profiling();
}

/**
* Estimate the number of documents that this aggregation must visit. We'll
* stop counting once we've passed {@code maxEstimatedCost} if we aren't profiling.
*/
@SuppressWarnings("resource") // We're not in change of anything Closeable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was porting this to use the new QueryToFilterAdatapter I noticed a problem: previously this would abort estimating the cost if the cost went over the max which is probably good. And when you profile it wouldn't abort the estimating so that you could get back what the "real total" estimate would be. Also probably good. But we timed how long the estimate took. And previously the timer included how long it took to estimate without the early abort. That felt wrong. Now we stop the timer as soon as we pass the limit. Or when we return.

/**
* Special case when the filter can't match anything.
*/
private static class MatchNoneQueryToFilterAdapter extends QueryToFilterAdapter {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't strictly need this for the PR but it feels like a nice example of how you'd implement this for a filter that can come purely from metadata.

/**
* Filter that matches every document.
*/
private static class MatchAllQueryToFilterAdapter extends QueryToFilterAdapter {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you don't strictly need this one either because folks won't usually write match_all in their filters agg. But, again, its a fairly compact way of looking at how you'd read from metadata.

long count(LeafReaderContext ctx, FiltersAggregator.Counter counter, Bits live) throws IOException {
if (countCanUseMetadata(counter, live)) {
resultsFromMetadata++;
return ctx.reader().docFreq(query().getTerm());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This right here is the magic.

We certainly could have done with without adding this whole abstraction. But the abstraction really helps me reason about what we want from queries and having a subclass per query that has a fancy optimization helps me to identify which fancy optimizations we have. And it helps us track how they are being applied when you profile it.

*/
public void testMatchAllOnFilteredIndex() throws IOException {
AggregationBuilder builder = new FiltersAggregationBuilder("test", new KeyedFilter("q1", new MatchAllQueryBuilder()));
try (Directory directory = newDirectory()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is kind of a mess because it doesn't fit into how the agg tests usually work. But this is a fairly rare thing.

@nik9000
Copy link
Member Author

nik9000 commented Feb 22, 2021

That failure looks real to me!

@nik9000
Copy link
Member Author

nik9000 commented Feb 22, 2021

That failure looks real to me!

All better now.

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea behind QueryToFilterAdapter as an abstraction in general, but I think it's too abstract right now. I don't think adding the second layer CommonQueryToFilterAdapter just to avoid having an unused query parameter on the match all & match none cases is worth it, and I'd rather see one or zero abstract classes. Willing to discuss more.

this.otherBucketKey = otherBucketKey;
}

List<QueryToFilterAdapter> filters() {
return filters;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this mutable on purpose? Does it have to be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see now that we use List.copyOf in the ctor to make this immutable before we get to this point.

/**
* Adapts a Lucene {@link Query} to the behaviors used be the
* {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query}
* when we don't have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to finish this comment, and I'm not sure what you meant to say

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I do that.

* {@link FiltersAggregator}. In general we try to delegate to {@linkplain Query}
* when we don't have
*/
public abstract class QueryToFilterAdapter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we'd talked about making the base class concrete and using it for the default implementation. What changed your mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things bothered me about it:

  1. The base class would need a <> operator which is kind of a pain because it's really just an implementation detail.
  2. MatchNone doesn't really want any of the "default" stuff. It'd fully override it. MatchAll is pretty similar. But those are the only two like that.

I'd certainly be willing to try and do all the merging and see what it looks like.


/**
* Is it safe to use index metadata like
* {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
* {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count the

/**
* Is it safe to use index metadata like
* {@link IndexReader#docFreq} or {@link IndexReader#maxDoc} to count. the
* number of matching documents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would benefit from explaining what "safe" means in this context. In particular, I don't think it's obvious what Bits live is or why it should be null here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*/
abstract QueryToFilterAdapter union(Query extraQuery) throws IOException;

abstract IntPredicate matchingDocIds(LeafReaderContext ctx) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have some javadoc to indicate how to implement it, as should count

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/**
* Abstract superclass of filters that delegates everything to the query.
*/
private abstract static class CommonQueryToFilterAdapter<Q extends Query> extends QueryToFilterAdapter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two layers of abstraction seems excessive here. Is there ever a case where we'd want something to accept a CommonQueryToFilterAdapter but not a QueryToFilterAdapter? It looks to me like this just exists for code reuse, in which case I'd rather it wasn't abstract, just make this the default implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - its just for code reuse. I'll see what making it the default implementation looks like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! If it looks worse, we can go with this, but I find the two abstract base classes confusing, personally.

* Note: This method rewrites the query against the {@link IndexSearcher}.
*/
QueryToFilterAdapter<?> union(Query extraQuery) throws IOException {
extraQuery = searcher().rewrite(extraQuery);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method seems like a limited duplication of logic of boolean query rewrite. What's the main reason for not just wrapping this into boolean with 2 must closes and allow it to figure out optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BooleanQuery doesn't know how to merge point range queries and that speeds up when we rewrite date_histogram into filters by a factor of about 2. Its something @iverase and I've talked about some. And I've talked about it with @romseygeek. I think at some point Lucene'll be able to do more of this sort of thing, but until then, here we are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think a TODO here with a link to the issue would help future readers.

*/
public static QueryToFilterAdapter<?> build(IndexSearcher searcher, String key, Query query) throws IOException {
query = searcher.rewrite(query);
if (query instanceof TermQuery) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, I cannot quite put my finger on it, but that smells like we are compensating for some functionality or optimization that is missing in lucene. I would love somebody closer to lucene to take a look at it.

@nik9000
Copy link
Member Author

nik9000 commented Feb 23, 2021 via email

* Build a predicate that the "compatible" implementation of the
* {@link FiltersAggregator} will use to figure out if the filter matches.
* <p>
* Consumers of this method will always call it with non-negative,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Consumers of this method will always call it with non-negative,
* Consumers of the returned predicate will always call it with non-negative,

I found the pronoun ambiguous in this context.

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the less abstract QueryToFilterAdapter considerably more, thank you for changing it. +1 to merge.

@nik9000
Copy link
Member Author

nik9000 commented Feb 25, 2021

I'll ping some lucene folk to review too.

I talked privately with a Lucene folk who thought my strategy of "make it here and hope we can cherry pick some of it to Lucene" made sense. He didn't believe we'd be able to cherry pick all of it. He didn't believe cherry picking any of it would be a fast process either.

@nik9000
Copy link
Member Author

nik9000 commented Feb 25, 2021

There is a whole class of optimizations that we can make in ES because we have MappedFieldType which guarantees certain data shapes that are not possible in Lucene right now.

@nik9000 nik9000 merged commit 4ffdad3 into elastic:master Feb 25, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Feb 25, 2021
This speeds up the `terms` agg in a very specific case:
1. It has no child aggregations
2. It has no parent aggregations
3. There are no deleted documents
4. You are not using document level security
5. There is no top level query
6. The field has global ordinals
7. There are less than one thousand distinct terms

That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't *that* big:

```
| 90th percentile service time | keyword-terms-low-cardinality |     446.031 |     36.7677 | -409.263 |     ms |
```

In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to
have less than a thousand distinct values. Certainly not everywhere, but
some places.

I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.

The optimization builds on elastic#68871 which translates `terms` aggregations
against low cardinality fields with global ordinals into a `filters`
aggregation. This teaches the `filters` aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.
nik9000 added a commit that referenced this pull request Mar 1, 2021
This speeds up the `terms` agg in a very specific case:
1. It has no child aggregations
2. It has no parent aggregations
3. There are no deleted documents
4. You are not using document level security
5. There is no top level query
6. The field has global ordinals
7. There are less than one thousand distinct terms

That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't *that* big:

```
| 90th percentile service time | keyword-terms-low-cardinality |     446.031 |     36.7677 | -409.263 |     ms |
```

In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.

Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to
have less than a thousand distinct values. Certainly not everywhere, but
some places.

I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.

The optimization builds on #68871 which translates `terms` aggregations
against low cardinality fields with global ordinals into a `filters`
aggregation. This teaches the `filters` aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 1, 2021
Now that we've backported elastic#69377 to 7.x we can run backwards
compatibility tests against it.
nik9000 added a commit that referenced this pull request Mar 1, 2021
Now that we've backported #69377 to 7.x we can run backwards
compatibility tests against it.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Mar 26, 2021
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
@jrodewig jrodewig mentioned this pull request Apr 27, 2021
nik9000 added a commit that referenced this pull request May 12, 2021
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 12, 2021
This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like elastic#69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like elastic#69377
it won't do anything if there is a top level query.
nik9000 added a commit that referenced this pull request May 12, 2021
…2989)

This optimizes the `date_histogram` agg when there is a single bucket
and no sub-aggregations. We expect this to happen from time to time when
the buckets are larger than a day because folks often use "daily"
indices.

This was already fairly fast, but using the metadata makes it 10x
faster. Something like 98ms becomes 7.5ms. Nice if you can get it!

Like #69377 this optimization will disable itself if you have document
level security enabled or are querying a rollup index. Also like #69377
it won't do anything if there is a top level query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants