Terms agg with zero-result filtered query searches whole index #8406

avleen · 2014-11-09T15:17:43Z

We're using a filtered query followed by a terms agg, as so:

{
  "query":{
    "filtered":{
      "query":{ 
        "query_string":{
          "query": "type:web_info_log AND log_namespace:FOO AND message:\"bar\""
        }
      },
      "filter":{
        "bool":{
          "must":[{
            "range":{
              "@timestamp":{
                "from":1415464349189,"to":1415465564077
              }
            }
          }]
        }
      }
    }
  },
  "aggs":{
    "histogram_aggregation":{
      "histogram":{
        "interval":10,"field":"@timestamp"
      },
      "aggs":{
        "terms_bucket":{
          "terms":{
            "field":"log_message","execution_hint":"map"
          }
        }
      }
    }
  }
}

Normally the filtered query returns some small number of requests (under 100) and everything is fine.
However, when it returns zero results, the terms aggregations runs on the entire index (in our case, ~4bn docs), causing repeated GCs which take many hours and make the cluster unusable.
It should, in fact, not run at all because it got zero results from the filtered query.

The text was updated successfully, but these errors were encountered:

avleen · 2014-11-09T15:21:13Z

This is on Elasticsearch 1.3.2.
If this is repeatable for others, any chance the fix could make it into 1.4? It's brought our cluster down repeatedly in the last 2 weeks and I can reproduce it here easily.

jpountz · 2014-11-10T10:16:46Z

@avleen This looks wrong indeed. I just tried to reproduce the issue with no success, can you please help me understand a bit more what happens:

do you actually get a doc_count for every term in the response or did you infer that it ran on all docs because of CPU or I/O activity?
can you confirm you are not setting min_doc_count to 0? (this option combined with the map execution hint could make elasticsearch search the whole index).
if you can still reproduce this issue, could you try to capture the output of the hot threads while the query is running so that I can see what is calling the aggregation.

avleen · 2014-11-10T17:54:33Z

Hi @jpountz
Here're the results from the query when searching over 2 days of indices, and running the agg on a field with very low cardinality (maybe a few dozen unique entries):

{
   "took": 6148,
   "timed_out": false,
   "_shards": {
      "total": 78,
      "successful": 78,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   },
   "aggregations": {
      "terms": {
         "buckets": []
      }
   }
}

When I run it against our log_message field, which is not_analyzed and has a ton of large strings (up to 10kb long) and the vast majority of the messages are unique:

{
   "took": 184340,
   "timed_out": false,
   "_shards": {
      "total": 39,
      "successful": 39,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   },
   "aggregations": {
      "terms": {
         "buckets": []
      }
   }
}

jpountz · 2014-11-10T18:57:50Z

Hmm this might be due to fielddata loading. This is expected since we need to load fielddata before knowing what matches but I will check that we are not loading global ordinals with the "map" execution hint since they are not required.

Can you confirm that only the first few queries are slow?

avleen · 2014-11-10T21:21:55Z

This seems to be pretty slow all the time. It's hard to confirm because running the query takes down our production cluster for several minutes. It just happened again and one node fell out of the cluster because it was trying to do garbage collection for several minutes :-)

Running the filtered query part is really fast. It rarely takes more than 5 or 10s for ~5bn docs.
Running the aggs on top of that, even when are zero results from the query, takes many minutes.
So in this case, why is fielddata being loaded when there were zero results?

jpountz · 2014-11-12T09:10:35Z

I can try to explain a bit more how running works on a shard level: Elasticsearch creates an object called BulkScorer which is responsible for finding matching documents and optionally scoring them. This BulkScorer takes a Collector which is basically a callback that gets called for every matching document. Examples of collectors include TotalHitCountCollector (to just count the number of matches), TopDocsCollector (to compute the top matching documents) or AggregationsCollector (to run aggregations). For instance your request would use both a TopDocsCollector and a AggregationsCollector. When we create the collector for aggregations, we need to prepare everything to get ready for collecting matches, which includes loading fielddata. But at this point we don't know if the query is going to match anything yet. This is why even queries that do not match any hits load field data.

jpountz · 2014-11-12T09:27:48Z

I just checked that the map execution hint does not load global ordinals, as expected. So the only idea I have is that these slow times for queries that match no queries is due to fielddata loading.

However, I'm still concerned that you mentioned that everything works fine when the query returns ~100 matches. Do you mean that requests that match no documents are more harmful to your cluster than those queries that match few documents?

avleen · 2014-11-24T16:54:20Z

It seemed that way but don't take it as gospel. It's possible that during
those instances, field data was already cached.

Could the field data loading be delayed until we know what documents
actually need to be loaded in?
It seems problematic is always load in huge amounts of field data when you
are running a filtered query to reduce the number of documents for
aggregation.

On Wed, Nov 12, 2014, 01:27 Adrien Grand [email protected] wrote:

I just checked that the map execution hint does not load global ordinals,
as expected. So the only idea I have is that these slow times for queries
that match no queries is due to fielddata loading.

However, I'm still concerned that you mentioned that everything works fine
when the query returns ~100 matches. Do you mean that requests that match
no documents are more harmful to your cluster than those queries that match
few documents?

Reply to this email directly or view it on GitHub
#8406 (comment)
.

clintongormley · 2014-11-25T13:10:01Z

@avleen If you want to use a field for sorting/aggregating, then you need it in fielddata. Perhaps this query doesn't match but the next one will. Fielddata is loaded from disk by uninverting the index, so it has to happen in one go. It would be much much much slower if we were to load only matching docs.

Instead, use doc values for this field. Then the memory requirements go away.

avleen · 2014-11-25T13:15:18Z

We'll give doc values another site Clinton. When we tested them a while
back,we traded lower memory for significantly higher disk IO which slowed
indexing.
We'll test to see if the recent perf improvements with doc values make them
more viable now :)

Thanks!

On Tue, Nov 25, 2014, 14:10 Clinton Gormley [email protected]
wrote:

@avleen https://github.com/avleen If you want to use a field for
sorting/aggregating, then you need it in fielddata. Perhaps this query
doesn't match but the next one will. Fielddata is loaded from disk by
uninverting the index, so it has to happen in one go. It would be much much
much slower if we were to load only matching docs.

Instead, use doc values for this field. Then the memory requirements go
away.

Reply to this email directly or view it on GitHub
#8406 (comment)
.

clintongormley · 2014-11-25T16:54:13Z

@avleen Make sure you aren't using them on the message.raw field (in fact, just disable the message.raw field completely). Also, in Lucene 5, doc values merging is better than it is today, which should help as well. Would be interested in any stats you can give us about the effect of doc values on your indexing.

clintongormley · 2014-12-31T17:22:47Z

Nothing more to do here. Closing in favour of #8312

jpountz self-assigned this Nov 10, 2014

clintongormley closed this as completed Dec 31, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Terms agg with zero-result filtered query searches whole index #8406

Terms agg with zero-result filtered query searches whole index #8406

avleen commented Nov 9, 2014

avleen commented Nov 9, 2014

jpountz commented Nov 10, 2014

avleen commented Nov 10, 2014

jpountz commented Nov 10, 2014

avleen commented Nov 10, 2014

jpountz commented Nov 12, 2014

jpountz commented Nov 12, 2014

avleen commented Nov 24, 2014

clintongormley commented Nov 25, 2014

avleen commented Nov 25, 2014

clintongormley commented Nov 25, 2014

clintongormley commented Dec 31, 2014

Terms agg with zero-result filtered query searches whole index #8406

Terms agg with zero-result filtered query searches whole index #8406

Comments

avleen commented Nov 9, 2014

avleen commented Nov 9, 2014

jpountz commented Nov 10, 2014

avleen commented Nov 10, 2014

jpountz commented Nov 10, 2014

avleen commented Nov 10, 2014

jpountz commented Nov 12, 2014

jpountz commented Nov 12, 2014

avleen commented Nov 24, 2014

clintongormley commented Nov 25, 2014

avleen commented Nov 25, 2014

clintongormley commented Nov 25, 2014

clintongormley commented Dec 31, 2014