
Filter Editor Suggestions can take forever and drastically increase the load in a cluster. #12692

Closed
trevan opened this issue Jul 6, 2017 · 19 comments


@trevan
Contributor

trevan commented Jul 6, 2017

I'm currently upgrading to the latest master and noticed the Filter Editor suggestions from #11375. I looked at the search requests that it makes and there is a BIG problem. I took one of the search requests and ran it against our production data to see how it would fare. I let it run for 3 minutes before I killed it and then spent the next several minutes trying to keep the cluster afloat. All nodes in our cluster had their load average go from the 20s to the 70s.

I think the main issues are that:

  1. It doesn't have any time filter. It searches across all shards. In my case, we have 6000 shards.
  2. It runs an aggregation on any searchable field. This includes fields that use "fielddata".
@Bargs
Contributor

Bargs commented Jul 6, 2017

@lukasolson

@trevan
Contributor Author

trevan commented Jul 7, 2017

After looking at it more, I think that if there were some sort of time filter or shard limit, this would not be a problem. The aggregation on fields with fielddata is only problematic (for me, at least) when you try to run it across all shards.
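
Something along these lines is roughly what I have in mind (index pattern, timestamp field, and suggestion field are just placeholders, not the actual request Kibana builds):

POST logstash-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-24h"
      }
    }
  },
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "some_field.keyword"
      }
    }
  }
}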

@lukasolson
Member

Adding the time filter to the request has pros and cons. If you include the time filter as part of the request for suggestions, then you're assuming the user will always update the time filter prior to adding filters in the filter editor UI. Users might be confused and expect values to show up as suggestions because they know the field contains those values. On the other hand, users might actually expect to only see suggestions for the given time range; it would obviously improve performance and would make suggestions more specific and applicable to the data you're actually viewing.

What kind of impact does including the time range in the terms request have for you, @trevan? Also, what would you think about including shard_size in the request to improve performance? Does specifying "shard_size": 10 make any noticeable difference in the performance of the query?
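
Roughly, that would just mean adding shard_size to the terms aggregation in the suggestions request, something like this (index pattern and field name are placeholders):

POST health-*/_search
{
  "size": 0,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "device.search",
        "include": ".*0x.*",
        "shard_size": 10
      }
    }
  }
}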

Thanks for helping us out here, @trevan!

@trevan
Contributor Author

trevan commented Jul 8, 2017

@lukasolson, there appears to be some sort of cache. I'm limiting the number of indices in my test query now to prevent issues, but once I find a set of indices that takes 30 seconds, subsequent queries take only 5-6 seconds. I've tried _cache/clear and request_cache=false, but they don't seem to do anything.

But if I take the "cached" queries that take 5-6 seconds, adding "shard_size": 10 does nothing; they still stay between 5 and 6 seconds. Setting "size": 0 also doesn't affect the speed, and removing the leading regex part doesn't affect it either. Adding a timespan (24 hours, which shrinks the number of shards to 36 for my test query) brings it down to less than 1 second.

I then tried using the 24 hour timespan on the full index list. It took 26 seconds to run the first time. The second time was 7 seconds. The third time was 3 seconds. The next several runs were all around 3 seconds.

I wonder if you could edit your PR and add an additional option that would have the suggestions use the timespan. I understand your concerns, but I would much rather have suggestions limited to a timespan than not be able to use the suggestions from your PR at all.

@trevan
Contributor Author

trevan commented Jul 10, 2017

@lukasolson, I ran the query against the full dataset during a downtime. It took a little over 4 minutes the first time. Subsequent runs took around 40-50 seconds.

Adding the "shard_size:10" didn't seem to make any change to the time. It still took around 40 seconds.

Adding a 24 hour limit didn't affect the first run, but subsequent runs took only 6 seconds.

I tried adding routing=1 to the request, and that brought the original query down to 12-15 seconds and the 24 hour limit query down to 1-2 seconds. We don't use custom routing, but it appears to give a decent win.
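
For reference, that just means appending the routing parameter to the search URL, roughly like this (aggregation body trimmed, field name is a placeholder):

POST logstash-*/_search?routing=1
{
  "size": 0,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "some_field.keyword"
      }
    }
  }
}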

@lukasolson
Member

Thanks for troubleshooting, @trevan. I'm reaching out to the Elasticsearch team to see if we can preserve this functionality without causing the problems you've run into.

@trevan
Contributor Author

trevan commented Jul 11, 2017

@lukasolson, do we want to keep this issue open since it would be nice to have suggestions without a performance hit?

@lukasolson
Member

@trevan Yup, unintentionally closed this by tagging it in the PR.

lukasolson reopened this Jul 11, 2017
@lukasolson
Member

@trevan could you try another thing for me with your dataset? In my tests it doesn't seem to make a big impact but a second opinion is always nice 😄

Here's the query it currently makes when using suggestions:

POST health-*/_search
{
  "size": 0,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "device.search",
        "include": ".*0x.*"
      }
    }
  }
}

Someone suggested we try using sampling. Here's how it might look instead:

POST health-*/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10
      },
      "aggs": {
        "suggestions": {
          "terms": {
            "field": "device.search",
            "include": ".*0x.*"
          }
        }
      }
    }
  }
}

Would you mind giving that a try and seeing if it has a large impact on your performance?

@lukasolson
Member

Also, if you want to see how they perform without considering the cache, you could always clear the cache for a specific set of indices. Probably wouldn't want to do this on production, but yeah.

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clearcache.html
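
For example, something along these lines (same index pattern and field as the example above; the first call drops the caches for those indices, the second bypasses the request cache for a single search):

POST health-*/_cache/clear

POST health-*/_search?request_cache=false
{
  "size": 0,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "device.search",
        "include": ".*0x.*"
      }
    }
  }
}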

@trevan
Contributor Author

trevan commented Jul 19, 2017

@lukasolson, I ran that query. It took it from 40 seconds to 18 seconds, so it was definitely an improvement. I tried it with shard_size: 1, but that didn't decrease it noticeably.

Also, I did try _cache/clear as well as request_cache back when I first mentioned the cache issue. If I let Elasticsearch use the cache, then after a while (about 4-5 requests) the request only takes 1-2 seconds.

@filex

filex commented Jul 25, 2017

Same problem here. The filter editor's term-suggestion searches were saturating all CPUs in our 5-node ES5 cluster.

As soon as a keyword field is selected and I choose the is operator, Kibana sends two XHRs. Both go to /api/kibana/suggestions/values/logstash-* (according to the currently selected index pattern). But they differ in payload:

one is

{"field":"@source_host.keyword"}

the other is

{"query":"","field":"@source_host.keyword"}

They both ran into timeouts for logstash-*. They get shorter when I limit the time with a smaller index pattern like logstash-2017.07.2*.

Not only is the query extremely expensive, there are also two of them running in parallel. Do we need both queries? When they eventually finish, both return the same list of 10 terms.

The find-as-you-type style makes it even worse by firing new searches with every key I press.

I agree that it is important to bring down query times. But could it also be beneficial to limit the number of suggestion searches one client may run at a time?

@lukasolson
Member

@filex Good point, there should only be one at a time for sure. Thanks for pointing this out.

@lruslan

lruslan commented Jul 25, 2017

Same problem. We have a cluster of 4 data nodes with 1.5 TB of daily indices and 1500 fields.
For us there is no use case for such filter suggestions - users know their data and in most cases filter on values they already know in advance.

It would be handy to let administrators enable/disable auto suggestions through Management -> Advanced Settings; that way we could disable this feature and save the cluster from being stressed.

Also, I don't know why, but from the moment I open the filter editor and fill in a complete expression, I see approximately 30-50 requests hitting a single data node in Elasticsearch (I have the slow log enabled for all queries).
For example, here I'm trying to fill in the filter expression below, without hitting the enter/save button:
"field:kubernetes.container_name" + "is" + "slowtest"
You can see the resulting number of queries in the attached query log (our slow log is itself parsed by a separate ES instance): filterlog.txt

@lukasolson
Member

lukasolson commented Aug 2, 2017

@trevan Could you play around with adding the terminate_after parameter in your request? I'm thinking about exposing this as an advanced setting in the filter editor suggestions request, and if it's set to 0, disabling suggestions altogether. It can be used in the request like this:

POST health-*/_search?request_cache=false
{
  "size": 0,
  "terminate_after": 100000,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "device.search",
        "include": ".*0x.*",
        "execution_hint": "map"
      }
    }
  }
}

Basically, this tells the request to stop after hitting 100000 documents per shard. The higher this number, the better the suggestions, but the longer the query will run.

@trevan
Contributor Author

trevan commented Aug 3, 2017

@lukasolson, that gives really impressive results. With that value of terminate_after, the query takes around 1 second. I increased terminate_after by 10x and it took 5 seconds. Setting terminate_after to 0, though, doesn't seem to disable the query. Instead, it just disables the terminate_after setting because it took 40-50 seconds.

I'm not sure about the execution_hint change. With terminate_after set to 1000000, it takes 5 seconds without the execution_hint and 12 seconds with the execution_hint. At 100000, it seems to add a few tenths of a second as well.

@lukasolson
Member

@trevan When I mentioned that setting the value to 0 would disable the suggestions entirely, I meant that we'd have a condition in the code that turned them off, not that the query itself would be disabled.

@LoadingZhang

Still the same problem, with ES/Kibana 6.4.0.
Our indices in ES total 16 TB+ and 150+ billion documents, with high-cardinality fields such as uid.
If I turn on the autocomplete feature and then type uid: in the search bar, Kibana sends this query to ES:

{
  "size": 0,
  "timeout": "1s",
  "terminate_after": 100000,
  "query": {
    "match_all": {
      "boost": 1.0
    }
  },
  "aggregations": {
    "suggestions": {
      "terms": {
        "field": "uid",
        "size": 10,
        "shard_size": 10,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "execution_hint": "map",
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ],
        "include": ".*"
      }
    }
  }
}

After 10+ minutes, it loads 160 GB+ of fielddata into memory; it seems loading fielddata can't be terminated properly.
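
For reference, fielddata usage per node can be inspected and the fielddata cache cleared roughly like this (the index pattern is a placeholder):

GET _cat/fielddata?v&fields=uid

POST logstash-*/_cache/clear?fielddata=true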

@LoadingZhang

And this is the hot threads output:

10/10 snapshots sharing following 41 elements
       org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
       org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:147)
       org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:65)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:130)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:47)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.lambda$load$1(IndicesFieldDataCache.java:165)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$$Lambda$3267/1059313702.load(Unknown Source)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
       org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:162)
       org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:118)
       org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData.globalOrdinalsValues(ValuesSource.java:151)
       org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals.globalMaxOrd(ValuesSource.java:124)
       org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.getMaxOrd(TermsAggregatorFactory.java:215)
       org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:137)
       org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:55)
       org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:216)
       org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:216)
       org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:55)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:105)
       org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$17(IndicesService.java:1184)
       org.elasticsearch.indices.IndicesService$$Lambda$3192/791757691.accept(Unknown Source)
       org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1237)
       org.elasticsearch.indices.IndicesService$$Lambda$3194/1117148458.get(Unknown Source)
       org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160)
       org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
       org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116)
       org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1243)
       org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1183)
       org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:322)
       org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:357)
       org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
       org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
       org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

Looks like ES is building the global ordinals map.
