
The KQL autocomplete values can take a long time #46054

Closed
AlonaNadler opened this issue Sep 18, 2019 · 27 comments
@AlonaNadler

Currently, KQL autocomplete value suggestions can be slow to appear, especially when there is a lot of data to query to get the possible values. This results in a really slow and frustrating autocomplete experience.
One possible reason for the slowness (there might be others) is that the autocomplete doesn't take into account the time range the users are looking at and suggests all possible values.

Perhaps we can filter the values based on the time range the users are looking at when querying.
@Bargs this was raised as an issue in SIEM and APM in the past; it would be good to think about how we can improve it in order to keep the implementation consistent in Kibana and not have each solution build its own.

@Bargs @TinaHeiligers

@AlonaNadler AlonaNadler added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Sep 18, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-app

@Bargs
Contributor

Bargs commented Sep 18, 2019

@AlonaNadler by default the value suggestions should never take more than a second to appear because we have a timeout. However, there is a setting in kibana.yml for configuring this timeout. When you experienced the slowness, do you know whether this timeout had been increased above its default?
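
For reference, a minimal kibana.yml sketch of those knobs, assuming the Kibana 7.x setting names (check the docs for your version):

# kibana.yml
# Time budget, in milliseconds, for a value-suggestion request (default 1000).
kibana.autocompleteTimeout: 1000
# Max docs examined per shard before a suggestion request returns early (default 100000).
kibana.autocompleteTerminateAfter: 100000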

@smalenfant

+1 on time values. We are experiencing huge slowdowns in queries and high CPU usage on our cluster because suggestions take a long time to get values from the cluster (hot+cold with 50TB of data). It also queries for every single character you type in the filter box: 20 searches sent serially that need to complete before the final results come back. That results in 100% CPU across all my warm nodes. I'd like to be able to turn off "keystroke by keystroke" suggestions while keeping suggestions on; currently that doesn't seem possible.

All in-flight queries should also be terminated once a user has fully entered what they needed (or saved the filter).

This is a great feature, although it doesn't play well with big clusters used for time series data.

@smalenfant

See the firefox network console when I type phn: dukecdedge01.rd.at.cox.net:

[screenshot: Firefox network console showing the suggestion requests]

@atoom

atoom commented Oct 17, 2019

We are experiencing the same issue. I will copy and paste my comment from the Elastic discussion board thread here: https://discuss.elastic.co/t/kql-related-performance-issue/199420

We have just updated our ELK installation from version 6.7.1 to 7.3.2 and we are experiencing the same issue. After a couple of days looking at everything from segment count and GC settings to disk I/O on the hosts, I managed to pinpoint our high CPU usage and high response times to Kibana's auto completion of filter values together with KQL. When using Lucene syntax or setting filterEditor:suggestValues to Off as suggested above, everything is much more responsive!
I have attached a screenshot from Chrome DevTools showing a waterfall diagram of all requests created when trying to search from the Discover view using KQL in Kibana with filter value suggestions enabled. After the number of in-flight request threads hits the browser's max value, subsequent requests are stalled until a previous request completes; this can result in the actual search query request timing out after 30 seconds.

[screenshot: Kibana 7.3 XHR waterfall with filter editor suggest values enabled]

@markharwood
Contributor

Adding another case here.

It took a long time (weeks) to diagnose the reason for a slow response: KQL autocomplete was adding >30s to response times in this case.

@timroes timroes added the Feature:KQL KQL label Dec 2, 2019
@smalenfant

Any updates on when the suggest feature will use the "time range" selected instead of the full index scan?

@rayafratkina
Contributor

#48450 went into 7.6 and should resolve this issue

@timroes timroes added Team:AppArch and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Feb 20, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@smalenfant

The fix provided some help by making sure 30 requests don't make it to the cluster while a user types.

The time range has not been addressed. Turning on the filter value suggestions brings our cluster to a crawl for hours since it's trying to hit all our indexes (cold and frozen).

@AlonaNadler
Author

Thanks for the feedback @smalenfant.
@elastic/kibana-app-arch I'm adding this to our short-term plans; this is a friction point that is important to solve.

@kustodian

Any updates on the progress of this issue?

@erickjordan

Still slow in Kibana 7.7.

@jimczi

jimczi commented Jun 17, 2020

We have a bug in 7.x that causes the terminate_after option to be ignored on search requests that use a size of 0. That explains, I think, why value suggestions are slower in 7.x (the bug was introduced in 7.0).
Although I agree with the comments made here, a value suggester that needs to hit all shards on every keystroke and retrieve 100k docs per shard will likely be slow on large deployments even with the fix. We should look at a more scalable solution and evaluate the cost of having this feature enabled by default on every deployment.
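
For context, a rough sketch of the shape of request affected here (Dev Tools style; the index pattern, field, prefix, and limits are illustrative, not the exact query Kibana sends). It is a terms aggregation with size: 0, which is exactly the case where an ignored terminate_after hurts:

# Value-suggestion style request: no hits, just a filtered terms aggregation.
POST /logs-*/_search
{
    "size": 0,
    "timeout": "1s",
    "terminate_after": 100000,
    "aggs": {
      "suggestions": {
        "terms": {
          "field": "host.name",
          "include": "web.*",
          "execution_hint": "map",
          "shard_size": 10
        }
      }
    }
}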

@lizozom
Contributor

lizozom commented Sep 28, 2020

@lukasolson made an interesting suggestion: use async search to fetch search results progressively.
We'd give a 1s initial timeout for the results and continue fetching them as long as the user isn't typing something new and hasn't chosen an option.
We also talked about applying some kind of sorting, to make sure "hot" data is queried first.
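
A minimal sketch of that flow against the Elasticsearch async search API (index, field, and timings are illustrative):

# Start the search and wait up to 1s for the first batch of results.
POST /logs-*/_async_search?wait_for_completion_timeout=1s&keep_alive=30s
{
    "size": 0,
    "aggs": {
      "suggestions": {
        "terms": { "field": "host.name", "shard_size": 10 }
      }
    }
}

# While the user is idle and hasn't picked an option, keep polling with the returned id;
# cancel as soon as they type something new.
GET /_async_search/<id>
DELETE /_async_search/<id>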

@jimczi does this make sense?

@lizozom
Contributor

lizozom commented Oct 4, 2020

For testing purposes, I used my large data cluster and replaced the query used to fetch autocomplete suggestions with one that simply gets the latest documents over a 3 year time range.

I used async search, but it takes ~10 seconds until the first result, even for this simple query. I'm getting similar results running this query in Dev Tools.

How can we improve this? Or is this the performance to be expected?

{
    "size": 50,
    "sort": [
      {
        "@timestamp": {
          "order": "desc"
        }
      }
    ],
    "docvalue_fields": [
      "@message.keyword"
    ],
    "_source": false,
    "query": {
      "bool": {
        "filter": []
      }
    }
}

@weltenwort
Member

I could see several ways to optimize the query (see the combined sketch after this list):

  • don't sort
  • use terminate_after
  • use a time range filter
  • disable total hits tracking (or set it to the same as size)
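
A sketch of the query above with those changes applied (the time range and terminate_after value are illustrative):

{
    "size": 50,
    "track_total_hits": false,
    "terminate_after": 100000,
    "docvalue_fields": [
      "@message.keyword"
    ],
    "_source": false,
    "query": {
      "bool": {
        "filter": [
          {
            "range": {
              "@timestamp": {
                "gte": "now-24h",
                "lte": "now"
              }
            }
          }
        ]
      }
    }
}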

@lizozom
Contributor

lizozom commented Oct 14, 2020

@weltenwort I posted this query as part of benchmarking different autocomplete query combinations :)
I definitely tried not sorting, using the time range filter, and disabling total hits tracking.
terminate_after would just yield partial results, correct?

@weltenwort
Member

Yes, AFAIK it would return as soon as the hit count is reached.

@kustodian

kustodian commented Oct 15, 2020

I don't understand why we are discussing how to optimize this query when Kibana 6 just limited the time range and it worked great. Reverting to how autocomplete worked before would fix most of the issues. Later on, we can discuss whether it can be optimized further.

@lizozom
Contributor

lizozom commented Oct 15, 2020

I've benchmarked the performance of our current terms aggregation autocomplete query.
I also tried fetching the latest 50 documents (to potentially combine it with the terms results to speed up the process) and played around with a significant_terms aggregation, with and without a sampler.

I tried out the following configurations:

  • With and without trackTotalHits
  • With and without a timerange applied
  • With and without sorting
  • With various shard_size configurations

The data used for these tests is a data set of ~40 million log documents, generated on a 7.10 staging cloud instance with the default configuration.
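
For reference, a sketch of the sampler + significant_terms variant tested here (the field and shard sizes are illustrative, not the exact benchmark queries):

{
    "size": 0,
    "track_total_hits": false,
    "aggs": {
      "sample": {
        "sampler": {
          "shard_size": 1000
        },
        "aggs": {
          "suggestions": {
            "significant_terms": {
              "field": "host.name"
            }
          }
        }
      }
    }
}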

Results

Times are taken from the took field on the Elasticsearch response in ms.

| Record # | TERMS w/totals, wo/timerange, wo/sort | TERMS w/totals, w/sort | LATEST w/totals, w/sort | TERMS wo/totals, w/sort | LATEST wo/totals, w/sort | TERMS wo/totals, wo/sort | SIG TERMS wo/totals, w/sort | SIG TERMS wo/totals, w/sort, sampler |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12M | 6175 | 1862 | 275 | 1024 | 428 | 976 | 2458 | 3702 |
| 20M | 6155 | 4688 | 3003 | 3260 | 1312 | 3257 | 6365 | 4048 |
| 40M | 6208 | 7548 | 3465 | 5774 | 350 | 6104 | 9352 | 7023 |

So it's evident from this table that:

  • If a time range is not used, the terms aggregation runs at its maximal possible runtime, but even with the time range, performance does not reach acceptable levels on an average dataset with no other queries running on the cluster.
  • Fetching last X docs is a good way to improve time to initial results
  • We shouldn't fetch totals when fetching autocomplete results
  • Sorting doesn't have a visible impact on the terms aggregation
  • Using a significant terms agg with a sampler didn't seem to make a difference (at least with a basic configuration)

@kustodian

I'm more interested in what the results are when the time range is smaller, like 1 hour or 1 day. Currently, even if the selected time range is 1h, Kibana still queries all the unique terms in the whole cluster, which totally kills the cluster. That's why I'm saying the time range should be implemented first, and other optimizations should be added later.

@lizozom
Contributor

lizozom commented Nov 5, 2020

@lizozom
Contributor

lizozom commented Nov 5, 2020

@kustodian

I guess that Trello board is internal only?

@weltenwort
Member

@kustodian These are just an artifact of a misconfigured integration. This is still the main issue used to track and discuss.

@exalate-issue-sync exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 2, 2021
@lukasolson
Member

Fixed by #100174.
