Filter Editor Suggestions can take forever and drastically increase the load in a cluster. #12692
Comments
After looking into it more, I think this would not be a problem if there were some sort of time filter or shard limit. Aggregating on fields with fielddata is only problematic (for me, at least) when you run the aggregation across all shards.
Adding the time filter to the request has pros and cons. If you include the time filter in the suggestions request, you're assuming the user will always update the time filter before adding filters in the filter editor UI. Otherwise, users might be confused when values they know the field contains don't show up as suggestions. On the other hand, users might actually expect to see only suggestions for the given time range; it would obviously improve performance and make the suggestions more specific and applicable to the data you're actually viewing. What kind of impact does including the time range in the terms request have for you, @trevan? Also, what would you think about including … Thanks for helping us out here, @trevan!
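For reference, the time-filtered variant being weighed here could be built as a plain request body like this. It is a minimal Python sketch, not what Kibana actually sends: the function name, the time field (`@timestamp`), and the default `now-24h` window are assumptions for illustration.

```python
from typing import Optional


def build_suggestion_body(
    field: str,
    include_pattern: str,
    time_field: Optional[str] = None,  # hypothetical, e.g. "@timestamp"
    since: str = "now-24h",            # Elasticsearch date math
) -> dict:
    """Build a terms-aggregation search body for filter value suggestions.

    When time_field is given, a range filter restricts the aggregation to
    documents in the selected window instead of the whole index pattern.
    """
    body = {
        "size": 0,  # only the aggregation is needed, no hits
        "aggs": {
            "suggestions": {
                "terms": {"field": field, "include": include_pattern}
            }
        },
    }
    if time_field is not None:
        # Filter context skips scoring; it only narrows the matching docs.
        body["query"] = {
            "bool": {
                "filter": [
                    {"range": {time_field: {"gte": since, "lte": "now"}}}
                ]
            }
        }
    return body
```

Calling `build_suggestion_body("device.search", ".*0x.*")` reproduces the current unfiltered query; passing `time_field="@timestamp"` adds the 24-hour window, which is the trade-off discussed above (faster, but limited to the selected time range).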
@lukasolson, there appears to be some sort of cache. I'm now limiting the number of indices in my test query to prevent issues, but once I find a set of indices that takes 30 seconds, subsequent queries take only 5-6 seconds. I've tried _cache/clear and request_cache=false, but neither seems to do anything. Taking the "cached" query that runs in 5-6 seconds, adding "shard_size: 10" does nothing; it still stays between 5 and 6 seconds. Setting "size: 0" doesn't affect the speed either, and neither does removing the leading regex part. Adding a timespan (24 hours, which shrinks the number of shards to 36 for my test query) brings it to less than 1 second. I then tried the 24-hour timespan on the full index list. It took 26 seconds the first time, 7 seconds the second time, and 3 seconds the third time; the next several runs were all around 3 seconds. I wonder if you could edit your PR to add an option that makes the suggestions use the timespan. I understand your concerns, but I would much rather have suggestions limited to a timespan than no suggestions at all with your PR.
@lukasolson, I ran the query against the full dataset during downtime. It took a little over 4 minutes the first time; subsequent runs took around 40-50 seconds. Adding "shard_size:10" didn't seem to change the time; it still took around 40 seconds. Adding a 24-hour limit didn't affect the first run, but subsequent runs took only 6 seconds. Adding "routing=1" to the request brought the original query down to 12-15 seconds and the 24-hour query down to 1-2 seconds. We don't use custom routing, but it appears to give a decent win.
Thanks for troubleshooting, @trevan. I'm reaching out to the Elasticsearch team to see if we can preserve this functionality without causing the problems you've run into.
@lukasolson, should we keep this issue open, since it would be nice to have suggestions without a performance hit?
@trevan Yup, I unintentionally closed this by tagging it in the PR.
@trevan, could you try another thing for me with your dataset? In my tests it doesn't seem to make a big impact, but a second opinion is always nice 😄 Here's the query it currently makes when using suggestions:

POST health-*/_search
{
"size": 0,
"aggs": {
"suggestions": {
"terms": {
"field": "device.search",
"include": ".*0x.*"
}
}
}
}

Someone suggested we try sampling. Here's how it might look instead:

POST health-*/_search
{
"size": 0,
"aggs": {
"sample": {
"sampler": {
"shard_size": 10
},
"aggs": {
"suggestions": {
"terms": {
"field": "device.search",
"include": ".*0x.*"
}
}
}
}
}
}

Would you mind giving that a try and seeing if it has a large impact on your performance?
Also, if you want to see how they perform without the cache, you could always clear the cache for a specific set of indices. You probably wouldn't want to do this in production, though. https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clearcache.html
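To compare cold and warm timings the way the numbers in this thread were gathered (first run vs. subsequent runs), a small harness can clear the cache before each run. This is a minimal Python sketch: `run_query` and `clear_cache` are hypothetical callables you would wire to your own HTTP client, and the helper names are illustrative. The `fielddata` and `request` query parameters belong to the clear cache API linked above.

```python
import time
from typing import Callable, List, Optional


def clear_cache_path(index_pattern: str) -> str:
    """Clear cache API path, limited to the fielddata and request caches."""
    return f"/{index_pattern}/_cache/clear?fielddata=true&request=true"


def time_runs(
    run_query: Callable[[], object],
    runs: int = 5,
    clear_cache: Optional[Callable[[], object]] = None,
) -> List[float]:
    """Time repeated executions of a query, in seconds.

    run_query and clear_cache are hypothetical caller-supplied callables.
    If clear_cache is given, it runs before every query so each timing
    approximates a cold run; otherwise later runs may hit warm caches.
    """
    timings = []
    for _ in range(runs):
        if clear_cache is not None:
            clear_cache()
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings
```

Note that this API clears only Elasticsearch's own caches; the operating system's file system cache is unaffected, so even "cold" runs may still benefit from warm disk pages.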
@lukasolson, I ran that query. It brought the time down from 40 seconds to 18 seconds, so it was definitely an improvement. I tried it with shard_size: 1, but that didn't decrease it noticeably. Also, I did try _cache/clear as well as request_cache back when I first mentioned the cache issue. If I let Elasticsearch use the cache, after a while (about 4-5 requests) the request takes only 1-2 seconds.
Same problem here. The filter editor was saturating all CPUs in our 5-node ES5 cluster while searching for term suggestions. As soon as a keyword field is selected and I choose the …, two queries are sent. One is …
the other is …
They both ran into timeouts for … The query is not only extremely expensive; there are two of them running in parallel. Do we need both queries? When they eventually finish, both return the same list of 10 terms. The find-as-you-type style makes it even worse by firing a new search with every key I press. I agree that it is important to bring down query times, but could it also be beneficial to limit the number of suggestion searches one client may run at a time?
@filex Good point, there should only be one at a time for sure. Thanks for pointing this out.
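One way to get "only one at a time" on the client side is to cancel the previous suggestion request whenever a new keystroke arrives, and to wait briefly (debounce) before sending anything. This is a minimal Python asyncio sketch, not Kibana's implementation: `fetch` is a hypothetical coroutine that performs the search, and the class name and 0.3 s debounce are illustrative choices.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class LatestOnlySuggester:
    """Keep at most one pending suggestion request per client.

    Each keystroke calls request(); any earlier task that has not finished
    is cancelled, and the new task waits debounce_s seconds before calling
    fetch, so fast typing only reaches the cluster once typing pauses.
    """

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[List[str]]],  # hypothetical search call
        debounce_s: float = 0.3,
    ):
        self._fetch = fetch
        self._debounce_s = debounce_s
        self._task: Optional[asyncio.Task] = None

    def request(self, prefix: str) -> "asyncio.Task":
        # Must be called from code running inside an event loop.
        if self._task is not None and not self._task.done():
            self._task.cancel()  # drop the stale request
        self._task = asyncio.create_task(self._run(prefix))
        return self._task

    async def _run(self, prefix: str) -> List[str]:
        await asyncio.sleep(self._debounce_s)  # wait for typing to pause
        return await self._fetch(prefix)
```

Cancelling the client-side task may not stop a search that Elasticsearch has already started executing; the debounce is what keeps most keystrokes from reaching the cluster at all.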
Same problem. We have a cluster of 4 data nodes with a 1.5TB daily index and 1500 fields. It would be handy to let administrators enable/disable auto-suggestion through Management -> Advanced Settings; that way we could disable this feature and save the cluster from being stressed out. Also, I don't know why, but from the moment I open the filter editor until I've filled in a complete expression, I see approximately 30-50 requests made to a single data node in Elasticsearch (I have the slow log enabled with logging of all queries).
@trevan Could you play around with adding the …

POST health-*/_search?request_cache=false
{
"size": 0,
"terminate_after": 100000,
"aggs": {
"suggestions": {
"terms": {
"field": "device.search",
"include": ".*0x.*",
"execution_hint": "map"
}
}
}
}

Basically, this tells each shard to stop collecting after 100000 documents. The higher this number, the better the suggestions, but the longer the query runs.
@lukasolson, that gives really impressive results. With that value of terminate_after, the query takes around 1 second. I increased terminate_after by 10x and it took 5 seconds. Setting terminate_after to 0, though, doesn't seem to disable the query. Instead, it just disables the terminate_after limit: the query took 40-50 seconds. I'm not sure about the execution_hint change. With terminate_after set to 1000000, it takes 5 seconds without the execution_hint and 12 seconds with it. At 100000, it seems to add a few tenths of a second as well.
@trevan When I mentioned that setting the value to 0 would disable the suggestions entirely, I meant that we'd have a condition in the code that turned them off, not that the query itself would be disabled.
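The terminate_after approach, combined with the "0 turns suggestions off" condition described above, could look roughly like this. It is a minimal Python sketch; the function name and the choice to return `None` when suggestions are disabled are assumptions, not Kibana's actual code. `execution_hint` is left out because trevan measured it as slower at larger terminate_after values.

```python
from typing import Optional


def build_limited_suggestion_request(
    field: str,
    include_pattern: str,
    terminate_after: int = 100_000,
) -> Optional[dict]:
    """Return a suggestions search body, or None when suggestions are off.

    terminate_after caps how many documents each shard collects before it
    stops early. A value of 0 is treated here as "suggestions disabled" by
    the calling code instead of being sent to Elasticsearch, where 0 would
    simply mean "no limit" (as observed above).
    """
    if terminate_after < 0:
        raise ValueError("terminate_after must be >= 0")
    if terminate_after == 0:
        return None  # hypothetical setting: skip the request entirely
    return {
        "size": 0,
        "terminate_after": terminate_after,
        "aggs": {
            "suggestions": {
                "terms": {"field": field, "include": include_pattern}
            }
        },
    }
```

Raising the limit trades speed for completeness: more documents per shard means better suggestions but a longer-running query.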
Still the same problem with ES/Kibana 6.4.0.
After 10+ minutes, it loads 160GB+ of fielddata into memory; it seems loading fielddata can't be terminated properly.
And this is the hot thread: …
It looks like ES is building the ordinal map.
I'm currently upgrading to the latest master and noticed the Filter Editor suggestions from #11375. I looked at the search requests it makes, and there is a BIG problem. I took one of the search requests and ran it against our production data to see how it would fare. I let it run for 3 minutes before I killed it, and then spent the next several minutes trying to keep the cluster afloat. All nodes in our cluster had their load average go from the 20s to the 70s.
I think the main issues are that: