Caching of filters #16108
Comments
That's partly true, in a boolean query the |
@jimferenczi hmm, yes, query rewrites might cause an issue here. But if the user's intent is to skip caching for a particular query, maybe cache=false should be set in the rewritten queries as well, right? |
@santanusinha yes, this is my point. If the user knows that the query should not be cached (even if the score is not needed), then we should have something in the query that states clearly that we don't want this part of the query to enter the filter cache. |
This should no longer happen, as filters will only be cached after repeated use. This is one of the reasons for the rewrite: to stop overcaching filters by default. Really, this is something that the user shouldn't have to think about; Elasticsearch should be smart enough to figure it out for itself. Of course, these algorithms need iteration to improve. Another feature which is already there, but needs improvement, is shard request caching. To explain: a typical use case is showing page views per hour for the last month. Using the index-per-day model, only the data for today's index is changing. The request cache (you need to turn on caching) will cache the aggregation results for all of the other indices and only recalculate the results for today's index, which is a huge improvement. But there are a couple of issues that we are working on fixing. The first is that the JSON request must be exactly the same in order to retrieve the cached version. This can be tricky because the order of keys in JSON can vary. The search refactoring happening in #10217 will fix this because we'll use the parsed representation of the query for caching instead of the JSON. The second is that these queries usually use a time range. If you use [...] Once the search refactoring is done, we can improve this situation by checking whether the min and max values in the range query are lower/higher respectively than the min/max values for a particular shard and, if so, rewrite the range query as a match_all. This would mean that, even though |
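To make the two request-cache issues above concrete, here is a minimal Python sketch. It is not Elasticsearch's implementation: the function names, the SHA-256 keys, the `timestamp` field, and the inclusive bounds are all assumptions chosen for illustration. It shows why a key built from the raw JSON text is sensitive to key order while a key built from the parsed request is not, and how a range that covers a shard's whole min/max span could be rewritten as a match_all.

```python
import hashlib
import json


def raw_cache_key(request_json: str) -> str:
    """Key built from the request text: changes if key order or spacing changes."""
    return hashlib.sha256(request_json.encode("utf-8")).hexdigest()


def parsed_cache_key(request_json: str) -> str:
    """Key built from a canonical form of the parsed request (sorted keys)."""
    parsed = json.loads(request_json)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rewrite_range_for_shard(query_min, query_max, shard_min, shard_max):
    """Return match_all when the query range covers every value in the shard.

    Assumes inclusive bounds and a hypothetical 'timestamp' field.
    """
    if query_min <= shard_min and query_max >= shard_max:
        return {"match_all": {}}
    return {"range": {"timestamp": {"gte": query_min, "lte": query_max}}}


# Two requests that differ only in key order.
a = '{"size": 0, "query": {"match_all": {}}}'
b = '{"query": {"match_all": {}}, "size": 0}'
print(raw_cache_key(a) == raw_cache_key(b))        # keys from raw text
print(parsed_cache_key(a) == parsed_cache_key(b))  # keys from parsed form
print(rewrite_range_for_shard(0, 100, 10, 20))
```

With the rewrite, a shard whose data lies entirely inside the requested range sees the same match_all query whatever the exact range bounds are.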
Thanks for the explanation. Will keep my eyes out for issues and report back if we see anything. |
Hi,
As far as I can see from the Filter Auto Caching section, users no longer have control over disabling caching in known cases.
This would cause a lot of problems in situations where Elasticsearch is being used as a time-series database. Typically, analysts might run some one-off queries over older time ranges, causing the filter cache to blow up for no reason. In previous versions, we would have turned off caching for the time range filter for queries over older ranges. Getting the data would be slower, but given that the data comes from an older date range, people could live with it.
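For reference, the earlier behaviour described above could be sketched as follows. This is a minimal illustration that assumes the 1.x query DSL, where a filter accepted a `_cache` flag inside a `filtered` query. The helper name, the `timestamp` field, and the date values are hypothetical.

```python
import json


def uncached_range_query(field, gte, lte):
    """Build a 1.x-style filtered query whose range filter opts out of caching.

    Assumption: the 1.x DSL, where '_cache' sits next to the field name
    inside the filter definition.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {
                        field: {"gte": gte, "lte": lte},
                        "_cache": False,  # do not store this filter in the filter cache
                    }
                }
            }
        }
    }


# One-off query over an old date range (hypothetical field and dates).
body = uncached_range_query("timestamp", "2014-01-01", "2014-01-31")
print(json.dumps(body, indent=2))
```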
I might be missing something, but right now it seems impossible to mimic this behaviour, because there is no way to configure which queries are cached in the filter cache and which are not. The strategy discussed in the aforementioned documentation makes sense in most general situations, but not 100% of the time. I feel that the option should at least be present so that people can use it in times of need.
If you don't mind my asking, what was the rationale behind the decision to remove the filter caching configuration, and are there any chances that this will be brought back in the future?