Terms agg with zero-result filtered query searches whole index #8406
Comments
This is on Elasticsearch 1.3.2.
@avleen This looks wrong indeed. I just tried to reproduce the issue with no success, can you please help me understand a bit more what happens:
Hi @jpountz
When I run it against our
Hmm, this might be due to fielddata loading. This is expected, since we need to load fielddata before knowing what matches, but I will check that we are not loading global ordinals with the "map" execution hint, since they are not required. Can you confirm that only the first few queries are slow?
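For readers following along, here is a minimal sketch of the kind of request being discussed: a filtered query feeding a terms aggregation that sets the "map" execution hint. It is not the reporter's actual query (which is not shown in this thread); the filter, the field names `status` and `client_ip`, the aggregation name `top_values`, and the helper `build_search_body` are all hypothetical, and the body shape assumes the Elasticsearch 1.x query DSL.

```python
import json


def build_search_body(filter_clause, agg_field, size=10):
    """Build a search body: filtered query plus a terms aggregation.

    Assumes the Elasticsearch 1.x DSL. All names passed in are illustrative.
    """
    return {
        # Top-level size 0: we only care about the aggregation, not the hits.
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": filter_clause,
            }
        },
        "aggs": {
            "top_values": {
                "terms": {
                    "field": agg_field,
                    "size": size,
                    # "map" asks the terms agg to work on field values
                    # directly rather than on global ordinals.
                    "execution_hint": "map",
                }
            }
        },
    }


# Hypothetical filter and field names, for illustration only.
body = build_search_body({"term": {"status": "error"}}, "client_ip")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request. The hint only changes how buckets are collected; as noted above, fielddata for the aggregated field is still loaded before the matching documents are known.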
This seems to be pretty slow all the time. It's hard to confirm because running the query takes down our production cluster for several minutes. It just happened again and one node fell out of the cluster because it was trying to do garbage collection for several minutes :-) Running the
I can try to explain a bit more how running works on a shard level: Elasticsearch creates an object called
I just checked that the However, I'm still concerned that you mentioned that everything works fine when the query returns ~100 matches. Do you mean that requests that match no documents are more harmful to your cluster than those queries that match few documents?
It seemed that way but don't take it as gospel. It's possible that during Could the field data loading be delayed until we know what documents
@avleen If you want to use a field for sorting/aggregating, then you need it in fielddata. Perhaps this query doesn't match but the next one will. Fielddata is loaded from disk by uninverting the index, so it has to happen in one go. It would be much much much slower if we were to load only matching docs. Instead, use doc values for this field. Then the memory requirements go away.
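As a rough illustration of the doc values suggestion, the sketch below builds a mapping body that enables doc values on a not_analyzed string field. It assumes the Elasticsearch 1.x mapping syntax; the type name `event`, the field name `client_ip`, and the helper `doc_values_mapping` are hypothetical and do not come from this thread.

```python
import json


def doc_values_mapping(type_name, field_name):
    """Build a put-mapping body that enables doc values for one string field.

    Assumes Elasticsearch 1.x mapping syntax. Names are illustrative.
    """
    return {
        type_name: {
            "properties": {
                field_name: {
                    "type": "string",
                    # Doc values on strings require a not_analyzed field.
                    "index": "not_analyzed",
                    # Store values on disk in a columnar layout instead of
                    # building in-memory fielddata at query time.
                    "doc_values": True,
                }
            }
        }
    }


# Hypothetical type and field names, for illustration only.
mapping = doc_values_mapping("event", "client_ip")
print(json.dumps(mapping, indent=2))
```

With a mapping along these lines, sorting and aggregating on the field would read the on-disk structure rather than uninverting the index into heap memory.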
We'll give doc values another shot, Clinton. When we tested them a while Thanks!
@avleen Make sure you aren't using them on the
Nothing more to do here. Closing in favour of #8312.
We're using a filtered query followed by a terms agg, as so:
Normally the filtered query returns some small number of requests (under 100) and everything is fine.
However, when it returns zero results, the terms aggregation runs on the entire index (in our case, ~4bn docs), causing repeated GCs which take many hours and make the cluster unusable.
It should, in fact, not run at all because it got zero results from the filtered query.