-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove ability to specify size: 0
on aggregations
#18838
Comments
This change deprecates `size: 0` for the terms, significant terms and geohash grid aggregations Relates to #18838
This change deprecates `size: 0` for the terms, significant terms and geohash grid aggregations Relates to #18838
Since this requires some changes in Kibana I've reverted the removal. Though it is now deprecated in 2.x/2.4. |
Oups wrong issue ;) |
@colings86 |
Deprecation logging was added as commented in #18854. Also in that PR we added it to the breaking changes document for 5.0: https://github.com/elastic/elasticsearch/pull/18854/files#diff-b044a68ebfc190743082f46a3361748cR24 |
We could add a note to the 2.4 documentation as well if you think it will help. Fancy raising a PR to change the 2.4 documentation? |
Can do. I need to find a sample page with documented deprecation to use as an example of how to "style" such a deprecation. |
Setting the size to 0 was a shortcut allowing to return everything. This is no longer allowed and we need to set a value here. relates to elastic/elasticsearch#18838 Change-Id: I030c8a46eb37057a84339b56ca370c59991a410d
Specifying for aggregations was used as an alias to get all the buckets on the aggregation terms. This is deprecated since ES 2.x. We are now required to specify the number of buckets. See elastic/elasticsearch#18838
Currently
size: 0
is used as an alias forget me all the buckets
on theterms
,significant_terms
andgeohash_grid
aggregations. This is very trappy as it hides the fact that this is actually a really bad idea which, for high cardinality fields will use a lot of memory and stream huge amounts of data from the shards back to the coordinating node to reduce. It cost huge amounts in CPU (for the reduce), memory (on both the shards and the reduce node) and network bandwidth. It also has the ability to starve the node of resources potentially destabilising the cluster (either through creating a lot of garbage to be collected which could trigger stop the world GC pauses or through tying up the CPU or network and blocking the liveness pings back to the master.Having
size: 0
as an option makes it look like there is a short cut here and we can do thegive me all the buckets
case in a different very efficient way, which we can't. Internally we rewritesize:0
toInteger.MAX_VALUE
Removing this option (
size: 0
) and requiring the user to specify a number means it becomes more obvious that asking for huge number of bugs will incur a high cost by the very fact that the user actually has to ask for that high number.The only case (IMO) for asking for all buckets of a terms agg back which is not a bad idea is when you have a low cardinality and in that case you almost always know, at least to an order of magnitude, how many values you expect (usually <10, 10s or low 100s). In these cases having to specify a size is not a big burden as you can still set it to a number higher than you expect (e.g. for 10s of values set size to 200) and in the case where there is a bug and your low cardinality field accidentally becomes very high cardinality you are protected against doing a very heavy operation. Other use-cases should not be asking for all buckets but instead should be slicing and analysing the data in a way where they do not need to ask for everything (you can't show that much information to a user anyway).
The text was updated successfully, but these errors were encountered: