Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ability to specify size: 0 on aggregations #18838

Closed
colings86 opened this issue Jun 13, 2016 · 6 comments
Closed

Remove ability to specify size: 0 on aggregations #18838

colings86 opened this issue Jun 13, 2016 · 6 comments

Comments

@colings86
Copy link
Contributor

Currently size: 0 is used as an alias for get me all the buckets on the terms, significant_terms and geohash_grid aggregations. This is very trappy as it hides the fact that this is actually a really bad idea which, for high cardinality fields will use a lot of memory and stream huge amounts of data from the shards back to the coordinating node to reduce. It cost huge amounts in CPU (for the reduce), memory (on both the shards and the reduce node) and network bandwidth. It also has the ability to starve the node of resources potentially destabilising the cluster (either through creating a lot of garbage to be collected which could trigger stop the world GC pauses or through tying up the CPU or network and blocking the liveness pings back to the master.

Having size: 0 as an option makes it look like there is a short cut here and we can do the give me all the buckets case in a different very efficient way, which we can't. Internally we rewrite size:0 to Integer.MAX_VALUE

Removing this option (size: 0) and requiring the user to specify a number means it becomes more obvious that asking for huge number of bugs will incur a high cost by the very fact that the user actually has to ask for that high number.

The only case (IMO) for asking for all buckets of a terms agg back which is not a bad idea is when you have a low cardinality and in that case you almost always know, at least to an order of magnitude, how many values you expect (usually <10, 10s or low 100s). In these cases having to specify a size is not a big burden as you can still set it to a number higher than you expect (e.g. for 10s of values set size to 200) and in the case where there is a bug and your low cardinality field accidentally becomes very high cardinality you are protected against doing a very heavy operation. Other use-cases should not be asking for all buckets but instead should be slicing and analysing the data in a way where they do not need to ask for everything (you can't show that much information to a user anyway).

colings86 added a commit that referenced this issue Jun 14, 2016
This change deprecates `size: 0` for the terms, significant terms and geohash grid aggregations

Relates to #18838
colings86 added a commit that referenced this issue Jun 14, 2016
This change deprecates `size: 0` for the terms, significant terms and geohash grid aggregations

Relates to #18838
@jimczi
Copy link
Contributor

jimczi commented Jun 17, 2016

Since this requires some changes in Kibana I've reverted the removal. Though it is now deprecated in 2.x/2.4.

@jimczi jimczi reopened this Jun 17, 2016
@jimczi
Copy link
Contributor

jimczi commented Jun 17, 2016

Oups wrong issue ;)

@jimczi jimczi closed this as completed Jun 17, 2016
@GlenRSmith
Copy link
Contributor

GlenRSmith commented Oct 12, 2016

@colings86
Should this really have been removed from the documentation? Would it have been better to signal the deprecation in the documentation?

@colings86
Copy link
Contributor Author

Deprecation logging was added as commented in #18854. Also in that PR we added it to the breaking changes document for 5.0: https://github.com/elastic/elasticsearch/pull/18854/files#diff-b044a68ebfc190743082f46a3361748cR24

@colings86
Copy link
Contributor Author

We could add a note to the 2.4 documentation as well if you think it will help. Fancy raising a PR to change the 2.4 documentation?

@GlenRSmith
Copy link
Contributor

Can do. I need to find a sample page with documented deprecation to use as an example of how to "style" such a deprecation.

wmfgerrit pushed a commit to wikimedia/mediawiki-extensions-ApiFeatureUsage that referenced this issue Mar 16, 2017
Setting the size to 0 was a shortcut allowing to return
everything. This is no longer allowed and we need to set
a value here.

relates to elastic/elasticsearch#18838

Change-Id: I030c8a46eb37057a84339b56ca370c59991a410d
sheetaluk pushed a commit to hypothesis/h that referenced this issue Jul 9, 2018
Specifying  for aggregations was used as an alias to get all the
buckets on the aggregation terms. This is deprecated since ES 2.x.
We are now required to specify the number of buckets.
See elastic/elasticsearch#18838
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants