Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Elasticsearch 7.9.0 #19

Closed
wants to merge 15 commits into from

Conversation

rkouye
Copy link

@rkouye rkouye commented Sep 30, 2020

This PR upgrade to ES 7.9.0 and also fix #17 by inheriting DeferableBucketAggregator.

Breaking change on ES side:

Using DeferableBucketAggregator
See #17 for more context.

High cardinality aggregation consumes memory and time, and ES had an internal limit around the number of buckets you can build (prior to 7.8 during the collection phase, now during the reduce phase).

To prevent hitting this limit, we added the TermsAggregator inspired shard_size parameter, but we were still building all buckets (we need to sort them, and you had to build the bucket and its aggregations, in case you are sorting by an aggregation).

This PR make our aggregator inherit DeferableBucketAggregator, and defer the collection of aggregation not used for sorting. We then order the buckets using a partiallyBuiltBucketComparator just like TermsAggregator. And finally we select the top shard_size buckets and fully build them. It should be faster. But it needs more tests.

The next step will be to introduce the parameter CollectMode to allow disabling this behavior.

It would have been better to inherit TermsAggregator instead of duplicating this logic, but I learned the hard way that depending on ES class is not a good idea. TLDR, you cannot inherit InternalTerms because of a package-private abstract class. I don't know if it was a typo or on purpose.

Misc

  • A new task validateNebulaPom was failing for missing licenses and developers' information in generated pom. We aren't publishing, but instead of disabling it, I just added the missing info.

  • YamlRestTest build fails without log4j, I suspect it is a dependency of elasticsearch:test-framework that is not satisfied by the plugin

  • IntegTest are not deleted yet in 7.9.0, so I had to disable them until it is done upstream

  • Upgraded gradle to 6.5 (should have done 6.6 now that I think about it) because 6.1 was failing with Java 14

TODO

  • Tests, accuracy tests, perf tests

  • Release note

  • Use github action for CI and maybe release automation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

too_many_buckets_exception is thrown regardless of size/shard_size setting in query
1 participant