Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid global ordinals in composite aggregation #74559

Merged
merged 18 commits into from
Jun 29, 2021

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jun 24, 2021

Composite aggregations can paginate all buckets from a multi-level aggregation efficiently. It is heavily used by the transform functionality, for example to convert existing Elasticsearch indices into entity-centric indices that summarize the behavior of users or sessions.

Composite aggregations on keyword fields used global ordinals (see "What are global ordinals") to ensure fast comparisons between segments. Global ordinals on high cardinality fields can however use a lot of heap memory as part of the field data cache.

With this PR, composite aggregations no longer need global ordinals, reducing resource consumption for batch-like jobs such as transform. The trick is that composite aggregations only need to keep track of the top N composite buckets. Since N is typically small, we can just use the segment ordinal for comparison when collecting inside a segment and remap ordinals when we go to the next segment.

Closes #47452

@ywelsch ywelsch requested review from not-napoleon and jimczi June 24, 2021 13:05
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few notes, but I don't think any of them are blockers on merging. Thanks for taking this one!

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic looks good to me. I left two comments to avoid the remapping when it's not needed.

@ywelsch ywelsch requested a review from jimczi June 29, 2021 11:22
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

ywelsch added 3 commits June 29, 2021 14:03
CompositeValuesCollectorQueueTests.testRandom was timing out as it took too long to run,
because we kept checking the invariant every time on copyCurrent
@ywelsch ywelsch merged commit 8cc4b38 into elastic:master Jun 29, 2021
ywelsch added a commit that referenced this pull request Jun 29, 2021
A composite aggregation on a keyword field requires global ordinals today to ensure fast comparisons between
segments. It only needs to keep track of the top N composite buckets, however. Since N is typically small, we can just
use the segment ordinal for comparison when collecting inside a segment and remap ordinals when we go to the next
segment.

Closes #47452
ywelsch added a commit that referenced this pull request Jul 5, 2021
ywelsch added a commit that referenced this pull request Jul 5, 2021
ywelsch added a commit that referenced this pull request Jul 5, 2021
ywelsch added a commit that referenced this pull request Aug 2, 2021
Adds release highlights for match_only_text (#66172) and more memory-efficient composite aggregations (#74559).
probakowski pushed a commit to probakowski/elasticsearch that referenced this pull request Aug 2, 2021
Adds release highlights for match_only_text (elastic#66172) and more memory-efficient composite aggregations (elastic#74559).
not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Oct 7, 2021
This reverts commit 5cfcb2f.

 Conflicts:
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/TermsValuesSourceBuilder.java
	server/src/test/java/org/elasticsearch/search/aggregations/bucket/composite/SingleDimensionValuesSourceTests.java
not-napoleon added a commit to not-napoleon/elasticsearch that referenced this pull request Oct 7, 2021
This reverts commit 5cfcb2f.

 Conflicts:
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
not-napoleon added a commit that referenced this pull request Oct 7, 2021
* Revert "Update docs that composite agg no longer uses global ords (#74754)"

This reverts commit ec799ab.

* Revert "Avoid global ordinals in composite aggregation (#74559)"

This reverts commit 5cfcb2f.

 Conflicts:
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
	server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Composite aggregation could avoid loading global ordinals entirely
6 participants