Avoid global ordinals in composite aggregation #74559
Pinging @elastic/es-analytics-geo (Team:Analytics)
I left a few notes, but I don't think any of them are blockers on merging. Thanks for taking this one!
...va/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
The logic looks good to me. I left two comments to avoid the remapping when it's not needed.
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
...va/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
LGTM, thanks!
CompositeValuesCollectorQueueTests.testRandom was timing out because the invariant was checked on every call to copyCurrent, which made the test take too long to run.
A composite aggregation on a keyword field requires global ordinals today to ensure fast comparisons between segments. It only needs to keep track of the top N composite buckets, however. Since N is typically small, we can just use the segment ordinal for comparison when collecting inside a segment and remap ordinals when we go to the next segment. Closes #47452
Adds release highlights for match_only_text (elastic#66172) and more memory-efficient composite aggregations (elastic#74559).
This reverts commit 5cfcb2f.
Conflicts:
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/TermsValuesSourceBuilder.java
server/src/test/java/org/elasticsearch/search/aggregations/bucket/composite/SingleDimensionValuesSourceTests.java
This reverts commit 5cfcb2f.
Conflicts:
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
* Revert "Update docs that composite agg no longer uses global ords (#74754)"
This reverts commit ec799ab.
* Revert "Avoid global ordinals in composite aggregation (#74559)"
This reverts commit 5cfcb2f.
Conflicts:
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesCollectorQueue.java
server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Composite aggregations can efficiently paginate all buckets from a multi-level aggregation. They are heavily used by the transform functionality, for example to convert existing Elasticsearch indices into entity-centric indices that summarize the behavior of users or sessions.
Composite aggregations on keyword fields used global ordinals (see "What are global ordinals") to ensure fast comparisons between segments. However, global ordinals on high-cardinality fields can use a lot of heap memory as part of the field data cache.
With this PR, composite aggregations no longer need global ordinals, which reduces resource consumption for batch-like jobs such as transform. The trick is that composite aggregations only need to keep track of the top N composite buckets. Since N is typically small, we can just use the segment ordinal for comparison when collecting inside a segment and remap ordinals when we go to the next segment.
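The remapping idea can be sketched in a few lines of plain Java. This is a minimal illustration, not the code from this PR: the class SegmentOrdinalTopN and its methods are hypothetical, each segment is modeled as a sorted String[] term dictionary instead of Lucene doc values, and the even/odd slot encoding for terms missing from a segment is an assumption chosen only to keep every comparison a plain integer comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: keeps the N smallest distinct terms across segments,
// comparing by segment-local ordinal and remapping on each segment change.
class SegmentOrdinalTopN {
    private final int size;
    private final List<String> terms = new ArrayList<>(); // queued terms, ascending
    private final List<Long> slots = new ArrayList<>();   // their positions in the current segment
    private String[] dictionary = new String[0];          // sorted terms of the current segment

    SegmentOrdinalTopN(int size) {
        this.size = size;
    }

    // Called when moving to a new segment: remap only the (at most N) queued terms.
    // Assumed encoding: a term present at ordinal o gets slot 2*o; a missing term
    // with insertion point p gets slot 2*p - 1, i.e. between ordinals p-1 and p.
    void nextSegment(String[] sortedDictionary) {
        dictionary = sortedDictionary;
        for (int i = 0; i < terms.size(); i++) {
            int pos = Arrays.binarySearch(dictionary, terms.get(i));
            slots.set(i, pos >= 0 ? 2L * pos : 2L * (-pos - 1) - 1);
        }
    }

    // Called per document value inside a segment: integer comparisons only.
    void collect(int segmentOrd) {
        long slot = 2L * segmentOrd;
        int pos = Collections.binarySearch(slots, slot);
        if (pos >= 0) {
            return; // term is already queued
        }
        int insertAt = -pos - 1;
        if (insertAt >= size) {
            return; // not competitive: sorts after all N queued terms
        }
        terms.add(insertAt, dictionary[segmentOrd]); // resolve the term only when it is kept
        slots.add(insertAt, slot);
        if (terms.size() > size) {
            terms.remove(size); // evict the largest entry
            slots.remove(size);
        }
    }

    List<String> topTerms() {
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        SegmentOrdinalTopN queue = new SegmentOrdinalTopN(2);
        queue.nextSegment(new String[] {"apple", "cherry", "kiwi"});
        queue.collect(2);
        queue.collect(0);
        queue.collect(1);
        queue.nextSegment(new String[] {"banana", "cherry", "date"});
        queue.collect(0);
        queue.collect(1);
        queue.collect(2);
        System.out.println(queue.topTerms()); // prints [apple, banana]
    }
}
```

The per-segment remap costs one dictionary lookup per queued term, so it stays cheap as long as N is small, and no cross-segment ordinal map has to be built or cached.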
Closes #47452