Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid bucket copies in Aggregations #110261

Merged

Conversation

original-brownbear
Copy link
Member

Motivated by heap dumps for aggregations often containing mostly duplicate bucket instances.
Note that This is a start, but there's a lot of duplication we can remove in follow-ups.
But for this one, there's simply a lot of straight forward spots where we can avoid copying a bucket/list-of-buckets and derived instances.

Motivated by heap dumps for aggregations often containing mostly
duplicate bucket instances.
Note that This is a start, but there's a lot of duplication we can remove in
follow-ups.
But for this one, there's simply a lot of straight forward spots where we can avoid copying a
bucket/list-of-buckets and derived instances.
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 28, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

);
if (internalAggregation.equals(agg) == false) {
if (internalAggregations == null) {
internalAggregations = new ArrayList<>(reducedInternalAggs);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to be tricky here IMO, copying the full list is likely similar in cost to doing anything tricky like copying the first I unchanged elements and then moving to a different loop or so, given how short these lists are.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -84,6 +84,9 @@ public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws I
double key = roundKey * interval + offset;
return new InternalHistogram.Bucket(key, docCount, keyed, formatter, subAggregationResults);
}, (owningBucketOrd, buckets) -> {
if (buckets.isEmpty()) {
return buildEmptyAggregation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -340,6 +340,9 @@ public InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws I
return buildAggregationsForVariableBuckets(owningBucketOrds, bucketOrds, (bucketValue, docCount, subAggregationResults) -> {
return new InternalDateHistogram.Bucket(bucketValue, docCount, keyed, formatter, subAggregationResults);
}, (owningBucketOrd, buckets) -> {
if (buckets.isEmpty()) {
return buildEmptyAggregation();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@original-brownbear
Copy link
Member Author

Thanks Ignacio!

@original-brownbear original-brownbear merged commit 42502fe into elastic:main Jul 1, 2024
14 of 15 checks passed
@original-brownbear original-brownbear deleted the aggressively-dedup-aggs branch July 1, 2024 08:40
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 1, 2024
Same as elastic#110261 but more globally applied. Removed copying of
aggregation instances as well as needlessly wrapping bucket list (to
make this equals freeish in most cases).
@benwtrent
Copy link
Member

@original-brownbear @iverase do we know if this has any measurable improvement in memory usage or CPU?

This change is causing some nasty bugs and we need to make sure either we back this change out, or fix the individual bugs.

@original-brownbear
Copy link
Member Author

@benwtrent

do we know if this has any measurable improvement in memory usage or CPU?

Yes it should, if I remember correctly this was motivated by a heap dump that showed O(100M) potential savings here. I'll try to look into this and see if we can make this safer easily, sorry for the bugs :/ Reverting would be a reasonable plan B here though IMO, there's so many issues in eggs and this I just one of them :)

@nik9000
Copy link
Member

nik9000 commented Aug 9, 2024

I'm going to revert this change now and we can tie it to a rework on equals for aggs to make it harder to make mistakes with them. I'm not sure we want to make that change at all, but this change is going to be locked behind it.

@benwtrent
Copy link
Member

@original-brownbear the only way to make this change work is to enforce that equals is overridden, this will cause plugins to fail to compile, so we need to revert this change, enforce equals, and then we can add it back.

^ Yeah, what @nik9000 said.

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 9, 2024
This reverts elastic#110261 which we can't land until elastic#111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.
elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2024
This reverts #110261 which we can't land until #111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.

Closes #111679
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Aug 9, 2024
This reverts elastic#110261 which we can't land until elastic#111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.

Closes elastic#111679
elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2024
This reverts #110261 which we can't land until #111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.

Closes #111679
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
This reverts elastic#110261 which we can't land until elastic#111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.

Closes elastic#111679
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
This reverts elastic#110261 which we can't land until elastic#111757 - we need to be
sure that the `equals` implementations on subclasses of
`InternalAggregations` is correct before this optimization is safe.

Closes elastic#111679
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants