Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix aggregation memory leak for CCS #78404

Merged

Conversation

henningandersen
Copy link
Contributor

@henningandersen henningandersen commented Sep 28, 2021

When a CCS search is proxied, the memory for the aggregations on the
proxy node would not be freed.

Now only use the non-copying byte referencing version on the coordinating node,
which itself ensures that memory is freed by calling consumeAggs.

Relates #72309

When a CCS search is proxied, the memory for the aggregations on the
proxy node would not be freed.
Now does ref-counting on the `QuerySearchResult` object (and friends) in
order to ensure that the memory for aggregations is eventually freed.
@henningandersen
Copy link
Contributor Author

@elasticmachine update branch

@henningandersen henningandersen marked this pull request as ready for review September 30, 2021 19:36
@elasticmachine elasticmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Sep 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch ! Although I think we can avoid the ref counting here. The fact that we delay the loading of the aggregation is relevant only in the coordinating node when reducing lots of shards. Scrolls don't expose aggregations so I'd prefer that we use an instance that don't use the bytes reference at all. So can we instead add an option in the ctr of the QuerySearchResult to decide whether the reference should be kept or not ?
The default should be to fully read the aggs and we can opt-in for the reference in SearchTransportService#sendExecuteQuery ?

@henningandersen henningandersen requested a review from jimczi October 1, 2021 12:38
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you edit the description to reflect the new approach ?

@henningandersen henningandersen added the auto-backport Automatically create backport pull requests when merged label Oct 4, 2021
@henningandersen
Copy link
Contributor Author

Thanks Jim!

@henningandersen henningandersen merged commit 80792a1 into elastic:master Oct 4, 2021
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts
7.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 78404

1 similar comment
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts
7.15 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 78404

henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Oct 4, 2021
When a CCS search is proxied, the memory for the aggregations on the
proxy node would not be freed.

Now only use the non-copying byte referencing version on the coordinating node,
which itself ensures that memory is freed by calling `consumeAggs`.
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the
proxy node would not be freed.

Now only use the non-copying byte referencing version on the coordinating node,
which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the
proxy node would not be freed.

Now only use the non-copying byte referencing version on the coordinating node,
which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
@ppf2
Copy link
Member

ppf2 commented Oct 5, 2021

Adding comment from @tbrooks8 to clarify the impact:

This issue can happen in all modes of remote connections (sniff and proxy). In sniff mode, we only connect to a subset of the remote nodes (by default 3). So if the remote node we want to send a request to is not one of those 3, we must send the request as a proxy request.

ppf2 added a commit that referenced this pull request Oct 6, 2021
There is currently no plans to back port #78404 to 7.14.x.
Adding this as a known issue to the release notes for 7.14.x per discussion with @qhoxie .  Please also back port this doc change to 7.14.0 and 7.14.1. Thx!
ppf2 added a commit that referenced this pull request Oct 6, 2021
Adding this as a known issue (#78404) to the release notes for 7.15.0 (fix will be in 7.15.1). Thx!
jrodewig added a commit that referenced this pull request Oct 6, 2021
Adds #78404 as a known issue to the 7.15.0 and 7.14.n release notes.

Co-authored-by: James Rodewig <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Oct 6, 2021
…) (#78788)

Adds #78404 as a known issue to the 7.15.0 and 7.14.n release notes.

Co-authored-by: James Rodewig <[email protected]>

Co-authored-by: Pius <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Oct 6, 2021
…) (#78789)

Adds #78404 as a known issue to the 7.15.0 and 7.14.n release notes.

Co-authored-by: James Rodewig <[email protected]>

Co-authored-by: Pius <[email protected]>
@jgq2008303393
Copy link

why not backport to 7.14 & 7.15?

@imotov
Copy link
Contributor

imotov commented Dec 23, 2021

@jgq2008303393 it was back-ported to 7.15: see a5cc08f

@jgq2008303393
Copy link

jgq2008303393 commented Dec 23, 2021

How about 7.14? Is this patch incompatible with 7.14 or is this patch not critical enough? @imotov

@imotov
Copy link
Contributor

imotov commented Dec 23, 2021

It can be backported but there will be no more releases of 7.14 so it would be a wasted effort. Please see https://www.elastic.co/support/eol. If you have any question about this policy, let's continue this discussion on our discussion forum. We use github for bug reports and enhancement requests and this discussion is neither.

@jgq2008303393
Copy link

Thanks for reply very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations auto-backport Automatically create backport pull requests when merged >bug :Distributed Coordination/Network Http and internode communication implementations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.15.1 v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants