
Read Aggregations Directly from Pooled Buffers #72309

Merged

Conversation

original-brownbear (Member)

These aggregations can be of considerable size, and we must not allocate them as a single `byte[]`. Especially nowadays, using G1GC, contiguous allocations of this size are problematic.
This commit makes it so that we take the aggregation bytes as a slice out of the network buffers in a zero-copy fashion, effectively cutting the peak memory use for reading them in half for large allocations.
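
To make the zero-copy idea concrete, here is a minimal, self-contained Java sketch using plain `java.nio.ByteBuffer` rather than the Elasticsearch `BytesReference` types: a slice shares the backing memory of the original buffer, while the copying approach allocates a fresh `byte[]` the size of the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ZeroCopySliceDemo {
    public static void main(String[] args) {
        // Stand-in for a pooled network buffer holding a serialized response.
        ByteBuffer networkBuffer = ByteBuffer.wrap("header|aggregation-bytes".getBytes(StandardCharsets.UTF_8));
        networkBuffer.position(7); // skip the 7-byte "header|" prefix

        // Copying approach: a second, contiguous allocation the size of the payload.
        byte[] copy = new byte[networkBuffer.remaining()];
        networkBuffer.duplicate().get(copy);

        // Zero-copy approach: slice() shares the backing array, no payload allocation.
        ByteBuffer slice = networkBuffer.slice();

        System.out.println(new String(copy, StandardCharsets.UTF_8)); // aggregation-bytes
        System.out.println(slice.array() == networkBuffer.array());   // true: same memory
    }
}
```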
@elasticmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Apr 27, 2021
@elasticmachine (Collaborator)

Pinging @elastic/es-analytics-geo (Team:Analytics)

```diff
@@ -46,7 +47,7 @@
      * when {@link #expand()} is called.
      */
     public static <T extends Writeable> DelayableWriteable<T> delayed(Writeable.Reader<T> reader, StreamInput in) throws IOException {
-        return new Serialized<>(reader, in.getVersion(), in.namedWriteableRegistry(), in.readBytesReference());
+        return new Serialized<>(reader, in.getVersion(), in.namedWriteableRegistry(), in.readReleasableBytesReference());
```
original-brownbear (Member Author):
This would literally create O(100M) sized byte arrays in a recent user issue, completely locking up GC on the coordinating node.

@nik9000 (Member) left a comment:

I didn't know we had a way to grab a slice of the buffer! That's quite nice. Is it safe to hold onto that reference for longer than the request? For, like, a lot longer?

@jimczi (Contributor) commented Apr 27, 2021

> I didn't know we had a way to grab a slice of the buffer!

Same here, this is great!
Considering that aggs should be the biggest part of the response, that shouldn't be a problem. The next step, though that's a different scope, would be to keep the entire response in the network buffer until we reduce it. We also discussed some ways to put these bytes on disk at the network level, so this change is a good step forward imo.

> This would literally create O(100M) sized byte arrays in a recent user issue

100M is indeed problematic, but I wonder if this is not an abusive case that we should limit somewhere else. If a single shard response in binary form has this size, I don't want to see the final json response.

@original-brownbear (Member Author)

> Is it safe to hold onto that reference for longer than the request? For, like, a lot longer?

Yea, as long as you eventually release the buffers, and it's not a situation where you only care about a small piece of whatever buffer (which this isn't, I'd say), it's completely safe. We're already using the same mechanism for recovery chunks, CCR chunks and the cluster state.

> 100M is indeed problematic, but I wonder if this is not an abusive case that we should limit somewhere else. If a single shard response in binary form has this size, I don't want to see the final json response.

Yea, the linked issue is definitely an outlier I guess, but even if we're talking about O(1M) it's not great to allocate a bunch of byte arrays of that size all the time when working with G1GC (even after our fixes to settings).
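
To illustrate why holding the slice past the request is safe, here is a hedged sketch of the reference counting involved, using Netty's public `ByteBuf` API (which the Elasticsearch transport layer is built on) rather than the ES-internal `ReleasableBytesReference`:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class RetainedSliceDemo {
    public static void main(String[] args) {
        ByteBuf network = PooledByteBufAllocator.DEFAULT.buffer(1024);
        network.writeBytes(new byte[1024]);

        // retainedSlice() bumps the reference count and shares the pooled memory:
        // the buffer cannot go back to the pool while the slice is alive.
        ByteBuf aggBytes = network.retainedSlice(512, 256);
        System.out.println(network.refCnt()); // 2

        // The request finishes and drops its reference ...
        network.release();

        // ... but the slice keeps the memory valid until it is released as well.
        System.out.println(aggBytes.refCnt()); // 1
        aggBytes.release(); // now the pooled memory can actually be reclaimed
    }
}
```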

@jimczi (Contributor) left a comment:

The logic in the code looks good to me.
Do we check that buffers are released in our test suite? Or do we need explicit tests to ensure that we're not leaking bytes? I really like this change, but it makes me a little bit nervous considering the impact of a bug.

```diff
@@ -98,7 +99,8 @@ public T expand() {
         } catch (IOException e) {
             throw new RuntimeException("unexpected error writing writeable to buffer", e);
         }
-        return new Serialized<>(reader, Version.CURRENT, registry, buffer.bytes());
+        // TODO: this path is currently not used in production code, if it ever is this should start using pooled buffers
+        return new Serialized<>(reader, Version.CURRENT, registry, ReleasableBytesReference.wrap(buffer.bytes()));
```
Contributor:
production mode activated!
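
For context on the `wrap` call in the diff above: it adapts an unpooled, heap-backed buffer to the releasable-reference API, with nothing to actually free on release. A rough, hypothetical sketch of that adapter pattern (the type and method names here are invented for illustration, not the real Elasticsearch classes):

```java
// Hypothetical sketch only; not the real Elasticsearch types.
interface Releasable extends AutoCloseable {
    @Override
    void close(); // no checked exception
}

final class ReleasableBytes implements Releasable {
    private final byte[] bytes;
    private final Runnable onClose;

    private ReleasableBytes(byte[] bytes, Runnable onClose) {
        this.bytes = bytes;
        this.onClose = onClose;
    }

    // Pooled path: releasing hands the memory back to the buffer pool.
    static ReleasableBytes pooled(byte[] bytes, Runnable returnToPool) {
        return new ReleasableBytes(bytes, returnToPool);
    }

    // wrap() path: heap-backed bytes are owned by the GC, so release is a no-op.
    static ReleasableBytes wrap(byte[] bytes) {
        return new ReleasableBytes(bytes, () -> {});
    }

    byte[] bytes() {
        return bytes;
    }

    @Override
    public void close() {
        onClose.run();
    }
}
```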

@nik9000 (Member) commented Apr 27, 2021 via email

@original-brownbear (Member Author)

@nik9000 @jimczi yes, we have leak-tracking infrastructure for all kinds of buffers. For our own buffer pool it was added in #67688, and we have similar infrastructure for Netty-based buffers running in all tests, including the REST tests. So any leak in test-covered code paths will show up for sure.
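
As an aside, the Netty side of this kind of leak tracking can be reproduced with the stock `ResourceLeakDetector`; a minimal sketch (the Elasticsearch-internal tracker from #67688 is a separate mechanism, not shown here):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionDemo {
    public static void main(String[] args) throws InterruptedException {
        // PARANOID tracks every allocation, so leaks in test-covered paths surface reliably.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

        ByteBuf leaked = PooledByteBufAllocator.DEFAULT.buffer(256);
        leaked = null; // dropped without release(): a leak

        // Once the leaked buffer is garbage collected, Netty may report a
        // "LEAK: ByteBuf.release() was not called" error with the allocation's
        // stack trace the next time the detector runs (e.g. on a later allocation).
        System.gc();
        Thread.sleep(100);
        PooledByteBufAllocator.DEFAULT.buffer(256).release();
    }
}
```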

@nik9000 (Member) commented Apr 28, 2021 via email

@jimczi (Contributor) left a comment:

LGTM too!

@original-brownbear (Member Author)

Thanks Nik + Jim!

@original-brownbear original-brownbear merged commit 50bb6d8 into elastic:master Apr 28, 2021
@original-brownbear original-brownbear deleted the pool-aggregation-bytes branch April 28, 2021 11:06
original-brownbear added a commit that referenced this pull request Apr 28, 2021
These aggregations can be of considerable size, and we must not allocate them as a single `byte[]`. Especially nowadays, using G1GC, contiguous allocations of this size are problematic.
This commit makes it so that we take the aggregation bytes as a slice out of the network buffers in a zero-copy fashion, effectively cutting the peak memory use for reading them in half for large allocations.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates elastic#62439 and elastic#72309
henningandersen added a commit that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates #62439 and #72309

Closes #72923
henningandersen added a commit that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates #62439 and #72309

Closes #72923
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the proxy node would not be freed.

Now we only use the non-copying, byte-referencing version on the coordinating node, which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the proxy node would not be freed.

Now we only use the non-copying, byte-referencing version on the coordinating node, which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
@original-brownbear original-brownbear restored the pool-aggregation-bytes branch April 18, 2023 20:55
Labels: :Analytics/Aggregations, >enhancement, Team:Analytics, v7.14.0, v8.0.0-alpha1