
Read Aggregations Directly from Pooled Buffers #72309

Merged

Conversation

original-brownbear (Member)

These aggregations can be of considerable size, and we must not allocate them as a single `byte[]`. Especially nowadays, using G1GC, contiguous allocations of this size are problematic.
This commit makes it so that we take the aggregation bytes as a slice out of the network buffers in a zero-copy fashion, effectively cutting the peak memory use for reading them in half for large allocations.
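
To make the zero-copy idea concrete, here is a minimal, self-contained Java sketch using plain `java.nio.ByteBuffer` rather than the Elasticsearch `BytesReference` types: a slice shares the backing memory of the original buffer, while the copying approach allocates a fresh `byte[]` the size of the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ZeroCopySliceDemo {
    public static void main(String[] args) {
        // Stand-in for a pooled network buffer holding a serialized response.
        ByteBuffer networkBuffer = ByteBuffer.wrap("header|aggregation-bytes".getBytes(StandardCharsets.UTF_8));
        networkBuffer.position(7); // skip the 7-byte "header|" prefix

        // Copying approach: a second, contiguous allocation the size of the payload.
        byte[] copy = new byte[networkBuffer.remaining()];
        networkBuffer.duplicate().get(copy);

        // Zero-copy approach: slice() shares the backing array, no payload allocation.
        ByteBuffer slice = networkBuffer.slice();

        System.out.println(new String(copy, StandardCharsets.UTF_8)); // aggregation-bytes
        System.out.println(slice.array() == networkBuffer.array());   // true: same memory
    }
}
```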
@elasticmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Apr 27, 2021
@elasticmachine (Collaborator)

Pinging @elastic/es-analytics-geo (Team:Analytics)

```diff
@@ -46,7 +47,7 @@
      * when {@link #expand()} is called.
      */
     public static <T extends Writeable> DelayableWriteable<T> delayed(Writeable.Reader<T> reader, StreamInput in) throws IOException {
-        return new Serialized<>(reader, in.getVersion(), in.namedWriteableRegistry(), in.readBytesReference());
+        return new Serialized<>(reader, in.getVersion(), in.namedWriteableRegistry(), in.readReleasableBytesReference());
```
original-brownbear (Member Author):
This would literally create O(100M) sized byte arrays in a recent user issue, completely locking up GC on the coordinating node.

@nik9000 (Member) left a comment:

I didn't know we had a way to grab a slice of the buffer! That's quite nice. Is it safe to hold onto that reference for longer than the request? For, like, a lot longer?

@jimczi (Contributor) commented Apr 27, 2021

> I didn't know we had a way to grab a slice of the buffer!

Same here, this is great!
Considering that aggs should be the biggest part of the response, that shouldn't be a problem. The next step, though that's a different scope, would be to keep the entire response in the network buffer until we reduce it. We also discussed some ways to put these bytes on disk at the network level, so this change is a good step forward imo.

> This would literally create O(100M) sized byte arrays in a recent user issue

100M is indeed problematic, but I wonder if this is not an abusive case that we should limit somewhere else. If a single shard response in binary form has this size, I don't want to see the final json response.

@original-brownbear (Member Author)

> Is it safe to hold onto that reference for longer than the request? For, like, a lot longer?

Yea, as long as you eventually release the buffers, and it's not a situation where you only care about a small piece of whatever buffer (which this isn't, I'd say), it's completely safe. We're already using the same mechanism for recovery chunks, CCR chunks and the cluster state.

> 100M is indeed problematic, but I wonder if this is not an abusive case that we should limit somewhere else. If a single shard response in binary form has this size, I don't want to see the final json response.

Yea, the linked issue is definitely an outlier I guess, but even if we're talking about O(1M) it's not great to allocate a bunch of byte arrays of that size all the time when working with G1GC (even after our fixes to settings).
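
To illustrate why holding the slice past the request is safe, here is a hedged sketch of the reference counting involved, using Netty's public `ByteBuf` API (which the Elasticsearch transport layer is built on) rather than the ES-internal `ReleasableBytesReference`:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class RetainedSliceDemo {
    public static void main(String[] args) {
        ByteBuf network = PooledByteBufAllocator.DEFAULT.buffer(1024);
        network.writeBytes(new byte[1024]);

        // retainedSlice() bumps the reference count and shares the pooled memory:
        // the buffer cannot go back to the pool while the slice is alive.
        ByteBuf aggBytes = network.retainedSlice(512, 256);
        System.out.println(network.refCnt()); // 2

        // The request finishes and drops its reference ...
        network.release();

        // ... but the slice keeps the memory valid until it is released as well.
        System.out.println(aggBytes.refCnt()); // 1
        aggBytes.release(); // now the pooled memory can actually be reclaimed
    }
}
```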

@jimczi (Contributor) left a comment:

The logic in the code looks good to me.
Do we check that buffers are released in our test suite? Or do we need explicit tests to ensure that we're not leaking bytes? I really like this change, but it makes me a little bit nervous considering the impact of a bug.

```diff
@@ -98,7 +99,8 @@ public T expand() {
         } catch (IOException e) {
             throw new RuntimeException("unexpected error writing writeable to buffer", e);
         }
-        return new Serialized<>(reader, Version.CURRENT, registry, buffer.bytes());
+        // TODO: this path is currently not used in production code, if it ever is this should start using pooled buffers
+        return new Serialized<>(reader, Version.CURRENT, registry, ReleasableBytesReference.wrap(buffer.bytes()));
```
Contributor:
production mode activated!
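
For context on the `wrap` call in the diff above: it adapts an unpooled, heap-backed buffer to the releasable-reference API, with nothing to actually free on release. A rough, hypothetical sketch of that adapter pattern (the type and method names here are invented for illustration, not the real Elasticsearch classes):

```java
// Hypothetical sketch only; not the real Elasticsearch types.
interface Releasable extends AutoCloseable {
    @Override
    void close(); // no checked exception
}

final class ReleasableBytes implements Releasable {
    private final byte[] bytes;
    private final Runnable onClose;

    private ReleasableBytes(byte[] bytes, Runnable onClose) {
        this.bytes = bytes;
        this.onClose = onClose;
    }

    // Pooled path: releasing hands the memory back to the buffer pool.
    static ReleasableBytes pooled(byte[] bytes, Runnable returnToPool) {
        return new ReleasableBytes(bytes, returnToPool);
    }

    // wrap() path: heap-backed bytes are owned by the GC, so release is a no-op.
    static ReleasableBytes wrap(byte[] bytes) {
        return new ReleasableBytes(bytes, () -> {});
    }

    byte[] bytes() {
        return bytes;
    }

    @Override
    public void close() {
        onClose.run();
    }
}
```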

@nik9000 (Member) commented Apr 27, 2021 via email

@original-brownbear (Member Author)

@nik9000 @jimczi yes, we have leak-tracking infrastructure for all kinds of buffers. For our own buffer pool it was added in #67688, and we have similar infrastructure for Netty-based buffers running in all tests, including the REST tests. So any leak in test-covered code paths will show up for sure.
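
As an aside, the Netty side of this kind of leak tracking can be reproduced with the stock `ResourceLeakDetector`; a minimal sketch (the Elasticsearch-internal tracker from #67688 is a separate mechanism, not shown here):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionDemo {
    public static void main(String[] args) throws InterruptedException {
        // PARANOID tracks every allocation, so leaks in test-covered paths surface reliably.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

        ByteBuf leaked = PooledByteBufAllocator.DEFAULT.buffer(256);
        leaked = null; // dropped without release(): a leak

        // Once the leaked buffer is garbage collected, Netty may report a
        // "LEAK: ByteBuf.release() was not called" error with the allocation's
        // stack trace the next time the detector runs (e.g. on a later allocation).
        System.gc();
        Thread.sleep(100);
        PooledByteBufAllocator.DEFAULT.buffer(256).release();
    }
}
```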

@nik9000 (Member) commented Apr 28, 2021 via email

@jimczi (Contributor) left a comment:

LGTM too!

@original-brownbear (Member Author)

Thanks Nik + Jim!

@original-brownbear original-brownbear merged commit 50bb6d8 into elastic:master Apr 28, 2021
@original-brownbear original-brownbear deleted the pool-aggregation-bytes branch April 28, 2021 11:06
original-brownbear added a commit that referenced this pull request Apr 28, 2021
These aggregations can be of considerable size, and we must not allocate them as a single `byte[]`. Especially nowadays, using G1GC, contiguous allocations of this size are problematic.
This commit makes it so that we take the aggregation bytes as a slice out of the network buffers in a zero-copy fashion, effectively cutting the peak memory use for reading them in half for large allocations.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates elastic#62439 and elastic#72309
henningandersen added a commit that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates #62439 and #72309

Closes #72923
henningandersen added a commit that referenced this pull request May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would leak the memory for aggs in buffered query results; fixed.

Relates #62439 and #72309

Closes #72923
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the proxy node would not be freed.

Now we only use the non-copying, byte-referencing version on the coordinating node, which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
henningandersen added a commit that referenced this pull request Oct 5, 2021
When a CCS search is proxied, the memory for the aggregations on the proxy node would not be freed.

Now we only use the non-copying, byte-referencing version on the coordinating node, which itself ensures that memory is freed by calling `consumeAggs`.

Relates #72309
@original-brownbear original-brownbear restored the pool-aggregation-bytes branch April 18, 2023 20:55
Labels: :Analytics/Aggregations, >enhancement, Team:Analytics, v7.14.0, v8.0.0-alpha1