Reduce garbage for requests with unpooled buffers #32228
Conversation
With this commit we avoid copying request contents on the HTTP layer (NIO and Netty 4) when the buffer that holds the request body contents is unpooled. As the unpooled allocator is usually used on smaller heaps, this helps to reduce garbage further in those situations.
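To make the idea concrete, here is a minimal sketch (not the actual diff): the handler context and the `Netty4Utils.toBytesReference` helper are assumed, and the decision point is simply whether the body buffer came from the pooled allocator.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import io.netty.buffer.UnpooledByteBufAllocator;
import io.netty.handler.codec.http.FullHttpRequest;
import org.elasticsearch.common.bytes.BytesArray;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.transport.netty4.Netty4Utils;

// Sketch only: copy the HTTP request body only when it comes from a pooled
// allocator; an unpooled buffer is plain heap memory we can reference directly.
static BytesReference bodyOf(FullHttpRequest request) {
    ByteBuf content = request.content();
    if (content.alloc() instanceof UnpooledByteBufAllocator) {
        return Netty4Utils.toBytesReference(content);          // reference the bytes, no copy (assumed helper)
    }
    return new BytesArray(ByteBufUtil.getBytes(content));      // defensive copy as before
}
```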
Pinging @elastic/es-core-infra
@jasontedor can you please review the Netty parts? I plan to backport the Netty parts to 6.x as well.
Theoretically this LGTM. But I think that it will cause the build to fail once #32354 is merged.
I think we always want to release the request? Maybe for the Unpooled version we want `request.content().duplicate()` and then release the request?
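A quick sketch of that suggestion (the handler context is hypothetical and handling is shown synchronously). Note that a plain `duplicate()` shares the original's reference count, so a retained duplicate is used here to keep the bytes alive until handling finishes:

```java
import io.netty.buffer.ByteBuf;
import io.netty.handler.codec.http.FullHttpRequest;
import java.util.function.Consumer;

// Sketch: keep a retained view of the body instead of copying it, then release
// both the view and the original request once handling is done.
static void handle(FullHttpRequest request, Consumer<ByteBuf> dispatch) {
    ByteBuf body = request.content().retainedDuplicate(); // shares memory, bumps refCount
    try {
        dispatch.accept(body);                             // downstream handling
    } finally {
        body.release();                                    // balances retainedDuplicate()
        request.release();                                 // drops the original reference
    }
}
```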
I actually want to point out that this is also not strictly compliant with what is possible in netty. The following is valid:
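A minimal sketch of the kind of construct meant here, assuming Netty 4's `CompositeByteBuf` API:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.Unpooled;

// A composite buffer created via the Unpooled helper reports the unpooled
// allocator, but its components can still come from the pooled allocator.
CompositeByteBuf composite = Unpooled.compositeBuffer();
ByteBuf pooledComponent = PooledByteBufAllocator.DEFAULT.heapBuffer(128);
composite.addComponent(true, pooledComponent);
// composite.alloc() is the unpooled allocator here, yet the composite must
// still be released to return the pooled component's memory.
```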
This creates a composite byte buffer that is backed by a pooled byte buffer but uses the unpooled allocator. This would be a memory leak in this PR. I don't think this should happen since we tend to only use one allocator type at once, but it is possible. I think maybe we should address this in this PR by converting the bytes to not be a Netty thing?
I guess this is a little tricky right now. So probably not the best approach for this PR.
@tbrooks8 after our discussion yesterday I added an additional assert that checks that all associated buffers are unpooled, so we get failures if that assumption is violated. Can you please have a look?
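Not the exact assertion from the change, but a sketch of the shape such a check could take:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.UnpooledByteBufAllocator;

// Sketch: recursively assert that a (possibly composite) buffer only contains
// components handed out by the unpooled allocator.
static boolean assertUnpooled(ByteBuf buffer) {
    if (buffer instanceof CompositeByteBuf) {
        CompositeByteBuf composite = (CompositeByteBuf) buffer;
        for (int i = 0; i < composite.numComponents(); i++) {
            assert assertUnpooled(composite.internalComponent(i));
        }
    } else {
        assert buffer.alloc() instanceof UnpooledByteBufAllocator
            : "expected an unpooled buffer but allocator was " + buffer.alloc();
    }
    return true;
}
```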
LGTM -
Looking at netty - if you set `-Dio.netty.allocator.numHeapArenas=0` it looks like the pool allocator delegates to Unpooled:
```java
// Netty's PooledByteBufAllocator#newHeapBuffer: with numHeapArenas=0 the heap
// arena is null, so allocation falls through to unpooled heap buffers.
if (heapArena != null) {
    buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
} else {
    buf = PlatformDependent.hasUnsafe() ?
            new UnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity) :
            new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
}
```
We should maybe look at a follow-up where we set that to 0 in the unpooled case, i.e. we set both the unpooled system property (making it the default) and heap arenas = 0 (so a manual usage of the pooled allocator gets delegated to unpooled). And then maybe the logic here can just be that those settings are enabled? See the sketch below.
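For reference, the knobs being discussed are plain Netty system properties; a sketch of what such an ergonomic default might set (the property names are Netty's, combining them this way is the follow-up idea, and `numDirectArenas` is included here only for symmetry):

```java
// Prefer the unpooled allocator by default and disable all arenas so that any
// direct use of the pooled allocator also falls through to unpooled buffers.
// These must be set before any Netty classes are initialized.
System.setProperty("io.netty.allocator.type", "unpooled");
System.setProperty("io.netty.allocator.numHeapArenas", "0");
System.setProperty("io.netty.allocator.numDirectArenas", "0");
```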
Thanks for the review and good point on the follow-up. When we choose the unpooled allocator ergonomically, this is a pretty simple change (code-wise). When it is set by the user in …
I just want to provide an update on why I held back merging this change. First of all, I managed to fix this issue for all allocator types (pooled and unpooled) by releasing the buffer at the appropriate place. Then I ran several benchmarks and it turns out that with this change indexing throughput actually decreases by roughly 10% (I tested several of our Rally tracks on 1GB and 4GB heaps). Profiling reveals the reason: the bulk API uses the buffer's random access API (i.e. …).

As a next step I want to benchmark the performance impact on our NIO implementation, but I'll wait until #32757 is merged.
After running several benchmarks for the NIO implementation as well, with different workloads and heap sizes, it turns out that we do not get a practical benefit from avoiding this copy. Therefore, I'm closing this PR unmerged.
* Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way.
* Relates elastic#32228
* I think the issue that prevented that PR from being merged was solved by elastic#39634, which moved the bulk index marker search to `ByteBuf` bulk access, so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer); see the sketch below.
* I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (a 4g-heap node and bulk requests of size 5k saw a reduction in young GC time of ~10% for me).
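A sketch of the per-byte vs. bulk-access difference referenced above (illustrative only; `ByteProcessor.FIND_LF` is Netty's stock newline matcher):

```java
import io.netty.buffer.ByteBuf;
import io.netty.util.ByteProcessor;

// Per-byte indexed access: on a CompositeByteBuf every getByte(i) first has to
// locate the backing component, which is what made the contiguous copy faster.
static int findNewlinePerByte(ByteBuf buf, int from) {
    for (int i = from; i < buf.writerIndex(); i++) {
        if (buf.getByte(i) == '\n') {
            return i;
        }
    }
    return -1;
}

// Bulk access: forEachByte walks each component's backing memory directly, so
// the composite buffer costs roughly one bounds check per component instead.
static int findNewlineBulk(ByteBuf buf, int from) {
    return buf.forEachByte(from, buf.writerIndex() - from, ByteProcessor.FIND_LF);
}
```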