
Reduce garbage for requests with unpooled buffers #32228

Closed

Conversation

danielmitterdorfer (Member)

With this commit we avoid copying request contents on the HTTP layer (NIO and Netty 4) when the buffer that holds the request body contents is unpooled. As the unpooled allocator is usually used on smaller heaps, this helps to reduce garbage further in those situations.

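As a rough sketch of the idea (hypothetical helper, not the actual code in this PR): when the request body sits in an unpooled buffer we can retain and use it directly, because there is no arena the memory needs to be returned to; for pooled buffers we keep copying so the buffer can go back to its pool promptly.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.buffer.UnpooledByteBufAllocator;

// Hypothetical sketch of the copy-vs-wrap decision described above.
static ByteBuf toRequestContent(ByteBuf content) {
    if (content.alloc() instanceof UnpooledByteBufAllocator) {
        // Unpooled: retain and use the buffer directly, avoiding one full copy
        // of the request body (less garbage, which matters most on small heaps).
        return content.retain();
    }
    // Pooled: copy so the pooled buffer can be released back to its arena.
    return Unpooled.copiedBuffer(content);
}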
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@danielmitterdorfer (Member, Author)

@jasontedor can you please review the Netty parts?
@tbrooks8 can you please review the NIO parts?

I plan to backport the Netty parts also to 6.x.

@Tim-Brooks (Contributor) left a comment

Theoretically this LGTM. But I think that it will cause the build to fail once #32354 is merged.

I think we always want to release the request? Maybe for the Unpooled version we want request.content().duplicate()? And then release the request?
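For reference, a sketch of the retaining variant of that pattern (hypothetical names; retainedDuplicate() keeps the content valid after the request itself is released, whereas a plain duplicate() would not add a reference of its own):

// Sketch: take an independent reference to the body, then release the request.
ByteBuf body = request.content().retainedDuplicate(); // bumps the content's refcount
request.release();                                     // drop the request's own reference
try {
    // ... consume body ...
} finally {
    body.release();                                    // finally drop our reference
}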

@Tim-Brooks (Contributor)

I actually want to point out that this does not strictly cover everything that is possible in Netty. The following is valid:

// A composite buffer that reports the unpooled allocator but is backed by a pooled component:
ByteBuf unpooledBuf = Unpooled.buffer(10);
ByteBuf pooledBuf = PooledByteBufAllocator.DEFAULT.buffer(10);
CompositeByteBuf composite = new CompositeByteBuf(UnpooledByteBufAllocator.DEFAULT, false, 2, unpooledBuf, pooledBuf);

This creates a composite byte buffer that uses the unpooled allocator but is backed by a pooled byte buffer, which would be a memory leak with this PR. I don't think this should happen in practice since we tend to only use one allocator type at a time, but it is possible. I think maybe we should address this in the PR by converting the bytes to not be a Netty thing?

@Tim-Brooks (Contributor)

I think maybe we should address this in the PR by converting the bytes to not be a Netty thing?

I guess this is a little tricky right now. So probably not the best approach for this PR.

@danielmitterdorfer (Member, Author)

danielmitterdorfer commented Aug 2, 2018

@tbrooks8 after our discussion yesterday I added an additional assert that checks that all associated buffers are unpooled so we get failures if the assertion is violated. Can you please have a look?

@Tim-Brooks (Contributor) left a comment

LGTM -

Looking at Netty: if you set -Dio.netty.allocator.numHeapArenas=0, it looks like the pooled allocator delegates to unpooled heap buffers.

// Netty's PooledByteBufAllocator#newHeapBuffer:
if (heapArena != null) {
    buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
} else {
    buf = PlatformDependent.hasUnsafe() ?
            new UnpooledUnsafeHeapByteBuf(this, initialCapacity, maxCapacity) :
            new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
}

netty/netty#1315

We should maybe look into a follow-up where we have that set to 0 in the unpooled case, i.e. we set both the unpooled system property (making it the default) and heap arenas = 0 (so that manual usage of the pooled allocator is delegated to unpooled).

And then maybe the logic can be that those settings are enabled?
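A sketch of what that configuration would amount to (the property names are Netty's own; where Elasticsearch would set them, ergonomically at startup or via jvm.options, is exactly the open question here):

// Must run before Netty's allocator classes are initialized.
System.setProperty("io.netty.allocator.type", "unpooled");   // make the unpooled allocator the default
System.setProperty("io.netty.allocator.numHeapArenas", "0"); // pooled allocator falls back to unpooled heap buffers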

@danielmitterdorfer (Member, Author)

Thanks for the review, and good point on the follow-up. When we choose the unpooled allocator ergonomically, this is a pretty simple change (code-wise). When it is set by the user in jvm.options, it might be a bit trickier.

@danielmitterdorfer (Member, Author)

I just want to provide an update on why I held back from merging this change. First of all, I managed to fix this issue for all allocator types (pooled and unpooled) by releasing the buffer at the appropriate place. Then I ran several benchmarks, and it turns out that with this change indexing throughput actually decreases by roughly 10% (I tested several of our Rally tracks on 1GB and 4GB heaps).

Profiling reveals the reason: the bulk API uses the buffer's random access API (i.e. #get(index)) to find the next marker (i.e. the newline character for the JSON content type). If we copy the buffer, this is a cheap array access, but if we don't, we access a CompositeByteBuf instance, which needs to check which of its components actually contains the index, calculate the offset and then access the buffer element at the right index. I implemented a proof of concept that is optimized for iteration but the results are still worse than with the buffer copy. So I think we should not merge the Netty parts of this PR.
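To make the cost concrete, here is a minimal sketch of the kind of marker scan the bulk path performs (hypothetical helper, not the actual Elasticsearch code). With a copied heap buffer each getByte is roughly one array read; on a CompositeByteBuf every call also has to locate the component that owns the index and translate the offset:

// Hypothetical sketch: find the next newline marker starting at `from`.
static int nextMarker(ByteBuf content, int from) {
    for (int i = from; i < content.writerIndex(); i++) {
        if (content.getByte(i) == (byte) '\n') { // cheap on a heap buffer, component lookup on a composite
            return i;
        }
    }
    return -1;
}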

As a next step I want to benchmark the performance impact on our NIO implementation but I'll wait until #32757 is merged.

@danielmitterdorfer (Member, Author)

After running several benchmarks for the NIO implementation as well, with different workloads and heap sizes, it turns out that we do not get a practical benefit from avoiding this copy. Therefore, I'm closing this PR unmerged.

@danielmitterdorfer danielmitterdorfer removed the request for review from jasontedor August 24, 2018 10:28
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 18, 2019
* Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way
* Relates elastic#32228
   * I think the issue that prevented that PR from being merged was solved by elastic#39634, which moved the bulk index marker search to ByteBuf bulk access (sketched below), so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer)
* I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (a 4g heap node and bulk requests of size 5k saw a reduction in young GC time of ~10% for me)
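"ByteBuf bulk access" here refers to Netty's ByteProcessor-style scanning rather than per-index getByte calls; a minimal sketch of that style (not the actual code from elastic#39634):

import io.netty.buffer.ByteBuf;
import io.netty.util.ByteProcessor;

// Sketch: find the next '\n' marker with one bulk call; on a CompositeByteBuf the
// component lookup happens per component rather than per byte.
static int nextMarkerBulk(ByteBuf content, int from) {
    return content.forEachByte(from, content.writerIndex() - from, ByteProcessor.FIND_LF);
}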
original-brownbear added a commit that referenced this pull request Dec 3, 2019 (same commit message as above)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 3, 2019 (same commit message as above)
original-brownbear added a commit that referenced this pull request Dec 4, 2019 (same commit message as above)
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020 (same commit message as above)
Labels
:Distributed Coordination/Network Http and internode communication implementations >enhancement