Reduce memory usage on Azure repository implementation #66489
Conversation
This commit moves the upload logic into the repository itself instead of delegating to the SDK. Multi-block uploads are done sequentially rather than in parallel, which bounds the outstanding memory. Additionally, the number of I/O threads has been reduced to 1 to reduce the memory overhead.
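The bounded-memory idea described above, reading and uploading one block at a time so only a single block's worth of memory is ever outstanding, can be sketched as follows. This is a minimal illustration, not the repository's actual code: `uploadBlock`, the 4-byte block size, and the returned list of sizes are all placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class SequentialBlockUpload {
    // Placeholder block size; the real repository uses its own chunk size.
    static final int BLOCK_SIZE = 4;

    // Stand-in for the actual Azure block upload call; here we only record
    // the size of each block that would have been sent.
    static void uploadBlock(byte[] block, int len, List<Integer> uploaded) {
        uploaded.add(len);
    }

    // Reads the stream one block at a time and uploads each block before
    // reading the next, so at most one block is buffered at any moment.
    static List<Integer> upload(InputStream in) throws IOException {
        List<Integer> uploaded = new ArrayList<>();
        byte[] block = new byte[BLOCK_SIZE];
        int read;
        while ((read = in.read(block)) != -1) {
            uploadBlock(block, read, uploaded);
        }
        return uploaded;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> sizes = upload(new ByteArrayInputStream(new byte[10]));
        // 10 bytes in 4-byte blocks: 4, 4, 2
        System.out.println(sizes);
    }
}
```

Note that the same `byte[]` is reused for every block, which is what keeps the outstanding memory constant regardless of the total upload size.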
Pinging @elastic/es-distributed (Team:Distributed)
@@ -150,7 +150,6 @@ private static ByteBufAllocator createByteBufAllocator() {
    int tinyCacheSize = PooledByteBufAllocator.defaultTinyCacheSize();
    int smallCacheSize = PooledByteBufAllocator.defaultSmallCacheSize();
    int normalCacheSize = PooledByteBufAllocator.defaultNormalCacheSize();
    boolean useCacheForAllThreads = PooledByteBufAllocator.defaultUseCacheForAllThreads();

    return new PooledByteBufAllocator(false,
        nHeapArena,
I think that we can reduce those too, wdyt @original-brownbear?
Yeah, can't we just do 1, maybe, since we only use one thread now anyway?
Thanks Francisco. I only have one question, pretty much.
That said, I think the move to a single thread and reducing the heap arenas are straightforward in any case. Let's see what we can do about the upload flux. I guess if that turns out to be trickier, we could pull the arena and thread count changes into a separate PR so we can re-enable tests ASAP, maybe?
int numOfBytesRead = 0;
int offset = 0;
int len = (int) count;
final byte[] buffer = new byte[len];
This still scares me a little. If we only do 64k at a time here, can't we use the Netty memory allocator (or manage our own set of byte[] buffers and recycle them on doOnComplete)? Or do we still have no guarantees about flushing at that point?
Sadly, we don't have guarantees about flushing at that point.
I explored a different approach where I passed an allocator there and recycled at the end of the request, but in that case you end up holding that memory for the entire duration of the request.
Discussed this on a different channel and it looks like there is no easy way to get rid of the allocations in uploading (SDK does the same thing under the hood as well).
=> LGTM :)
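For context, the variables in the snippet under discussion (`numOfBytesRead`, `offset`, `len`, `buffer`) are the usual ingredients of a read loop: `InputStream.read` may return fewer bytes than requested, so the buffer has to be filled incrementally. A minimal, self-contained sketch of that pattern, with the helper itself being hypothetical rather than the repository's actual method:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Fills `buffer` from the stream. A single read() call may return fewer
    // bytes than requested, hence the loop advancing `offset` by
    // `numOfBytesRead` until the buffer is full or the stream ends.
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        int len = buffer.length;
        int numOfBytesRead;
        while (offset < len
                && (numOfBytesRead = in.read(buffer, offset, len - offset)) != -1) {
            offset += numOfBytesRead;
        }
        return offset; // total bytes actually read
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        // Stream holds fewer bytes than the buffer: the loop stops at EOF.
        int n = readFully(new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5}), buffer);
        System.out.println(n);
    }
}
```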
This commit moves the upload logic into the repository itself instead of delegating to the SDK. Multi-block uploads are done sequentially rather than in parallel, which bounds the outstanding memory. Additionally, the number of I/O threads and heap arenas has been reduced to 1 to reduce the memory overhead. Closes elastic#66385. Backport of elastic#66489.
Today we represent block IDs sent to Azure using the URL-safe base-64 encoding. This makes sense: these IDs appear in URLs. It turns out that Azure rejects this encoding for block IDs and instead demands that they are represented using the regular, URL-unsafe, base-64 encoding instead, then further wrapped in %-encoding to deal with the URL-unsafe characters that inevitably result. Relates elastic#66489
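The encoding distinction described in that follow-up commit can be illustrated with the JDK's Base64 and URLEncoder classes. This is a sketch of the idea only, not the repository's actual code, and `encodeBlockId` is a hypothetical helper: standard base-64 may emit '+' and '/', which then need %-encoding for the URL, whereas the URL-safe alphabet emits '-' and '_'.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BlockIdEncoding {
    // Hypothetical helper showing the accepted scheme: standard base-64
    // (which may contain '+' and '/'), then %-encoded for use in the URL.
    static String encodeBlockId(byte[] id) {
        String base64 = Base64.getEncoder().encodeToString(id);
        return URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] id = new byte[]{(byte) 0xfb, (byte) 0xff};
        // URL-safe base-64 gives "-_8=", the form described as rejected;
        // standard base-64 gives "+/8=", which %-encodes to "%2B%2F8%3D".
        System.out.println(Base64.getUrlEncoder().encodeToString(id));
        System.out.println(encodeBlockId(id));
    }
}
```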