-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Azure repository SDK to v12 #65140
Conversation
|
||
return threadPool.scheduler().scheduleAtFixedRate(() -> { | ||
try { | ||
delegate.execute(decoratedCommand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that the command is executed some time after the specified period
if the thread pool backed by delegate
is over subscribed. This is used for timeouts and retries, I guess we could live with a less precise timers but still control the executor in which that code is executed.
...azure/src/test/java/org/elasticsearch/repositories/azure/AzureBlobContainerRetriesTests.java
Outdated
Show resolved
Hide resolved
...pository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureRepositoryPlugin.java
Outdated
Show resolved
Hide resolved
// Most of the code that needs special permissions (i.e. jackson serializers generation) is executed | ||
// in the event loop executor. That's the reason why we should provide an executor that allows the | ||
// execution of privileged code | ||
final EventLoopGroup eventLoopGroup = new NioEventLoopGroup(eventLoopThreadsFromSettings(settings), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I point out here most of the response handling code is executed in the netty event loop executor, meaning that we should provide a privileged environment to the tasks executed in that event loop.
I tried different approaches to solve this issue, one of them was to create a wrapper around
com.azure.core.http.HttpClient
that would publish the emitted response in a different thread pool:
Mono<HttpResponse> send(HttpRequest request) {
return delegate.send(request).publishOn(privilegedExecutor);
}
But it seems like the HttpResponse
is parsed before, so that didn't help.
Maybe I'm missing something here and we can provide a more fine-grained solution.
private final String container; | ||
|
||
public AzureHttpHandler(final String container) { | ||
public AzureHttpHandler(final String account, final String container) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In te latest SDK if the endpoint uses an ip host, it expects the account in the URI path, see com.azure.storage.blob.BlobUrlParts#parseIpUrl
(it doesn't validate it there, but it fails later on as it expects the blob name to be on the 3 part of the path, i.e. http://192.168.1.20:8080/account/container/blob
.)
|
||
private final ThreadPool threadPool; | ||
private final String reactorExecutorName; | ||
private final EventLoopGroup eventLoopGroup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this resources are shared by all the clients, so we only need to manage the lifecycle of this component and the clients can be stateless.
...repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureClientProvider.java
Outdated
Show resolved
Hide resolved
private static final TimeValue DEFAULT_CONNECTION_TIMEOUT = TimeValue.timeValueSeconds(30); | ||
private static final TimeValue DEFAULT_MAX_CONNECTION_IDLE_TIME = TimeValue.timeValueSeconds(60); | ||
private static final int DEFAULT_MAX_CONNECTIONS = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2; | ||
private static final int DEFAULT_EVENT_LOOP_THREAD_COUNT = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is too much? It's capped to 16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if just 1
might actually be good enough here? We're not doing anything CPU intensive on the IO threads I think? (haven't bench-marked this though, maybe we can get some hard data for this from our benchmarking efforts?, my guess would be 1
is good enough and maybe we can verify that or experiment out the actual optimal value).
I would hope that there is no hidden SDK issue that makes 16
useful here at least :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small aside, can go into a follow-up (I can deal with it later this week): I think we can make this 1
for now. The consumers that consume whatever buffers this pool produces will never be able to grind through what a single network thread can produce in throughput and the memory savings from reducing the thread count are non-trivial here. We can do a few official benchmarks to this effect though :)
Pinging @elastic/es-distributed (Team:Distributed) |
jenkins test this |
@fcofdez there seems to be some bug with deleting blobs left here? CI fails on some of the delete blob tests and this fails for me locally as well:
seems like this may just be a missing permission grant? |
Yes, it's related with a change introduced recently in the security side that restricts the permissions that can be granted to plugins. #64751. I'm waiting to see if there's a workaround for this. |
Sorry @original-brownbear, you were right. I thought I was providing the proper permissions to all tasks, but I missed the reactor schedulers. It should be fixed now. |
No worries @fcofdez ! Unfortunately it looks like things are still broken due to permissions trouble, see failing test:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Francisco: a few smaller comments. Sorry for being so slow here, I'll pick this up at a faster pace tomorrow!
plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
Outdated
Show resolved
Hide resolved
private static final TimeValue DEFAULT_CONNECTION_TIMEOUT = TimeValue.timeValueSeconds(30); | ||
private static final TimeValue DEFAULT_MAX_CONNECTION_IDLE_TIME = TimeValue.timeValueSeconds(60); | ||
private static final int DEFAULT_MAX_CONNECTIONS = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2; | ||
private static final int DEFAULT_EVENT_LOOP_THREAD_COUNT = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if just 1
might actually be good enough here? We're not doing anything CPU intensive on the IO threads I think? (haven't bench-marked this though, maybe we can get some hard data for this from our benchmarking efforts?, my guess would be 1
is good enough and maybe we can verify that or experiment out the actual optimal value).
I would hope that there is no hidden SDK issue that makes 16
useful here at least :)
...repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureClientProvider.java
Outdated
Show resolved
Hide resolved
...pository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureRepositoryPlugin.java
Show resolved
Hide resolved
190a747
to
381b33d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @fcofdez didn't get through the whole thing today, but I figured I'd add the comment on the stream read asap so we can discuss it. Let me know what you think there.
So far looks great though apart from that, thanks!
...azure/src/test/java/org/elasticsearch/repositories/azure/AzureBlobContainerRetriesTests.java
Outdated
Show resolved
Hide resolved
plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
Outdated
Show resolved
Hide resolved
plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
Outdated
Show resolved
Hide resolved
plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
Outdated
Show resolved
Hide resolved
@original-brownbear as we discussed I've implemented the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @fcofdez the idea looks great. I have a question on the implementation around the queue though, I might be misunderstanding something here?
plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/repositories/azure/CancellableRateLimitedFluxIterator.java
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/repositories/azure/CancellableRateLimitedFluxIterator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the noisy review, but one important comment inline about the max blob size to look at asap IMO if it's not a random oversight
...repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureStorageService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a bit of a spin locally and it's definitely using a little more memory than I would have hoped (quite a bit of per thread buffer use). I have some ideas to significantly lower this in a follow up (same tried and true fixes we applied on the transport layer already) and I don't think these should be a blocker for now.
-> Assuming CI goes green, LGTM :) Thanks again+so much for grinding through this Francisco, looks great!
Thanks for your review and patience @original-brownbear ! :) |
Upgrade Azure repository to the latest non blocking Azure SDK. Closes elastic#43309 Backport of elastic#65140 Co-authored-by: Ryan Ernst <[email protected]>
Upgrade Azure repository to the latest non blocking Azure SDK. Closes #43309 Backport of #65140 Co-authored-by: Ryan Ernst <[email protected]>
This PR upgrades the Azure repository SDK to v12.
I've added comments where I thought we could discuss if a different approach would make sense or some parts that might need more context.
Additionally I've opened a few issues on the azure SDK side to overcome some problems that I've encountered along the way:
Sorry for the size of this PR. There's quite a bit of boilerplate regarding new licenses and so on and I couldn't think of a way to implement this in smaller chunks.