Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Azure repository SDK to v12 #65140

Merged
merged 43 commits into from
Dec 15, 2020
Merged

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Nov 17, 2020

This PR upgrades the Azure repository SDK to v12.

I've added comments where I thought we could discuss if a different approach would make sense or some parts that might need more context.

Additionally I've opened a few issues on the azure SDK side to overcome some problems that I've encountered along the way:

Sorry for the size of this PR. There's quite a bit of boilerplate regarding new licenses and so on and I couldn't think of a way to implement this in smaller chunks.

@fcofdez fcofdez added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 labels Nov 17, 2020

return threadPool.scheduler().scheduleAtFixedRate(() -> {
try {
delegate.execute(decoratedCommand);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that the command is executed some time after the specified period if the thread pool backed by delegate is over subscribed. This is used for timeouts and retries, I guess we could live with a less precise timers but still control the executor in which that code is executed.

// Most of the code that needs special permissions (i.e. jackson serializers generation) is executed
// in the event loop executor. That's the reason why we should provide an executor that allows the
// execution of privileged code
final EventLoopGroup eventLoopGroup = new NioEventLoopGroup(eventLoopThreadsFromSettings(settings),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I point out here most of the response handling code is executed in the netty event loop executor, meaning that we should provide a privileged environment to the tasks executed in that event loop.
I tried different approaches to solve this issue, one of them was to create a wrapper around
com.azure.core.http.HttpClient that would publish the emitted response in a different thread pool:

Mono<HttpResponse> send(HttpRequest request) {
         return delegate.send(request).publishOn(privilegedExecutor);
}

But it seems like the HttpResponse is parsed before, so that didn't help.

Maybe I'm missing something here and we can provide a more fine-grained solution.

private final String container;

public AzureHttpHandler(final String container) {
public AzureHttpHandler(final String account, final String container) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In te latest SDK if the endpoint uses an ip host, it expects the account in the URI path, see com.azure.storage.blob.BlobUrlParts#parseIpUrl (it doesn't validate it there, but it fails later on as it expects the blob name to be on the 3 part of the path, i.e. http://192.168.1.20:8080/account/container/blob.)


private final ThreadPool threadPool;
private final String reactorExecutorName;
private final EventLoopGroup eventLoopGroup;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this resources are shared by all the clients, so we only need to manage the lifecycle of this component and the clients can be stateless.

private static final TimeValue DEFAULT_CONNECTION_TIMEOUT = TimeValue.timeValueSeconds(30);
private static final TimeValue DEFAULT_MAX_CONNECTION_IDLE_TIME = TimeValue.timeValueSeconds(60);
private static final int DEFAULT_MAX_CONNECTIONS = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2;
private static final int DEFAULT_EVENT_LOOP_THREAD_COUNT = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is too much? It's capped to 16

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if just 1 might actually be good enough here? We're not doing anything CPU intensive on the IO threads I think? (haven't bench-marked this though, maybe we can get some hard data for this from our benchmarking efforts?, my guess would be 1 is good enough and maybe we can verify that or experiment out the actual optimal value).
I would hope that there is no hidden SDK issue that makes 16 useful here at least :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small aside, can go into a follow-up (I can deal with it later this week): I think we can make this 1 for now. The consumers that consume whatever buffers this pool produces will never be able to grind through what a single network thread can produce in throughput and the memory savings from reducing the thread count are non-trivial here. We can do a few official benchmarks to this effect though :)

@fcofdez fcofdez marked this pull request as ready for review November 18, 2020 10:20
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 20, 2020

jenkins test this

@original-brownbear
Copy link
Member

original-brownbear commented Nov 23, 2020

@fcofdez there seems to be some bug with deleting blobs left here?

CI fails on some of the delete blob tests and this fails for me locally as well:

/gradlew 'azure-test-goal-that-idea-doesnt-shotw' --tests "org.elasticsearch.repositories.azure.AzureBlobStoreRepositoryTests.testDeleteBlobs" -Dtests.seed=D55C7FBE34D4B07E -Dtests.locale=ru -Dtests.timezone=WET -Druntime.java=15

java.io.IOException: Unable to delete blobs

	at __randomizedtesting.SeedInfo.seed([D55C7FBE34D4B07E:1A096AB2490B57C]:0)
	at org.elasticsearch.repositories.azure.AzureBlobStore.deleteBlobsInBatches(AzureBlobStore.java:327)
	at org.elasticsearch.repositories.azure.AzureBlobStore.deleteBlobList(AzureBlobStore.java:286)
	at org.elasticsearch.repositories.azure.AzureBlobContainer.deleteBlobsIgnoringIfNotExists(AzureBlobContainer.java:117)
	at org.elasticsearch.repositories.blobstore.ESBlobStoreRepositoryIntegTestCase.testDeleteBlobs(ESBlobStoreRepositoryIntegTestCase.java:193)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers")
	at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
	at java.base/java.security.AccessController.checkPermission(AccessController.java:1036)
	at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:408)
	at java.base/java.lang.Class.checkMemberAccess(Class.java:3009)
	at java.base/java.lang.Class.getDeclaredFields(Class.java:2339)
	at com.fasterxml.jackson.databind.util.ClassUtil.getDeclaredFields(ClassUtil.java:1113)
	at com.fasterxml.jackson.databind.introspect.AnnotatedFieldCollector._findFields(AnnotatedFieldCollector.java:66)
	at com.fasterxml.jackson.databind.introspect.AnnotatedFieldCollector._findFields(AnnotatedFieldCollector.java:64)
	at com.fasterxml.jackson.databind.introspect.AnnotatedFieldCollector.collect(AnnotatedFieldCollector.java:41)
	at com.fasterxml.jackson.databind.introspect.AnnotatedFieldCollector.collectFields(AnnotatedFieldCollector.java:36)
	at com.fasterxml.jackson.databind.introspect.AnnotatedClass._fields(AnnotatedClass.java:349)
	at com.fasterxml.jackson.databind.introspect.AnnotatedClass.fields(AnnotatedClass.java:321)
	at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:379)
	at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:308)
	at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:196)
	at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:252)
	at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:346)
	at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:216)
	at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:165)
	at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1388)
	at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1336)
	at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:510)
	at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:713)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
	at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:4110)
	at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:3369)
	at com.azure.core.util.serializer.JacksonAdapter.serialize(JacksonAdapter.java:145)
	at com.azure.core.util.serializer.JacksonAdapter.serialize(JacksonAdapter.java:131)
	at com.azure.core.util.serializer.JacksonAdapter.serializeRaw(JacksonAdapter.java:155)
	at com.azure.core.http.rest.SwaggerMethodParser.serialize(SwaggerMethodParser.java:457)
	at com.azure.core.http.rest.SwaggerMethodParser.setHeaders(SwaggerMethodParser.java:324)
	at com.azure.core.http.rest.RestProxy.createHttpRequest(RestProxy.java:237)
	at com.azure.core.http.rest.RestProxy.invoke(RestProxy.java:121)
	at com.azure.storage.blob.implementation.$Proxy54.submitBatch(Unknown Source)
	at com.azure.storage.blob.implementation.ServicesImpl.submitBatchWithRestResponseAsync(ServicesImpl.java:314)
	at com.azure.storage.blob.batch.BlobBatchAsyncClient.lambda$submitBatchWithResponse$3(BlobBatchAsyncClient.java:126)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:118)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:147)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.ignoreDone(MonoIgnoreThen.java:190)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreInner.onComplete(MonoIgnoreThen.java:240)
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onComplete(MonoPeekTerminal.java:292)
	at reactor.core.publisher.MonoWhen$WhenCoordinator.signal(MonoWhen.java:208)
	at reactor.core.publisher.MonoWhen$WhenInner.onComplete(MonoWhen.java:285)
	at reactor.core.publisher.FluxContextStart$ContextStartSubscriber.onComplete(FluxContextStart.java:115)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondComplete(MonoFlatMap.java:189)
	at reactor.core.publisher.MonoFlatMap$FlatMapInner.onComplete(MonoFlatMap.java:260)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onComplete(FluxMapFuseable.java:144)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onComplete(MonoFlatMap.java:174)
	at reactor.core.publisher.FluxContextStart$ContextStartSubscriber.onComplete(FluxContextStart.java:115)
	at reactor.core.publisher.FluxDoOnEach$DoOnEachSubscriber.onComplete(FluxDoOnEach.java:209)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onComplete(MonoFlatMap.java:174)
	at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2016)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onComplete(MonoFlatMap.java:174)
	at reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionMainSubscriber.onComplete(FluxDelaySubscription.java:190)
	at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
	at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onComplete(FluxTimeout.java:227)
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onComplete(MonoPeekTerminal.java:292)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onComplete(FluxMapFuseable.java:336)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onComplete(FluxMapFuseable.java:336)
	at reactor.core.publisher.Operators.complete(Operators.java:135)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:144)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
	at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:52)
	at reactor.core.publisher.MonoDelaySubscription.accept(MonoDelaySubscription.java:33)
	at reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionOtherSubscriber.onNext(FluxDelaySubscription.java:123)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:117)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:674)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	... 1 more
	Suppressed: java.lang.Exception: #block terminated with an error
		at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
		at reactor.core.publisher.Mono.block(Mono.java:1680)
		at org.elasticsearch.repositories.azure.AzureBlobStore.lambda$deleteBlobsInBatches$7(AzureBlobStore.java:318)
		at org.elasticsearch.repositories.azure.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:57)
		at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
		at org.elasticsearch.repositories.azure.SocketAccess.doPrivilegedVoidException(SocketAccess.java:56)
		at org.elasticsearch.repositories.azure.AzureBlobStore.deleteBlobsInBatches(AzureBlobStore.java:295)
		at org.elasticsearch.repositories.azure.AzureBlobStore.deleteBlobList(AzureBlobStore.java:286)
		at org.elasticsearch.repositories.azure.AzureBlobContainer.deleteBlobsIgnoringIfNotExists(AzureBlobContainer.java:117)
		at org.elasticsearch.repositories.blobstore.ESBlobStoreRepositoryIntegTestCase.testDeleteBlobs(ESBlobStoreRepositoryIntegTestCase.java:193)
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.base/java.lang.reflect.Method.invoke(Method.java:564)
		at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
		at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
		at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
		at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
		at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
		at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)
		at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)
		at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
		at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
		at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
		at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
		at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
		at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
		at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
		at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
		at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
		at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)
		at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)
		... 1 more


seems like this may just be a missing permission grant?

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 23, 2020

Yes, it's related with a change introduced recently in the security side that restricts the permissions that can be granted to plugins. #64751. I'm waiting to see if there's a workaround for this.

@fcofdez
Copy link
Contributor Author

fcofdez commented Nov 23, 2020

Sorry @original-brownbear, you were right. I thought I was providing the proper permissions to all tasks, but I missed the reactor schedulers. It should be fixed now.

@original-brownbear
Copy link
Member

No worries @fcofdez !

Unfortunately it looks like things are still broken due to permissions trouble, see failing test:

13:20:11 > Task :x-pack:plugin:searchable-snapshots:qa:azure:integTest FAILED
13:20:11 Exec output and error:
13:20:11 | Output for ./bin/elasticsearch-plugin:-> Installing file:/dev/shm/elastic+elasticsearch+pull-request-2/plugins/repository-azure/build/distributions/repository-azure-8.0.0-SNAPSHOT.zip
13:20:11 | -> Downloading file:/dev/shm/elastic elasticsearch pull-request-2/plugins/repository-azure/build/distributions/repository-azure-8.0.0-SNAPSHOT.zip
13:20:11 | -> Failed installing file:/dev/shm/elastic+elasticsearch+pull-request-2/plugins/repository-azure/build/distributions/repository-azure-8.0.0-SNAPSHOT.zip
13:20:11 | -> Rolling back file:/dev/shm/elastic+elasticsearch+pull-request-2/plugins/repository-azure/build/distributions/repository-azure-8.0.0-SNAPSHOT.zip
13:20:11 | -> Rolled back file:/dev/shm/elastic+elasticsearch+pull-request-2/plugins/repository-azure/build/distributions/repository-azure-8.0.0-SNAPSHOT.zip
13:20:11 | Exception in thread "main" java.lang.IllegalArgumentException: plugin policy [/dev/shm/elastic+elasticsearch+pull-request-2/x-pack/plugin/searchable-snapshots/qa/azure/build/testclusters/integTest-0/distro/8.0.0-DEFAULT/plugins/.installing-15635625663194161767/plugin-security.policy] contains illegal permission ("java.lang.reflect.ReflectPermission" "newProxyInPackage.com.azure.storage.blob.implementation") in global grant
13:20:11 | 	at org.elasticsearch.bootstrap.PolicyUtil.validatePolicyPermissionsForJar(PolicyUtil.java:311)
13:20:11 | 	at org.elasticsearch.bootstrap.PolicyUtil.validatePolicyPermissions(PolicyUtil.java:321)
13:20:11 | 	at org.elasticsearch.bootstrap.PolicyUtil.getPluginPolicyInfo(PolicyUtil.java:332)
13:20:11 | 	at org.elasticsearch.plugins.InstallPluginCommand.installPlugin(InstallPluginCommand.java:854)
13:20:11 | 	at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:240)
13:20:11 | 	at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:210)
13:20:11 | 	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
13:20:11 | 	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
13:20:11 | 	at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:91)
13:20:11 | 	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
13:20:11 | 	at org.elasticsearch.cli.Command.main(Command.java:90)
13:20:11 | 	at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)
13:20:11 
13:20:11 > Task :x-pack:plugin:ml:thirdPartyAudit

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Francisco: a few smaller comments. Sorry for being so slow here, I'll pick this up at a faster pace tomorrow!

private static final TimeValue DEFAULT_CONNECTION_TIMEOUT = TimeValue.timeValueSeconds(30);
private static final TimeValue DEFAULT_MAX_CONNECTION_IDLE_TIME = TimeValue.timeValueSeconds(60);
private static final int DEFAULT_MAX_CONNECTIONS = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2;
private static final int DEFAULT_EVENT_LOOP_THREAD_COUNT = Math.min(Runtime.getRuntime().availableProcessors(), 8) * 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if just 1 might actually be good enough here? We're not doing anything CPU intensive on the IO threads I think? (haven't bench-marked this though, maybe we can get some hard data for this from our benchmarking efforts?, my guess would be 1 is good enough and maybe we can verify that or experiment out the actual optimal value).
I would hope that there is no hidden SDK issue that makes 16 useful here at least :)

@rjernst rjernst force-pushed the azure-sdk-upgrade-new branch from 190a747 to 381b33d Compare December 2, 2020 00:37
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @fcofdez didn't get through the whole thing today, but I figured I'd add the comment on the stream read asap so we can discuss it. Let me know what you think there.

So far looks great though apart from that, thanks!

@fcofdez
Copy link
Contributor Author

fcofdez commented Dec 13, 2020

@original-brownbear as we discussed I've implemented the AzureInputStream reading the entire stream instead of requesting it in chunks, the tricky part comes in https://github.com/elastic/elasticsearch/pull/65140/files#diff-5138cdb95334310e4f811afa0bb04f785e7be1280c5812ab6464211274194990 where the interface between the async API and the sync API lies. We still need to copy the buffers, as I didn't find a reasonable way to avoid copying them :(, for that I'm using a buffer pool so we avoid allocating them over and over. Let me know what you think.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @fcofdez the idea looks great. I have a question on the implementation around the queue though, I might be misunderstanding something here?

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the noisy review, but one important comment inline about the max blob size to look at asap IMO if it's not a random oversight

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a bit of a spin locally and it's definitely using a little more memory than I would have hoped (quite a bit of per thread buffer use). I have some ideas to significantly lower this in a follow up (same tried and true fixes we applied on the transport layer already) and I don't think these should be a blocker for now.
-> Assuming CI goes green, LGTM :) Thanks again+so much for grinding through this Francisco, looks great!

@fcofdez
Copy link
Contributor Author

fcofdez commented Dec 15, 2020

Thanks for your review and patience @original-brownbear ! :)

@fcofdez fcofdez merged commit fd1d282 into elastic:master Dec 15, 2020
fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Dec 15, 2020
Upgrade Azure repository to the latest non blocking Azure SDK.

Closes elastic#43309

Backport of elastic#65140

Co-authored-by: Ryan Ernst <[email protected]>
fcofdez added a commit that referenced this pull request Dec 15, 2020
Upgrade Azure repository to the latest non blocking Azure SDK.

Closes #43309

Backport of #65140

Co-authored-by: Ryan Ernst <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants