EncryptedAzureBlobStoreRepositoryIntegTests.testLargeBlobCountDeletion failed to delete directory #67119

Closed
DaveCTurner opened this issue Jan 6, 2021 · 2 comments · Fixed by #67210
Labels
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
Team:Distributed (Meta label for distributed team, obsolete)
>test-failure (Triaged test failures from CI)

Comments

@DaveCTurner
Contributor

Build scan: https://gradle-enterprise.elastic.co/s/odetfr6slb2ou/console-log?task=:x-pack:plugin:repository-encrypted:internalClusterTest

Repro line: REPRODUCE WITH: ./gradlew ':x-pack:plugin:repository-encrypted:internalClusterTest' --tests "org.elasticsearch.repositories.encrypted.EncryptedAzureBlobStoreRepositoryIntegTests.testLargeBlobCountDeletion" -Dtests.seed=98E7E9A0B1D0E0E9 -Dtests.security.manager=true -Dtests.locale=vi-VN -Dtests.timezone=America/Swift_Current -Druntime.java=11

Reproduces locally?: No

Applicable branches: Only seen on a PR build against master

Failure history: https://build-stats.elastic.co/goto/276c4165d436c9b560144e07e799de1d indicates that this was failing frequently until 2020-12-30 but then quietened down for a week.

Failure excerpt: Doesn't look very helpful:

	org.elasticsearch.repositories.encrypted.EncryptedAzureBlobStoreRepositoryIntegTests > testLargeBlobCountDeletion FAILED	
	    java.io.IOException: Deleting directory [] failed	
	        at __randomizedtesting.SeedInfo.seed([98E7E9A0B1D0E0E9:74AC3F3AB86B93EE]:0)	
	        at org.elasticsearch.repositories.azure.AzureBlobStore.deleteBlobDirectory(AzureBlobStore.java:262)	
	        at org.elasticsearch.repositories.azure.AzureBlobContainer.delete(AzureBlobContainer.java:116)	
	        at org.elasticsearch.repositories.encrypted.EncryptedRepository$EncryptedBlobContainer.delete(EncryptedRepository.java:654)	
	        at org.elasticsearch.repositories.azure.AzureBlobStoreRepositoryTests.testLargeBlobCountDeletion(AzureBlobStoreRepositoryTests.java:235)	
	        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)	
	        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)	
	        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)	
	        at java.base/java.lang.reflect.Method.invoke(Method.java:566)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)	
	        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)	
	        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)	
	        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)	
	        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)	
	        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:824)	
	        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:475)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)	
	        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)	
	        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)	
	        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)	
	        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)	
	        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)	
	        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)	
	        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)	
	        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)	
	        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:375)	
	        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:831)	
	        at java.base/java.lang.Thread.run(Thread.java:834)	
		
	        Caused by:	
	        reactor.core.Exceptions$CompositeException: Multiple exceptions	
	            at reactor.core.Exceptions.multiple(Exceptions.java:121)	
	            at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:213)	
	            at reactor.core.publisher.MonoZip$ZipInner.onComplete(MonoZip.java:352)	
	            at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:141)	
	            at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)	
	            at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)	
	            at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:121)	
	            at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)	
	            at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)	
	            at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)	
	            at reactor.core.publisher.MonoIgnoreThen$ThenAcceptInner.onNext(MonoIgnoreThen.java:296)	
	            at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)	
	            at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)	
	            at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)	
	            at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.signalCached(MonoCacheTime.java:320)	
	            at reactor.core.publisher.MonoCacheTime$CoordinatorSubscriber.onNext(MonoCacheTime.java:337)	
	            at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73)	
	            at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)	
	            at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:61)	
	            at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)	
	            at reactor.core.publisher.MonoCacheTime.subscribeOrReturn(MonoCacheTime.java:132)	
	            at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:57)	
	            at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:153)	
	            at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.ignoreDone(MonoIgnoreThen.java:190)	
	            at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreInner.onComplete(MonoIgnoreThen.java:240)	
	            at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:81)	
	            at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)	
	            at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onComplete(FluxDoFinally.java:138)	
	            at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:136)	
	            at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:374)	
	            at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:373)	
	            at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:429)	
	            at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:655)	
	            at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:96)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)	
	            at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)	
	            at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)	
	            at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)	
	            at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)	
	            at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)	
	            at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)	
	            at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)	
	            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)	
	            at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)	
	            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)	
	            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)	
	            at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)	
	            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)	
	            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)	
	            at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)	
	            at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)	
	            at org.elasticsearch.repositories.azure.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:57)	
	            at java.base/java.security.AccessController.doPrivileged(Native Method)	
	            at org.elasticsearch.repositories.azure.SocketAccess.doPrivilegedVoidException(SocketAccess.java:56)	
	            at org.elasticsearch.repositories.azure.executors.PrivilegedExecutor.lambda$execute$0(PrivilegedExecutor.java:38)	
	            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:680)	
	            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)	
	            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)	
	            ... 1 more

Labelling this as snapshot/restore because I suspect something is wrong in AzureBlobStore.deleteBlobDirectory, but it could be related to the encrypted repo tests instead.

DaveCTurner added the :Distributed Coordination/Snapshot/Restore and >test-failure labels Jan 6, 2021
elasticmachine added the Team:Distributed label Jan 6, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@mark-vieira
Contributor

This is failing a few times a day, so I've muted it in 22a6811.

fcofdez added a commit to fcofdez/elasticsearch that referenced this issue Jan 8, 2021
Instead of executing all the delete requests in parallel,
this commit introduces a change that executes delete
requests in batches of 100 parallel deletions.
The reason for this change is to avoid timeout failures
when a large number of files has to be deleted: if we
execute all of them in parallel, a few slow requests can
cause the rest to fail due to timeouts, as there is an effective
limit at the connection pool level.
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes elastic#67119
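
For readers who want to see the shape of the batching idea described in the commit message above, here is a minimal sketch using only the JDK. It is not the code from #67210: the class `BatchedDeleteSketch`, the `BlobDeleter` interface, and the `deleteInBatches` method are hypothetical names introduced for illustration, and the real repository code uses the Azure SDK's reactive client rather than `CompletableFuture`. The sketch keeps at most `batchSize` deletions in flight, waits for each batch before starting the next, and names the blob in every failure message.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BatchedDeleteSketch {

    /** Hypothetical stand-in for a single-blob delete call; not an Azure SDK type. */
    interface BlobDeleter {
        void delete(String blobName) throws IOException;
    }

    /**
     * Deletes the given blobs in batches. At most batchSize deletions run
     * concurrently, and the next batch starts only once the current one is done.
     * All failures are collected and reported together, each naming its blob.
     */
    static void deleteInBatches(List<String> blobNames, int batchSize,
                                BlobDeleter deleter, ExecutorService executor) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        IOException failure = null;
        for (int from = 0; from < blobNames.size(); from += batchSize) {
            int to = Math.min(from + batchSize, blobNames.size());
            List<CompletableFuture<Void>> inFlight = new ArrayList<>(to - from);
            for (String blobName : blobNames.subList(from, to)) {
                inFlight.add(CompletableFuture.runAsync(() -> {
                    try {
                        deleter.delete(blobName);
                    } catch (IOException e) {
                        // Include the blob name so the failure is diagnosable.
                        throw new CompletionException(
                            new IOException("Unable to delete blob [" + blobName + "]", e));
                    }
                }, executor));
            }
            // Wait for the whole batch before submitting the next one.
            for (CompletableFuture<Void> future : inFlight) {
                try {
                    future.join();
                } catch (CompletionException e) {
                    if (failure == null) {
                        failure = new IOException("Deleting blobs failed");
                    }
                    failure.addSuppressed(e.getCause());
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> blobNames = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            blobNames.add("blob-" + i);
        }
        AtomicInteger deleted = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(8);
        try {
            // The lambda simulates a successful delete by counting it.
            deleteInBatches(blobNames, 100, blobName -> deleted.incrementAndGet(), executor);
        } finally {
            executor.shutdown();
        }
        System.out.println("deleted " + deleted.get() + " blobs"); // prints "deleted 250 blobs"
    }
}
```

One design choice in this sketch is an assumption rather than something the commit message states: it keeps going after a failed batch and reports every failure as a suppressed exception, instead of stopping at the first error.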
fcofdez added a commit that referenced this issue Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes #67119
fcofdez added a commit to fcofdez/elasticsearch that referenced this issue Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes elastic#67119
Backport of elastic#67210
fcofdez added a commit that referenced this issue Jan 13, 2021
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes #67119
Backport of #67210