[CI] AzureBlobStoreRepositoryTests.testIndicesDeletedFromRepository #48978

Closed
costin opened this issue Nov 12, 2019 · 11 comments · Fixed by #49283
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments

@costin
Member

costin commented Nov 12, 2019

This looks like a Service Unavailable error, but since the same failure was reported before (#47120), I'm raising it for consistency.

CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/483/
Scan: https://gradle-enterprise.elastic.co/s/2nryv35jtn7lo/tests/lgycqkw2bfw46-kysdm6ag7igo6

Caused by: com.microsoft.azure.storage.StorageException: Service Unavailable
	at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87) ~[azure-storage-8.4.0.jar:?]
	at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305) ~[azure-storage-8.4.0.jar:?]
	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:196) ~[azure-storage-8.4.0.jar:?]
	at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadFullBlob(CloudBlockBlob.java:1035) ~[azure-storage-8.4.0.jar:?]
	at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:864) ~[azure-storage-8.4.0.jar:?]
	at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:743) ~[azure-storage-8.4.0.jar:?]
	at org.elasticsearch.repositories.azure.AzureStorageService.lambda$writeBlob$15(AzureStorageService.java:333) ~[main/:?]
	at org.elasticsearch.repositories.azure.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:69) ~[main/:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
	at org.elasticsearch.repositories.azure.SocketAccess.doPrivilegedVoidException(SocketAccess.java:68) ~[main/:?]
	at org.elasticsearch.repositories.azure.AzureStorageService.writeBlob(AzureStorageService.java:332) ~[main/:?]
	at org.elasticsearch.repositories.azure.AzureBlobStore.writeBlob(AzureBlobStore.java:119) ~[main/:?]
	at org.elasticsearch.repositories.azure.AzureBlobContainer.writeBlob(AzureBlobContainer.java:101) ~[main/:?]
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.lambda$write$1(ChecksumBlobStoreFormat.java:1
@costin costin added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Nov 12, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@tlrx
Member

tlrx commented Nov 12, 2019

@tlrx
Member

tlrx commented Nov 13, 2019

I suspect an issue in the server-side logic, so I added some logging in #48991 in case the failure reproduces.

@romseygeek
Contributor

This just happened again on an intake build:
https://gradle-enterprise.elastic.co/s/evu3cnypcts2o/tests/lgycqkw2bfw46-clvf2uhrzpiw4

tlrx added a commit that referenced this issue Nov 19, 2019
This commit fixes the server-side logic of the "List Objects" operations
of the Azure and S3 fixtures. Until now, the fixtures returned a "flat"
view of stored objects and did not correctly handle the delimiter
parameter. This caused some object listings to be misinterpreted by the
snapshot deletion logic in Elasticsearch, which relies on the ability to
list the child containers of a BlobContainer (#42653) to correctly delete
stale indices.

As a consequence, blobs were not correctly deleted from the emulated
storage service and stayed in the heap until they were garbage collected,
causing CI failures like #48978.

This commit fixes the server-side logic of the Azure and S3 fixtures when
listing objects so that they now return the correct common blob prefixes,
as expected by the snapshot deletion process. It also adds an after-test
check to ensure that tests leave the repository empty (apart from the
root index files).

Closes #48978
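To illustrate the delimiter semantics the fix restores, here is a minimal sketch of how a "List Objects" call over a flat, in-memory key space can split results into direct child blobs and common prefixes (the "child containers"). This is not the fixture code from the commit; the class `DelimitedListing` and its `list` method are hypothetical, and the sketch assumes keys are plain strings and `/` is used as the delimiter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not taken from the Elasticsearch fixtures.
public final class DelimitedListing {

    // Result of a listing: blobs directly under the prefix, plus child "directory" prefixes.
    public static final class Result {
        public final List<String> blobs;
        public final List<String> commonPrefixes;

        Result(List<String> blobs, List<String> commonPrefixes) {
            this.blobs = Collections.unmodifiableList(blobs);
            this.commonPrefixes = Collections.unmodifiableList(commonPrefixes);
        }
    }

    // Lists keys under `prefix`. With a non-empty delimiter, any key that contains the
    // delimiter after the prefix is collapsed into a common prefix (up to and including
    // that first delimiter). With a null or empty delimiter, the listing is "flat".
    public static Result list(Set<String> keys, String prefix, String delimiter) {
        List<String> blobs = new ArrayList<>();
        Set<String> prefixes = new TreeSet<>(); // sorted and de-duplicated
        for (String key : new TreeSet<>(keys)) { // iterate in lexicographic order
            if (!key.startsWith(prefix)) {
                continue;
            }
            String rest = key.substring(prefix.length());
            int idx = (delimiter == null || delimiter.isEmpty()) ? -1 : rest.indexOf(delimiter);
            if (idx >= 0) {
                prefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            } else {
                blobs.add(key);
            }
        }
        return new Result(blobs, new ArrayList<>(prefixes));
    }

    public static void main(String[] args) {
        Set<String> keys = Set.of(
            "index-0",
            "indices/abc/0/__1",
            "indices/abc/meta-1.dat",
            "indices/def/meta-1.dat");
        Result r = list(keys, "indices/", "/");
        System.out.println(r.blobs);          // []
        System.out.println(r.commonPrefixes); // [indices/abc/, indices/def/]
    }
}
```

Calling `list(keys, "indices/", null)` instead returns all three `indices/...` keys as blobs and no common prefixes. That corresponds to the "flat" view described above, in which a caller that enumerates child containers would find none.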
tlrx added a commit that referenced this issue Nov 20, 2019
tlrx added a commit that referenced this issue Nov 20, 2019
tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 20, 2019
@tlrx
Member

tlrx commented Nov 20, 2019

@dnhatn
Member

dnhatn commented Nov 25, 2019

@original-brownbear
Member

The above doesn't contain #49525 (assumed fix) it seems :)

@tlrx
Member

tlrx commented Nov 26, 2019

> The above doesn't contain #49525 (assumed fix) it seems :)

Yep

@original-brownbear
Member

@tlrx do you think it's safe to close this one (and all the related ones) now? :) It appears we haven't had any new failures since merging #49525 :)

@tlrx
Member

tlrx commented Nov 26, 2019

@original-brownbear It can take more than a day for the failure to happen, but yeah let's close this and reopen if needed (I'm sure we won't :)) Thanks again for your expertise on this!

@tlrx tlrx closed this as completed Nov 26, 2019

6 participants