
Use Azure blob batch API to delete blobs in batches #114566

Merged

Conversation

Contributor

@nicktindall nicktindall commented Oct 11, 2024

This PR implements blob deletion as one or more blob batch requests, rather than deleting each blob individually.

The reason this wasn't implemented originally was concern about the blob batch API's SAS token auth support.

The difference in this PR's approach is the use of a container-scoped client, which sends an additional request parameter (restype=container). Using the API this way means SAS tokens are supported.

I ran this branch through the elasticsearch / periodic pipeline (results here) and everything passed. If I'm reading it correctly, that includes running the AzureStorageCleanupThirdPartyTests using a SAS token, and that test includes code paths that use the new bulk delete.

Closes ES-9777

@nicktindall nicktindall force-pushed the ES-9777_support_batch_deletions_in_azure branch from 2d9b166 to 8e08e60 Compare October 11, 2024 06:38
@@ -30,6 +30,7 @@ dependencies {
api "com.azure:azure-identity:1.13.2"
api "com.azure:azure-json:1.2.0"
api "com.azure:azure-storage-blob:12.27.1"
api "com.azure:azure-storage-blob-batch:12.23.1"
Contributor Author

This is the version consistent with the others from the BOM

requires reactor.netty.core;
requires reactor.netty.http;
requires com.azure.storage.blob.batch;
Contributor Author

IntelliJ seemed to optimize the requires directives; the ones removed above are all transitively required by com.azure.storage.blob.batch.

@nicktindall nicktindall added >non-issue :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Oct 14, 2024
@@ -689,7 +690,8 @@ enum Operation {
GET_BLOB_PROPERTIES("GetBlobProperties"),
PUT_BLOB("PutBlob"),
PUT_BLOCK("PutBlock"),
PUT_BLOCK_LIST("PutBlockList");
PUT_BLOCK_LIST("PutBlockList"),
BLOB_BATCH("BlobBatch");
Contributor Author

We can't be specific about the type of operation we're performing in a batch without inspecting the request body. I think it's better to track BlobBatch than potentially erroneously track BatchDelete (if one day we start using batch to "set access tier")

@nicktindall nicktindall marked this pull request as ready for review October 14, 2024 06:55
@nicktindall nicktindall requested a review from a team as a code owner October 14, 2024 06:55
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 14, 2024
@nicktindall nicktindall changed the title Experimental batch-blob deletion support WIP Use Azure blob batch API to delete files in batches Oct 14, 2024
@nicktindall nicktindall changed the title Use Azure blob batch API to delete files in batches Use Azure blob batch API to delete blobs in batches Oct 14, 2024
Member

@ywangd ywangd left a comment

Sorry for the delay here. I took a second and closer look at the changes. I think we might want to consider adding controls for resource usage (heap and concurrent requests).

Btw, the PR should now be labelled as >enhancement due to the new setting.

@@ -129,6 +134,7 @@ public AzureBlobStore(
// locationMode is set per repository, not per client
this.locationMode = Repository.LOCATION_MODE_SETTING.get(metadata.settings());
this.maxSinglePartUploadSize = Repository.MAX_SINGLE_PART_UPLOAD_SIZE_SETTING.get(metadata.settings());
this.maxDeletesPerBatch = Repository.DELETION_BATCH_SIZE_SETTING.get(metadata.settings());
Member

Nit: Can we rename this field to be deletionBatchSize which is consistent with the setting name and avoid clashing with the static MAX_ELEMENTS_PER_BATCH?

Contributor Author

Done

@@ -52,6 +52,10 @@ public static void doPrivilegedVoidException(StorageRunnable action) {
}
}

public static <E extends Exception> void doPrivilegedVoidExceptionExplicit(Class<E> exception, StorageRunnable action) throws E {
doPrivilegedVoidException(action);
}
Member

Is this necessary? The existing code works ok without the explicit throws? If we want to change this, I'd prefer to update the existing method so that it explicitly throws IOException in its catch block if the cause is an IOException. Since that requires some cascading changes, I think a separate PR would be better.
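For illustration, the reviewer's suggestion could look roughly like the sketch below. This is a hypothetical, self-contained helper: the StorageRunnable interface is simplified, the method name is invented, and the real SocketAccess helper additionally wraps the action in AccessController.doPrivileged, which is omitted here.

```java
import java.io.IOException;

class PrivilegedIo {
    @FunctionalInterface
    interface StorageRunnable {
        void run() throws Exception;
    }

    // Sketch of the reviewer's idea: run the action and rethrow an IOException
    // (direct or wrapped as a cause) as a checked IOException, instead of
    // letting it escape inside an unchecked wrapper.
    static void doPrivilegedVoidIoException(StorageRunnable action) throws IOException {
        try {
            action.run(); // the real helper would run this inside doPrivileged
        } catch (IOException e) {
            throw e; // surface IOExceptions directly to callers
        } catch (Exception e) {
            if (e.getCause() instanceof IOException ioe) {
                throw ioe; // unwrap an IOException hidden as a cause
            }
            throw new RuntimeException(e);
        }
    }
}
```

Callers could then handle IOException normally instead of unwrapping RuntimeExceptions at each call site.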

Comment on lines 251 to 260
for (BlobItem blobItem : blobContainerClient.listBlobs(options, null)) {
if (blobItem.isPrefix()) {
continue;
}
blobNames.add(blobItem.getName());
bytesDeleted.addAndGet(blobItem.getProperties().getContentLength());
blobsDeleted.incrementAndGet();
}
if (blobNames.isEmpty() == false) {
deleteListOfBlobs(client, blobNames.iterator());
Member

I wonder whether there is an issue in materializing all blobItems from the listing before invoking delete. If there are a large number of items, it could be rather inefficient. IIUC, listBlobs returns an Iterable that lazily loads. I think this change means we no longer leverage that?

Contributor Author

Good call, I've changed this now to use Flux all the way through. I think that should pipeline all this stuff better.
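The non-reactive shape of that lazy consumption can be sketched in plain Java (a hypothetical helper, independent of the Azure SDK and of the Reactor-based code actually used in the PR): each batch is built and handed off before more of the listing is pulled, so the full listing is never materialized in memory at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class LazyBatcher {
    // Consume names lazily: a batch is filled from the iterator and handed to
    // the handler before the next listing element is pulled, so at most one
    // batch of names is held in memory at a time.
    static void forEachBatch(Iterator<String> names, int batchSize, Consumer<List<String>> handler) {
        while (names.hasNext()) {
            List<String> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && names.hasNext()) {
                batch.add(names.next());
            }
            handler.accept(batch);
        }
    }
}
```

With a lazily loading listing iterator behind it, memory usage stays bounded by the batch size rather than by the total number of blobs.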

Comment on lines 285 to 293
final List<Mono<Void>> batchResponses = new ArrayList<>();
while (blobNames.hasNext()) {
final BlobBatch currentBatch = batchAsyncClient.getBlobBatch();
int counter = 0;
while (counter < maxDeletesPerBatch && blobNames.hasNext()) {
currentBatch.deleteBlob(container, blobNames.next());
counter++;
}
batchResponses.add(batchAsyncClient.submitBatch(currentBatch));
Member

This is more of a theoretical concern. Technically the number of concurrent requests here is also unbounded, while previously it was hard-coded to 100.

Contributor Author

I've limited these. There is an underlying limit imposed at the node level (max open connections for a pool shared between clients, which defaults to 50; and since Azure is HTTP/1.1, that's effectively a global maximum on concurrent requests for a node), and also a thread-pool limit in the reactor runtime for executing the blocks that dispatch the requests (which seems to default to 5 threads).

So I think with a limit of 100 the actual number of concurrent requests would be much lower. In any case I've added an explicit limit which is configurable and defaults to 10. Because these are bulk requests that means by default that's a maximum of 2560 concurrent individual deletes.
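The concurrency cap described above can be illustrated with a plain-Java sketch using a Semaphore (the PR itself applies the limit via a Reactor concurrency argument, not this code; the class and method names here are hypothetical):

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedBatchSubmitter {
    // Submit every batch, but allow at most maxConcurrent to run at once.
    // Returns the highest concurrency actually observed, to show the cap holds.
    static int submitAll(List<Runnable> batches, int maxConcurrent) throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, batches.size()));
        CountDownLatch done = new CountDownLatch(batches.size());
        for (Runnable batch : batches) {
            pool.submit(() -> {
                try {
                    permits.acquire(); // blocks until a slot is free
                    try {
                        int now = inFlight.incrementAndGet();
                        maxObserved.accumulateAndGet(now, Math::max);
                        batch.run();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return maxObserved.get();
    }
}
```

With a limit of 10 concurrent batches of up to 256 deletes each, this bounds the node to 2560 in-flight individual deletes at most, as described above.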

Member

Thanks for explaining. Makes sense from the networking perspective. I should have been clearer. By unbounded number of requests, I mostly mean the number of request objects that are constructed during this process. I guess they are eagerly instantiated even when the underlying network stack is not ready to consume them? If so, they consume memory and in extreme cases may even lead to OOM.

Contributor Author

Ah yep I understand. I think the limitation I added should restrict that as it'll limit the number of concurrent subscribers. As I understand it nothing happens until a subscriber asks for the next value(s) so there should only be at most 10 batch requests being processed at any one time.

Comment on lines 306 to 307
final CountDownLatch allRequestsFinished = new CountDownLatch(deleteTasks.size());
final List<Throwable> errors = new CopyOnWriteArrayList<>();
Member

Similarly, I think we should limit the number of errors. Also, seeing the CountDownLatch makes me wonder whether it is possible to leverage the Flux approach, similar to how the existing deletion code relies on Flux.then().block(). Maybe something like Flux#fromIterable, so that it takes a custom Iterable implementation that internally constructs deletion requests, which in turn consume the listing response. I feel it could somewhat address my previous comments about limiting resource usage. It's just a rough idea. There may be issues that I just haven't noticed.

Contributor Author

Fixed in b913fd0

@elasticsearchmachine
Collaborator

Hi @nicktindall, I've created a changelog YAML for you.

Member

@ywangd ywangd left a comment

LGTM

Thanks for the iterations!

docs/reference/snapshot-restore/repository-azure.asciidoc (outdated, resolved)
Comment on lines 95 to 98
logger.info("Using SAS token authentication");
secureSettings.setString("azure.client.default.sas_token", System.getProperty("test.azure.sas_token"));
} else {
logger.info("Using key authentication");
Member

Nit: can we add --> at the beginning of the logging messages? It's an informal convention that makes these test logging messages easier to search.

Contributor Author

Added in d9ce4b5

Comment on lines +279 to +284
// We need to use a container-scoped BlobBatchClient, so the restype=container parameter
// is sent, and we can support all SAS token types
// See https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch?tabs=shared-access-signatures#authorization
final BlobBatchAsyncClient batchAsyncClient = new BlobBatchClientBuilder(
azureBlobServiceClient.getAsyncClient().getBlobContainerAsyncClient(container)
).buildAsyncClient();
Member

I assume this still works with non-container-scoped tokens and other azure credential types that we support?

Contributor Author

@nicktindall nicktindall Oct 24, 2024

I've just kicked off the tests with the latest changes https://buildkite.com/elastic/elasticsearch-periodic/builds/4557

EDIT: third party tests all still pass :)

});
}, maxConcurrentBatchDeletes).collectList().block();
if (errors.isEmpty() == false) {
final IOException ex = new IOException("Error deleting batches");
Member

Nit: I think we can include a brief message about exactly how many errors have been encountered if errorsCollected is greater than 10 so that it is clear that some errors are skipped.

Contributor Author

Added in 84ff3c2
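One hypothetical shape for that message (a sketch of the idea, not the actual code from 84ff3c2; the class name and cap constant are invented): attach at most a fixed number of causes as suppressed exceptions, and report the total count when some are dropped.

```java
import java.io.IOException;
import java.util.List;

class BatchErrors {
    static final int MAX_SUPPRESSED = 10;

    // Build a bounded exception: at most MAX_SUPPRESSED causes are attached,
    // and the message says how many errors occurred in total when some are
    // skipped, so the exception stays small even if every batch fails.
    static IOException describe(List<Throwable> errors) {
        int total = errors.size();
        String message = total > MAX_SUPPRESSED
            ? "Error deleting batches (" + total + " errors, showing first " + MAX_SUPPRESSED + ")"
            : "Error deleting batches";
        IOException ex = new IOException(message);
        errors.stream().limit(MAX_SUPPRESSED).forEach(ex::addSuppressed);
        return ex;
    }
}
```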

Comment on lines 309 to 310
} catch (RuntimeException e) {
throw new IOException("Error deleting batches", e);
Member

Mostly for my own education: Why are we specifically catching RuntimeException here? Is there a concrete concern of anything thrown here or is it to match the existing code. The existing code seems to catch a broader Exception instead?

Contributor Author

It was really because there are no checked exceptions thrown in the try/catch block. Perhaps it's safer just to catch Exception in case there's a SocketAccess.doPrivilegedVoidException-type scenario going on in there somewhere.

Contributor Author

Broadened to Exception in 103aec4

/**
* The maximum number of concurrent batch deletes
*/
static final Setting<Integer> MAX_CONCURRENT_BATCH_DELETES_SETTING = Setting.intSetting("max_concurrent_batch_deletes", 10, 1);
Member

I suggest we give it a sensible max value, e.g. 100.

Contributor Author

Added in 6c4697e
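The four-argument form of Setting.intSetting(key, default, min, max) validates bounds at parse time. As a rough plain-Java illustration of that behavior (this is not the Elasticsearch Setting class; the class name and error message are invented):

```java
class BoundedIntSetting {
    private final String key;
    private final int defaultValue;
    private final int min;
    private final int max;

    BoundedIntSetting(String key, int defaultValue, int min, int max) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.min = min;
        this.max = max;
    }

    // Parse a raw string value, fall back to the default when absent, and
    // reject anything outside [min, max].
    int parse(String raw) {
        int value = raw == null ? defaultValue : Integer.parseInt(raw);
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                "value [" + value + "] for setting [" + key + "] must be >= " + min + " and <= " + max);
        }
        return value;
    }
}
```

So a max_concurrent_batch_deletes-style setting with default 10, minimum 1, and maximum 100 would accept 100 but reject 101 at configuration time.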

@nicktindall nicktindall merged commit 7599d4c into elastic:main Oct 24, 2024
16 checks passed
@nicktindall nicktindall deleted the ES-9777_support_batch_deletions_in_azure branch October 24, 2024 08:51
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Oct 24, 2024
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v9.0.0