Differentiate stats for the same blobstore operation with purposes #99615
@Nullable
private final Purpose purpose;
I would much prefer this not to be nullable, let's represent the cases that would be null as a proper enum value instead. Indeed at some point in the future I think we'll want to distinguish metadata and data operations too, and this seems like a good mechanism for that.
Thanks David. I replaced `null` with `Purpose.Generic` in 0a39942
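To make the "proper enum value instead of null" idea concrete, here is a minimal sketch. It is not the PR's code: `PurposedPath`, its methods, and the exact constant names are hypothetical, and only the idea of an explicit generic constant comes from the discussion above.

```java
import java.util.Objects;

final class NonNullPurposeDemo {
    public static void main(String[] args) {
        PurposedPath path = new PurposedPath("indices/0", Purpose.GENERIC);
        // The caller never has to ask "is the purpose null?".
        System.out.println(path.purpose() + " specific=" + path.hasSpecificPurpose());
    }
}

// Hypothetical stand-in for the PR's enum; constant names are illustrative.
enum Purpose {
    GENERIC,   // explicit "no specific purpose" value, replacing null
    SNAPSHOT,
    TRANSLOG
}

// Hypothetical holder that refuses to store a null purpose.
final class PurposedPath {
    private final String path;
    private final Purpose purpose;

    PurposedPath(String path, Purpose purpose) {
        this.path = Objects.requireNonNull(path, "path");
        this.purpose = Objects.requireNonNull(purpose, "purpose");
    }

    Purpose purpose() {
        return purpose;
    }

    boolean hasSpecificPurpose() {
        // A plain enum comparison replaces the former null check.
        return purpose != Purpose.GENERIC;
    }
}
```

A non-null field also leaves room for further constants later (for example, separate data and metadata purposes) without touching the null handling.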
@@ -29,6 +32,14 @@ public interface BlobStore extends Closeable {
      */
     void deleteBlobsIgnoringIfNotExists(Iterator<String> blobNames) throws IOException;

+    default void deleteBlobsIgnoringIfNotExists(Iterator<String> blobNames, @Nullable Purpose purpose) throws IOException {
I would also much prefer not to have an implicit default for the `purpose` parameter like this. Experience shows that we will forget to set it in future work unless the caller is required to make an explicit choice.
That said, temporarily adding an implicit default is a reasonable strategy for avoiding breaking the serverless build, but it must only be temporary. Ideally we'd mark the overloads that don't take the `Purpose` as `@Deprecated(forRemoval = true)`.
Agreed, especially since this interface method is pretty new. I will go through the deprecation and removal process once we are happy with the proposed approach in this PR.
Hi @ywangd, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
@Deprecated(forRemoval = true)
void deleteBlobsIgnoringIfNotExists(Iterator<String> blobNames) throws IOException;

// TODO: Remove the default implementation and require each blob store to implement this method. Once it's done, remove
// the above overload version that does not take the Purpose parameter.
/**
 * Delete all the provided blobs from the blob store. Each blob could belong to a different {@code BlobContainer}
 * @param blobNames the blobs to be deleted
 * @param purpose the purpose of the {@code BlobContainer} associated to the blobs to be deleted. It should be set
 *                to {@code Purpose.GENERIC}, if the blobs are from multiple {@code BlobContainer}s.
 */
default void deleteBlobsIgnoringIfNotExists(Iterator<String> blobNames, Purpose purpose) throws IOException {
    if (purpose == Purpose.GENERIC) {
        deleteBlobsIgnoringIfNotExists(blobNames);
    } else {
        throw new UnsupportedOperationException();
    }
}
As David suggested, I will perform the cleanup of these two methods in a follow-up and avoid breaking builds on either side.
@elasticmachine update branch
I think attaching the `Purpose` to the path to the container will be ok for our current needs, but I worry that it won't be flexible enough for some related future work: we want to distinguish data from metadata operations when taking a snapshot, and I can see value in being able to distinguish searchable-snapshot-related read operations from regular snapshot reads too.
Have we considered making this a per-operation property rather than a property of the `BlobContainer`? This will involve changing more call sites here of course but no more real complexity and a great deal more flexibility in future.
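A rough sketch of what "per-operation" could look like, as opposed to fixing the purpose when the container is created. Everything here is hypothetical (`InMemoryContainer`, the method shapes, the enum constants); it only illustrates that one container instance can serve calls with different purposes.

```java
import java.util.ArrayList;
import java.util.List;

final class PerOperationPurposeDemo {
    public static void main(String[] args) {
        InMemoryContainer container = new InMemoryContainer();
        // The same container serves calls with different purposes.
        container.writeBlob(OperationPurpose.TRANSLOG, "translog-1", new byte[0]);
        container.readBlob(OperationPurpose.SNAPSHOT, "translog-1");
        System.out.println(container.log());
    }
}

// Hypothetical enum; the constant names are assumptions for this sketch.
enum OperationPurpose { SNAPSHOT, CLUSTER_STATE, INDICES, TRANSLOG }

// Hypothetical, simplified container: the purpose is an argument of each call,
// not a field of the container, so it can differ between operations.
final class InMemoryContainer {
    private final List<String> log = new ArrayList<>();

    void writeBlob(OperationPurpose purpose, String blobName, byte[] data) {
        log.add("write:" + purpose + ":" + blobName);
    }

    void readBlob(OperationPurpose purpose, String blobName) {
        log.add("read:" + purpose + ":" + blobName);
    }

    List<String> log() {
        return log;
    }
}
```

The cost is that every call site must pass a purpose, which is the "more call sites" trade-off mentioned above.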
I agree, I'd like to at least try out that approach too to see how big of a change it is.
Thanks for the feedback. I was under the impression that once a |
Assuming "per operation" means "per method of the BlobContainer" class, I think it is achievable without having to change the signature of every method. We still attach
class BlobContainer {
    Purpose purpose = Purpose.SNAPSHOT; // default
    BlobContainer withPurpose(Purpose purpose) {
        return new BlobContainer(..., purpose);
    }
    // operation method
    void writeBlob(...) {
        // the method can reference purpose as instance variable
    }
    ...
}
// Sample call site usage
class TranslogFileUploadTask {
    void doRun() {
        blobContainer.withPurpose(Purpose.TRANSLOG) // configure the purpose per operation
            .writeBlob(...); // as is today
    }
}
I think the above can achieve the "per operation" variety with likely fewer cascading code changes. What do you think?
I'd rather not create a new |
@Deprecated(forRemoval = true)
default InputStream readBlob(String blobName) throws IOException {
    return readBlob(OperationPurpose.SNAPSHOT, blobName);
}
Similarly, methods marked as deprecated in this class will be removed once the necessary changes are made in the other repo. They are here to avoid breaking builds for now.
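For readers who want to see the transitional pattern end to end, here is a self-contained sketch. `SimpleBlobContainer` and `RecordingContainer` are hypothetical stand-ins, not Elasticsearch classes; only the shape of the deprecated default overload mirrors the diff above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class DeprecatedOverloadDemo {
    @SuppressWarnings("removal") // the legacy overload is called on purpose here
    public static void main(String[] args) throws IOException {
        RecordingContainer container = new RecordingContainer();
        container.readBlob("blob-a").close(); // old call site, still compiles
        System.out.println(container.lastPurpose);
    }
}

// Hypothetical enum; constant names are assumptions for this sketch.
enum OperationPurpose { SNAPSHOT, TRANSLOG }

// Hypothetical, cut-down interface.
interface SimpleBlobContainer {
    // New form: every caller states a purpose explicitly.
    InputStream readBlob(OperationPurpose purpose, String blobName) throws IOException;

    // Temporary bridge for callers that have not been migrated yet.
    @Deprecated(forRemoval = true)
    default InputStream readBlob(String blobName) throws IOException {
        return readBlob(OperationPurpose.SNAPSHOT, blobName);
    }
}

// Records the purpose it was called with so the delegation is observable.
final class RecordingContainer implements SimpleBlobContainer {
    OperationPurpose lastPurpose;

    @Override
    public InputStream readBlob(OperationPurpose purpose, String blobName) {
        lastPurpose = purpose;
        return new ByteArrayInputStream(blobName.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the bridge is a `default` method, implementations only need the new overload, and deleting the bridge later breaks only the call sites that were never migrated.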
Thanks for the reviews. I have updated the PR accordingly. The most notable update is that I removed the changes to actual stats collection so that this PR focuses on adding the new purpose parameter (as suggested here). This is a good call since I think we are not entirely aligned on the new stats structure. Deferring it gives us more time for discussion. Note that I still kept the refactoring for |
I believe all reviews have been addressed. I'd appreciate it if you could take another look. Thanks!
I left a couple more (somewhat superficial) comments.
final RequestMetricCollector multiPartUploadMetricCollector;
final RequestMetricCollector deleteMetricCollector;
final RequestMetricCollector abortPartUploadMetricCollector;
private final StatsCollectors statsCollectors = new StatsCollectors();
Could we revert these changes to stats collection too? That way this PR is just about adding the `purpose` parameter to all the APIs, which is noisy but low-risk, and follow-ups which change the behaviour like here will be easier to review and will let us pin down any later problems with `git bisect`.
Sure, I reverted these changes as well. I believe this PR is now a pure refactor, so I adjusted the labels accordingly.
package org.elasticsearch.common.blobstore;

public enum BlobPurpose {
Sorry for nitpicking about naming here but I don't like the new name `BlobPurpose`. We're not distinguishing blobs by purpose, it's the operations on them.
OK, I reverted it back to `OperationPurpose`.
LGTM thanks for the extra iterations Yang
LGTM.
Left a few minor comments but no need for another round. Thanks for the extra iterations on this.
final BlobPath blobPath = repository.basePath().add(randomAlphaOfLength(10));
final BlobContainer blobContainer = blobStore.blobContainer(blobPath);
final OperationPurpose purpose = randomValueOtherThan(OperationPurpose.SNAPSHOT, () -> randomFrom(OperationPurpose.values()));
I wonder why we exclude `SNAPSHOT` here? I'd expect the test to also succeed for that and furthermore for the validation of SNAPSHOT to be equally valid?
This test had two parts: the first part used `SNAPSHOT` and the second part randomized over purposes other than SNAPSHOT. Since this PR is now a pure refactor, the test has mostly lost its purpose. Now it really just shows that the refactor has no impact on the output. In this case, you are right that there is no need to exclude SNAPSHOT. I have updated it accordingly.
@@ -243,9 +244,14 @@ public void deleteBlobsIgnoringIfNotExists(Iterator<String> blobNames) throws IO
         }
     }

-    private void deletePartition(AmazonS3Reference clientReference, List<String> partition, AtomicReference<Exception> aex) {
+    private void deletePartition(
nit: all other places have `purpose` as first arg, I'd like to keep it consistent, so can we move it to be first here too?
Thanks for catching it. It was my intention to keep it consistent, but I apparently missed this one and the one below. Corrected now.
@@ -264,7 +270,7 @@ private void deletePartition(AmazonS3Reference clientReference, List<String> par
         }
     }

-    private static DeleteObjectsRequest bulkDelete(S3BlobStore blobStore, List<String> blobs) {
+    private static DeleteObjectsRequest bulkDelete(S3BlobStore blobStore, List<String> blobs, OperationPurpose purpose) {
nit: all other places have `purpose` as first arg, I'd like to keep it consistent, so can we move it to be first here too?
expectThrows(IOException.class, () -> container.readBlob(blobName, content.length + 1, content.length).read());
expectThrows(
    IllegalArgumentException.class,
    () -> container.readBlob(OperationPurpose.SNAPSHOT, blobName, -1, content.length).read()
It would be nice to randomize the purpose here (and in many places elsewhere). I think it can be a follow-up though.
Yep, we should have follow-ups for actual usages of the purpose parameter. In this PR, all call sites simply use SNAPSHOT as the default.
 * @param blobName
 *          The name of the blob whose existence is to be determined.
 * @return {@code true} if a blob exists in the {@link BlobContainer} with the given name, and {@code false} otherwise.
 * @param purpose The purpose of the operation, useful for stats collection.
I think there are several potential uses of this: stats collection, storage classes, priority. Perhaps we can move the examples to the `OperationPurpose` javadoc and leave the example out here (and in other methods in this class)?
- * @param purpose The purpose of the operation, useful for stats collection.
+ * @param purpose The purpose of the operation
Makes sense. I removed it from the individual methods and added it (with a bit more explanation) to the enum class. Btw, the other usages sound pretty exciting!
package org.elasticsearch.common.blobstore;

public enum OperationPurpose {
I see us mentioning stats collection in every method javadoc where this is passed. I think that is only one use out of potentially several. Instead, we could supply that information here and leave the method level javadoc without the use case.
…lastic#99615) Today blobstore stats are collected against each HTTP operation, e.g. Get, List. This is not granular enough because the same HTTP operation can be performed for different purposes, e.g. cluster state, indices or translog. This PR adds a new Purpose enum to provide further breakdown for the same HTTP operation. Relates: ES-6800
All their usages have been replaced by corresponding new versions. Relates: elastic#99615
A new no-op OperationPurpose parameter is added in elastic#99615 to all blob store/container operation methods. This PR updates the s3 stats collection code to actually use this parameter for finer-grained stats collection and reports. Stats are reported per combination of operation and operation purpose. A sample output is as follows:
```
{
  "ListObjects": 2,
  "GetObject": 1,
  "PutObject": 2,
  "PutMultipartObject": 0,
  "AbortMultipartObject": 0,
  "DeleteObjects": 1,
  "GetObject/ClusterState": 1,
  "PutObject/ClusterState": 1,
  "DeleteObjects/Translog": 1,
  "ListObjects/Indices": 1
}
```
The changes are made with BWC in mind, i.e. existing stats reports with the default operation purpose remain unchanged. For example, the key "ListObjects" is equivalent to "ListObjects/Snapshot", but the default purpose is omitted from the stats key so that it stays backwards compatible. Relates: elastic#99615 Relates: ES-6800
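The key scheme described above can be sketched as follows. This is an illustration, not the PR's implementation: the enum constants, the `key` field and the `statsKey` helper are hypothetical names, chosen to match the keys in the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StatsKeyDemo {
    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.merge(StatsKeys.statsKey("ListObjects", OperationPurpose.SNAPSHOT), 1L, Long::sum);
        counts.merge(StatsKeys.statsKey("ListObjects", OperationPurpose.INDICES), 1L, Long::sum);
        counts.merge(StatsKeys.statsKey("GetObject", OperationPurpose.CLUSTER_STATE), 1L, Long::sum);
        System.out.println(counts);
    }
}

// Hypothetical mirror of the PR's enum; names are inferred from the sample keys.
enum OperationPurpose {
    SNAPSHOT("Snapshot"),
    CLUSTER_STATE("ClusterState"),
    INDICES("Indices"),
    TRANSLOG("Translog");

    final String key;

    OperationPurpose(String key) {
        this.key = key;
    }
}

final class StatsKeys {
    // The default purpose is omitted so pre-existing keys keep their names.
    static String statsKey(String operation, OperationPurpose purpose) {
        return purpose == OperationPurpose.SNAPSHOT ? operation : operation + "/" + purpose.key;
    }
}
```

Treating the default purpose as the bare operation name is what keeps old consumers of the stats unaffected: they simply never see a key with a slash unless a non-default purpose was used.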
A new no-op OperationPurpose parameter is added in #99615 to all blob store/container operation methods. This PR updates the s3 stats collection code to actually use this parameter for finer-grained stats collection and reports. This differentiation between purposes is kept internal for now. The stats are currently aggregated per operation (across purposes) for existing stats reporting. This means responses from both the GetRepositoriesMetering API and the GetBlobStoreStats API will not be changed. We will have follow-ups to expose the finer stats separately. Relates: #99615 Relates: ES-6800
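One way to picture the aggregation step: fine-grained counters are kept internally and summed back to one number per operation before being reported. The sketch below assumes the internal counters are keyed as `Operation/Purpose` strings, which is purely an assumption for illustration; `StatsAggregation` and `aggregateByOperation` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class StatsAggregationDemo {
    public static void main(String[] args) {
        Map<String, Long> internal = new HashMap<>();
        internal.put("GetObject/Snapshot", 1L);
        internal.put("GetObject/ClusterState", 2L);
        internal.put("ListObjects/Indices", 1L);
        // Reported stats contain one entry per operation, as before.
        System.out.println(StatsAggregation.aggregateByOperation(internal));
    }
}

final class StatsAggregation {
    // Sums "Operation/Purpose" counters into a single counter per operation.
    static Map<String, Long> aggregateByOperation(Map<String, Long> perPurpose) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> entry : perPurpose.entrySet()) {
            String key = entry.getKey();
            int slash = key.indexOf('/');
            // Keys without a purpose suffix are already per-operation.
            String operation = slash < 0 ? key : key.substring(0, slash);
            totals.merge(operation, entry.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Keeping the breakdown internal means the reported shape stays the same while the finer data is still available for a later, separate API.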