Add metrics to express data cache #1146

Merged: 1 commit merged into awslabs:main from the express-cache-metrics branch on Nov 20, 2024

muddyfish
Contributor

Description of change

Adds metrics to the express data cache.
Fixes a bug where a cache miss was reported as an error rather than as a cache miss.
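To make the bug fix concrete, here is a minimal, std-only sketch of classifying a cache lookup so that a missing object counts as a miss (`Ok(None)`) rather than an error, with one counter per outcome. `ClientError`, `CacheError`, `Counters`, `classify` and the metric names are hypothetical stand-ins, not the mountpoint-s3 types or the exact metrics added in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical error returned by the underlying object client.
#[derive(Debug, PartialEq)]
pub enum ClientError {
    /// The requested key does not exist (e.g. a NoSuchKey response).
    NotFound,
    /// Any other service or network failure.
    Other(String),
}

/// Hypothetical cache error type.
#[derive(Debug, PartialEq)]
pub enum CacheError {
    IoFailure(String),
}

/// HashMap-backed stand-in for a metrics recorder: counts events by name.
#[derive(Default)]
pub struct Counters {
    pub counts: HashMap<&'static str, u64>,
}

impl Counters {
    fn incr(&mut self, name: &'static str) {
        *self.counts.entry(name).or_insert(0) += 1;
    }
}

/// A missing object is a cache miss (`Ok(None)`), not an error.
/// Only genuine failures are turned into `Err`.
pub fn classify(
    result: Result<Vec<u8>, ClientError>,
    counters: &mut Counters,
) -> Result<Option<Vec<u8>>, CacheError> {
    match result {
        Ok(bytes) => {
            counters.incr("cache_hit");
            Ok(Some(bytes))
        }
        Err(ClientError::NotFound) => {
            counters.incr("cache_miss");
            Ok(None)
        }
        Err(ClientError::Other(msg)) => {
            counters.incr("cache_error");
            Err(CacheError::IoFailure(msg))
        }
    }
}
```

With this shape, a miss never shows up in the error counter, which is the distinction the fix restores.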

Relevant issues: N/A

Does this change impact existing behavior?

Adds metrics only; no user-facing functionality changes.

Does this change need a changelog entry in any of the crates?

No


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

 if result.is_err() {
-    Err(parse_get_object_error(result).map(ObjectClientError::ServiceError))
+    let err = parse_get_object_error(result);
Contributor

This is problematic: the result has not been processed yet - see these comments.

I think this is the same mistake that causes issues for PUT: #1007.

Unless we review the callbacks, the correct thing to do would be to wait for the request to finish and return the fully parsed error.
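A sketch of what "wait for the request to finish" could look like, assuming a hypothetical callback-style request that delivers body chunks and then a final status. `Event`, `Outcome` and `finish_and_parse` are illustrative names, not CRT or mountpoint-s3 APIs; the point is only that the error document is parsed once every chunk has arrived.

```rust
/// Hypothetical events delivered by a streaming request.
pub enum Event {
    /// A chunk of the response body (for a failed request, the error document).
    Body(Vec<u8>),
    /// The request has finished with this HTTP status.
    Finished(u16),
}

/// Hypothetical parsed outcome of a request.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Ok(Vec<u8>),
    NoSuchKey,
    ServiceError(u16, String),
    /// The event stream ended without a `Finished` event.
    Incomplete,
}

/// Consumes every event before deciding what the error is. Parsing after the
/// first chunk could misclassify the error, because the error code may only
/// arrive in a later chunk.
pub fn finish_and_parse<I: IntoIterator<Item = Event>>(events: I) -> Outcome {
    let mut body = Vec::new();
    for event in events {
        match event {
            Event::Body(chunk) => body.extend_from_slice(&chunk),
            Event::Finished(status) => {
                if (200..300).contains(&status) {
                    return Outcome::Ok(body);
                }
                // Only now is the whole error document available.
                let text = String::from_utf8_lossy(&body).into_owned();
                if status == 404 && text.contains("<Code>NoSuchKey</Code>") {
                    return Outcome::NoSuchKey;
                }
                return Outcome::ServiceError(status, text);
            }
        }
    }
    Outcome::Incomplete
}
```

If the `NoSuchKey` code is split across two chunks, parsing after the first chunk would report a generic service error instead of a miss.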

}
}

fn emit_failure_metric(&self, reason: &'static str, type_: MetricsType) {
Contributor

I'd suggest inlining the metrics calls. This function makes it less clear what you are tracking.
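To illustrate the suggestion, in this sketch each failure site records its metric and reason label inline, so a reader sees exactly what is tracked where without jumping to a helper. `Metrics` is a hypothetical HashMap-backed stand-in for a real metrics library, and `validate_block` and the metric names are made up for the example.

```rust
use std::collections::HashMap;

/// Stand-in metrics backend keyed by (metric name, reason label).
#[derive(Default)]
pub struct Metrics {
    pub counts: HashMap<(&'static str, &'static str), u64>,
}

impl Metrics {
    pub fn increment(&mut self, name: &'static str, reason: &'static str) {
        *self.counts.entry((name, reason)).or_insert(0) += 1;
    }
}

/// Hypothetical block validation with the failure metric written out at each
/// failure site instead of behind an `emit_failure_metric`-style helper.
pub fn validate_block(data: &[u8], expected_len: usize, metrics: &mut Metrics) -> Result<(), String> {
    if data.is_empty() {
        metrics.increment("express_data_cache.block_err", "empty_block");
        return Err("empty block".to_string());
    }
    if data.len() != expected_len {
        metrics.increment("express_data_cache.block_err", "invalid_block_size");
        return Err(format!("expected {} bytes, got {}", expected_len, data.len()));
    }
    Ok(())
}
```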

@@ -113,11 +137,15 @@ where
block_offset: u64,
object_size: usize,
) -> DataCacheResult<Option<ChecksummedBytes>> {
Contributor

Too many points of failure.
Shall we factor out the logic into an inner method and only handle the metrics here?

Contributor

Hopefully, we won't need handle_get_object_err anymore.

Contributor Author

muddyfish · Nov 20, 2024

I'm not planning to refactor into an inner method, as that makes the failure reasons harder to propagate. I did remove the handle_get_object_err function, though.

Contributor

An alternative could be to still split into an inner function which records the failure (and reason), but move the hit/miss metric to the wrapper.

Happy to review the approach later, though.
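A rough sketch of that split, assuming the inner function returns an error that carries its own static reason label, which is one way to keep reasons from getting lost. All names here (`BlockError`, `get_block_inner`, `get_block`, the metric keys) are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical failure that carries a static reason label for metrics.
#[derive(Debug, PartialEq)]
pub struct BlockError {
    pub reason: &'static str,
}

/// HashMap-backed stand-in for a metrics recorder.
#[derive(Default)]
pub struct Metrics {
    pub counts: HashMap<String, u64>,
}

impl Metrics {
    fn incr(&mut self, key: String) {
        *self.counts.entry(key).or_insert(0) += 1;
    }
}

/// Inner function: all of the fallible logic and no metrics. Each failure
/// carries its own reason, so nothing is lost when it is returned.
fn get_block_inner(
    stored: Option<&[u8]>,
    expected_len: usize,
) -> Result<Option<Vec<u8>>, BlockError> {
    let data = match stored {
        Some(data) => data,
        None => return Ok(None), // a miss is not an error
    };
    if data.len() != expected_len {
        return Err(BlockError { reason: "invalid_block_size" });
    }
    Ok(Some(data.to_vec()))
}

/// Wrapper: the single place where hit, miss and failure metrics are emitted.
pub fn get_block(
    stored: Option<&[u8]>,
    expected_len: usize,
    metrics: &mut Metrics,
) -> Result<Option<Vec<u8>>, BlockError> {
    let result = get_block_inner(stored, expected_len);
    match &result {
        Ok(Some(_)) => metrics.incr("cache_hit".to_string()),
        Ok(None) => metrics.incr("cache_miss".to_string()),
        Err(e) => metrics.incr(format!("cache_error.{}", e.reason)),
    }
    result
}
```

The inner function never touches metrics, and the wrapper alone decides between hit, miss and failure, using the reason attached to the error.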

-    .get_object_metadata()
-    .await
-    .map_err(|err| DataCacheError::IoFailure(err.into()))?;
+let object_metadata = match result.get_object_metadata().await {
Contributor

As a workaround for the get_object errors above, why not get the data first here? That would guarantee that the correct error is returned before we try to get the metadata or checksums.
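A sketch of the suggested ordering, assuming a hypothetical `Response` type whose body result is the authoritative signal of success: the body is read first, so a failed request returns its own error before any metadata or checksum is consulted. `Response`, `Fetch`, `read_block` and the `checksum` header name are illustrative only.

```rust
/// Hypothetical response: the body result decides success or failure.
pub struct Response {
    pub body: Result<Vec<u8>, String>,
    pub metadata: Option<Vec<(String, String)>>,
}

#[derive(Debug, PartialEq)]
pub enum Fetch {
    Hit { data: Vec<u8>, checksum: Option<String> },
    Failed(String),
}

/// Reads the body first so that a request failure surfaces as the real error
/// before any metadata or checksum lookup is attempted.
pub fn read_block(resp: Response) -> Fetch {
    // Step 1: the data. If the request failed, stop here with its error.
    let data = match resp.body {
        Ok(data) => data,
        Err(e) => return Fetch::Failed(e),
    };
    // Step 2: only now consult the metadata, e.g. a stored checksum header.
    let checksum = resp.metadata.and_then(|headers| {
        headers
            .into_iter()
            .find(|(k, _)| k.as_str() == "checksum")
            .map(|(_, v)| v)
    });
    Fetch::Hit { data, checksum }
}
```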

Contributor

passaro left a comment

Left a few comments, but they are completely optional.

}

#[inline]
fn emit_failure_metric_write(reason: &'static str) {
Contributor

nit, non-blocking: we could just inline this.

muddyfish added this pull request to the merge queue on Nov 20, 2024
Merged via the queue into awslabs:main with commit f7b4524 Nov 20, 2024
23 checks passed
muddyfish deleted the express-cache-metrics branch on November 20, 2024 at 17:23
2 participants