Metrics: Add ReadMe with tenets, metrics list to collect, design doc. #1304
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             master    #1304    +/-  ##
============================================
- Coverage     75.58%   75.58%   -0.01%
  Complexity      638      638
============================================
  Files           898      898
  Lines         28136    28140      +4
  Branches       2221     2221
============================================
+ Hits          21268    21269      +1
- Misses         5848     5850      +2
- Partials       1020     1021      +1
```

Continue to review the full report at Codecov.
> ## Concepts
> ### Metric
> * A representation of data collected
> * Metric can be one of the following types: Counter, Gauge, Timer
Note, Counter can be ambiguous as there are at least two pretty common definitions:
- Measuring a rate of change, e.g., RPS, as used in Prometheus, Spectator, Micrometer.
- A gauge that can be incremented or decremented to model something like a queue size. Used in Dropwizard Metrics.
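The two Counter semantics distinguished above can be sketched side by side; the class names below are illustrative only, not SDK API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Prometheus/Spectator/Micrometer style: monotonic, increment-only; backends
// derive a rate of change (e.g. RPS) from successive readings.
class MonotonicCounter {
    private final AtomicLong count = new AtomicLong();
    void increment() { count.incrementAndGet(); }
    long value() { return count.get(); }
}

// Dropwizard Metrics style: an up/down counter, effectively a gauge adjusted
// incrementally, e.g. to model a queue size.
class UpDownCounter {
    private final AtomicLong current = new AtomicLong();
    void increment() { current.incrementAndGet(); }
    void decrement() { current.decrementAndGet(); }
    long value() { return current.get(); }
}
```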
Reading these over. Will come back with thoughts.
> | Api | ConstantGauge | The name of the AWS API the request is made to |
> | StreamingRequest | ConstantGauge | True if the request has streaming payload |
> | StreamingResponse | ConstantGauge | True if the response has streaming payload |
> | ApiCallStartTime | Timer | The start time of the request |
This seems to be a timestamp rather than a timer.
+1 I like timestamp over timer. Will update.
> | ApiCallEndTime | Timer | The end time of the request |
> | ApiCallLatency | Timer | The total time taken to finish a request (inclusive of all retries), ApiCallEndTime - ApiCallStartTime |
> | MarshallingLatency | Timer | The time taken to marshall the request |
> | ApiCallAttemptCount | Counter | Total number of attempts that were made by the service client to fulfill this request before succeeding or failing. (Value is 1 if there are no retries) |
Is there any way to know the maximum number of attempts for a given call?
Not sure I follow the ask here. Is this not what this metric is?
I mean the configured maximum number of attempts before it will give up. This metric shows the number of attempts that are actually used. Having both allows us to see how close it is to failing outright because it will exhaust all attempts.
If I don't care about the individual attempts but want to log whether the overall call from the client's perspective succeeded or not, I will need something like the last Http Status code at the APICall level. Same for the Exception. Thoughts?
> # Project Details
>
> 1. Metrics are disabled by default and should be enabled explicitly by customers. Enabling metrics will introduce small overhead.
> 2. Metrics can be enabled quickly during large scale events with need for code change or deployments.
without?
> Meters define the way a metric is measured. Here are the list of meters:
>
> **Counter :** Number of times a metric is reported. These kind of metrics can be incremented or decremented.
> For example: number of requests made since the start of application
This definition seems a bit odd.

> Number of times a metric is reported.

Should this be something like the number of times an event has occurred?

> These kind of metrics can be incremented or decremented.

Once an event has occurred you cannot take it back, so decrementing doesn't make sense. For example, with the number of requests made since the start of the application, allowing a decrement would give an incorrect result.

See the other comment about the different uses of the term Counter between systems. This definition seems to mix both.
docs/design/core/metrics/README.md (Outdated)
> **Counter :** Number of times a metric is reported. These kind of metrics can be incremented or decremented.
> For example: number of requests made since the start of application
>
> **Timer :** Records the time between start of an event and end of an event. An example is the time taken (latency) to complete a request.
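The Timer semantics described in the quoted text can be sketched as follows; this is a minimal illustration of the concept, not the SDK's actual Timer type:

```java
// Records the elapsed time between the start and end of an event,
// e.g. the latency of a request.
class Timer {
    private long startNanos;
    private long elapsedNanos = -1;

    void start() { startNanos = System.nanoTime(); }
    void end()   { elapsedNanos = System.nanoTime() - startNanos; }

    /** Recorded latency in milliseconds; -1 if end() was never called. */
    long elapsedMillis() { return elapsedNanos < 0 ? -1 : elapsedNanos / 1_000_000; }
}
```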
In the table of metrics, the timestamps like start and end time also seem to be labeled as timers. Should those be gauges?
> * By default, SDK creates and uses only CloudWatch publisher with default options (Default credential chain
> * and region chain).
> * To use CloudWatch publisher with custom options or any other publishers, create a
> * #PublisherConfiguration object and set it in the ClientOverrideConfiguration on the client.
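As a rough illustration of the configuration object the quoted javadoc describes, here is a hypothetical builder-style PublisherConfiguration; the field names and defaults are assumptions for the sketch, not the final API:

```java
// Hypothetical sketch of the PublisherConfiguration mentioned in the javadoc.
// Field names and defaults are assumptions, not the real SDK API.
final class PublisherConfiguration {
    private final boolean cloudWatchEnabled;
    private final String region;

    private PublisherConfiguration(Builder b) {
        this.cloudWatchEnabled = b.cloudWatchEnabled;
        this.region = b.region;
    }

    static Builder builder() { return new Builder(); }
    boolean cloudWatchEnabled() { return cloudWatchEnabled; }
    String region() { return region; }

    static final class Builder {
        private boolean cloudWatchEnabled = true; // CloudWatch is the default publisher
        private String region;

        Builder cloudWatchEnabled(boolean enabled) { this.cloudWatchEnabled = enabled; return this; }
        Builder region(String region) { this.region = region; return this; }
        PublisherConfiguration build() { return new PublisherConfiguration(this); }
    }
}
```

A client would then attach such an object through its override configuration, per the javadoc above.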
If I have an application that uses various libraries that use the SDK, is there a way to configure metrics consistently for all uses of the SDK without needing each library to explicitly do the right thing or configure it?

We currently do it using an ExecutionInterceptor which can be loaded automatically if it is on the classpath. This has been handy for getting consistent instrumentation for all uses in an app without needing to worry about how the client was set up in the various libraries that are pulled in.
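The interceptor approach described here can be sketched with a self-contained stand-in; the real hook is `software.amazon.awssdk.core.interceptor.ExecutionInterceptor`, which receives richer context objects than this simplified before/after shape:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in mirroring the before/after hook shape of the SDK's
// ExecutionInterceptor (the real interface has richer context parameters).
interface CallInterceptor {
    void beforeExecution(String operationName);
    void afterExecution(String operationName);
}

// An instrumentation interceptor that records every operation it sees, the
// way a classpath-discovered interceptor can feed a metrics system without
// any per-client configuration.
class RecordingInterceptor implements CallInterceptor {
    final List<String> events = new ArrayList<>();
    public void beforeExecution(String op) { events.add("before:" + op); }
    public void afterExecution(String op)  { events.add("after:" + op); }
}
```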
Yes, you can set environment variables or system properties that can globally enable metrics.
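A minimal sketch of how such a global switch could be resolved; the key names `sdk.metrics.enabled` / `SDK_METRICS_ENABLED` below are placeholders, since the comment does not name the actual keys:

```java
// Resolve a global metrics on/off switch from a system property or an
// environment variable. Key names are placeholders, not the SDK's real keys.
final class MetricsToggle {
    static boolean metricsEnabled() {
        String prop = System.getProperty("sdk.metrics.enabled");
        if (prop != null) {
            return Boolean.parseBoolean(prop); // system property wins if set
        }
        String env = System.getenv("SDK_METRICS_ENABLED");
        return env != null && Boolean.parseBoolean(env);
    }
}
```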
docs/design/core/metrics/Design.md (Outdated)
> * A representation of data collected
> * Metric is uniquely represented by its name
> * Metric can be one of the following types: Constant, Counter, Gauge, Timer
> * Metric can have tags. A Tag represent the category it belongs to (like Default, HttpClient, Streaming etc)
Tag -> tag
docs/design/core/metrics/Design.md (Outdated)
> * SDK provides implementations to publish metrics to services like [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/), [Client Side Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK Metrics for Enterprise Support)
> * Customers can implement the interface and register the custom implementation to publish metrics to a platform not supported in the SDK.
> * MetricPublishers can have different behaviors in terms of list of metrics to publish, publishing frequency, configuration needed to publish etc.
> * Metrics can be explicitly published to the platform by calling publish() method. This can be useful in scenarios when the application fails
or the application is short-lived
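The publisher contract described in the quoted design text (registerMetrics() hands over collected metrics, publish() flushes explicitly) can be sketched with stand-in types; the real interfaces may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the design's MetricRegistry; the real type may differ.
interface MetricRegistry { String name(); }

// A sketch of a custom MetricPublisher. registerMetrics() only buffers;
// publish() flushes explicitly, which matters when the application fails
// or is short-lived and cannot wait for a background flush.
class LoggingPublisher {
    private final List<MetricRegistry> buffered = new ArrayList<>();
    final List<String> published = new ArrayList<>();

    /** Reporting: hand collected metrics to the publisher; no immediate I/O. */
    void registerMetrics(MetricRegistry registry) { buffered.add(registry); }

    /** Explicit flush of everything reported so far. */
    void publish() {
        for (MetricRegistry r : buffered) {
            published.add(r.name()); // a real publisher would write to its destination
        }
        buffered.clear();
    }
}
```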
This design does not account for flexible request level metrics logging. I have added some details as to why that is very much desirable and request it to be taken into consideration.
> * Reporting is transferring the collected metrics to Publishers.
> * To report metrics to a publisher, call the registerMetrics(MetricRegistry) method on the MetricPublisher.
> * There is no requirement for Publisher to publish the reported metrics immediately after calling this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be a way to publish metric reports separately for each request? This is so that I can tie the metrics logging with my parent unit of work.
This is useful, for instance, when I am serving a request for my callers and, as part of that, I need to call an AWS API: I would like to log metrics for that particular call along with other metrics for the incoming request to my service. For this to work, I would need to be able to attach a publisher at the request level. This publisher would contain other custom metrics and, at the end of my incoming request processing, would flush the collected metrics to my custom destination (a log file, or some destination over HTTP).
Also if a publisher is configured at the request level, and another one is configured at the SDK level, only one should be invoked with the request level taking precedence.
I had mentioned more details in #23 (comment)
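The precedence rule proposed above (only one publisher is invoked, with the request-level one winning over the client-level one) can be sketched as:

```java
// Sketch of the proposed resolution: a publisher configured on the request
// overrides one configured on the client/SDK. Types are illustrative.
final class PublisherResolver {
    static String resolve(String requestLevelPublisher, String clientLevelPublisher) {
        // Only one publisher is invoked; request level takes precedence.
        return requestLevelPublisher != null ? requestLevelPublisher : clientLevelPublisher;
    }
}
```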
Any updates on this?
Will be merging this as-is for now and will open PRs for further edits and amendments as we ramp up work on this feature again.
Kudos, SonarCloud Quality Gate passed! 0 Bugs
No description provided.