
Metrics: Add ReadMe with tenets, metrics list to collect, design doc. #1304

Merged: 7 commits into master from varunkn/metrics-docs on Jan 15, 2020

Conversation

varunnvs92 (Contributor): No description provided.

@varunnvs92 varunnvs92 requested a review from spfink June 24, 2019 17:19
@varunnvs92 varunnvs92 changed the base branch from sdk-metrics-development to master June 24, 2019 18:13
@varunnvs92 varunnvs92 changed the title Add ReadMe with tenets, metrics list to collect, design doc. Metrics: Add ReadMe with tenets, metrics list to collect, design doc. Jun 25, 2019
@varunnvs92 varunnvs92 mentioned this pull request Jun 26, 2019
@varunnvs92 varunnvs92 force-pushed the varunkn/metrics-docs branch 2 times, most recently from 7c0bab1 to 5ecf61b Compare July 1, 2019 04:26
@varunnvs92 varunnvs92 force-pushed the varunkn/metrics-docs branch from 5ecf61b to 776d70c Compare July 1, 2019 04:28
@codecov-io commented Jul 1, 2019

Codecov Report

Merging #1304 into master will decrease coverage by <.01%.
The diff coverage is 100%.


```diff
@@             Coverage Diff              @@
##             master    #1304      +/-   ##
============================================
- Coverage     75.58%   75.58%   -0.01%
  Complexity      638      638
============================================
  Files           898      898
  Lines         28136    28140       +4
  Branches       2221     2221
============================================
+ Hits          21268    21269       +1
- Misses         5848     5850       +2
- Partials       1020     1021       +1
```
| Flag | Coverage Δ | Complexity Δ |
|------|------------|--------------|
| #unittests | 75.58% <100%> (-0.01%) | 638 <0> (ø) |

| Impacted Files | Coverage Δ | Complexity Δ |
|----------------|------------|--------------|
| ...n/awssdk/auth/signer/internal/Aws4SignerUtils.java | 100% <100%> (ø) | 0 <0> (ø) |
| ...igner/internal/BaseEventStreamAsyncAws4Signer.java | 96.55% <100%> (+0.18%) | 0 <0> (ø) |
| ...nio/netty/internal/OldConnectionReaperHandler.java | 81.81% <0%> (-9.1%) | 0% <0%> (ø) |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0131c08...bb94adc.

## Concepts
### Metric
* A representation of data collected
* Metric can be one of the following types: Counter, Gauge, Timer
Contributor: Note, Counter can be ambiguous as there are at least two pretty common definitions: a monotonically increasing count of events, and an up/down value that can be both incremented and decremented.
Contributor: Reading these over. Will come back with thoughts.

| Metric | Type | Description |
|--------|------|-------------|
| Api | ConstantGauge | The name of the AWS API the request is made to |
| StreamingRequest | ConstantGauge | True if the request has streaming payload |
| StreamingResponse | ConstantGauge | True if the response has streaming payload |
| ApiCallStartTime | Timer | The start time of the request |
Contributor: This seems to be a timestamp rather than a timer.

Contributor: +1 I like timestamp over timer. Will update.

| Metric | Type | Description |
|--------|------|-------------|
| ApiCallEndTime | Timer | The end time of the request |
| ApiCallLatency | Timer | The total time taken to finish a request (inclusive of all retries), ApiCallEndTime - ApiCallStartTime |
| MarshallingLatency | Timer | The time taken to marshall the request |
| ApiCallAttemptCount | Counter | Total number of attempts that were made by the service client to fulfill this request before succeeding or failing. (Value is 1 if there are no retries) |
Contributor: Is there any way to know the maximum number of attempts for a given call?

Contributor: Not sure I follow the ask here. Is this not what this metric is?

Contributor: I mean the configured maximum number of attempts before it will give up. This metric shows the number of attempts that were actually used. Having both allows us to see how close a call is to failing outright because it will exhaust all attempts.

Contributor: If I don't care about the individual attempts but want to log whether the overall call succeeded from the client's perspective, I will need something like the last HTTP status code at the ApiCall level. Same for the exception. Thoughts?

# Project Details

1. Metrics are disabled by default and should be enabled explicitly by customers. Enabling metrics will introduce small overhead.
2. Metrics can be enabled quickly during large scale events with need for code change or deployments.
Contributor: without?

Meters define the way a metric is measured. Here is the list of meters:

**Counter :** Number of times a metric is reported. These kind of metrics can be incremented or decremented.
For example: number of requests made since the start of application
Contributor: This definition seems a bit odd.

> Number of times a metric is reported.

Should this be something like the number of times an event has occurred?

> These kind of metrics can be incremented or decremented.

Once an event has occurred you cannot take it back, so decrementing doesn't make sense. For example, with the number of requests made since the start of the application, allowing a decrement would give an incorrect result.

See other comment about different uses of the term Counter between systems. This definition seems to mix both.
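
For concreteness, here is a minimal Java sketch of the two Counter interpretations being contrasted in this thread; the classes are illustrative stand-ins, not SDK types:

```java
// Sketch contrasting the two common "Counter" semantics discussed above;
// these classes are illustrative, not part of the SDK.
public final class CounterSemantics {

    /** A monotonic counter: counts events, can only go up. */
    static final class MonotonicCounter {
        private long count;
        void increment() { count++; }
        long count() { return count; }
    }

    /** An up/down counter: tracks a level, e.g. in-flight requests. */
    static final class UpDownCounter {
        private long value;
        void increment() { value++; }
        void decrement() { value--; }
        long value() { return value; }
    }

    public static void main(String[] args) {
        MonotonicCounter requestsMade = new MonotonicCounter();
        requestsMade.increment(); // one request since application start; never decremented

        UpDownCounter inFlight = new UpDownCounter();
        inFlight.increment();     // request started
        inFlight.decrement();     // request finished
        System.out.println(requestsMade.count() + " " + inFlight.value());
    }
}
```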

**Timer :** Records the time between start of an event and end of an event. An example is the time taken (latency) to complete a request.
Contributor: In the table of metrics, the timestamps like start and end time also seem to be labeled as timers. Should those be gauges?
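
To illustrate the distinction raised here, a minimal sketch (not the SDK's API): ApiCallStartTime and ApiCallEndTime are single timestamps, while a timer measures the elapsed time between them.

```java
import java.time.Instant;

// Minimal, illustrative sketch: a timestamp is one point in time, while a
// timer records the duration between a start and an end event.
public final class TimerVsTimestamp {
    public static void main(String[] args) throws InterruptedException {
        Instant apiCallStartTime = Instant.now(); // a timestamp, not a duration
        long start = System.nanoTime();

        Thread.sleep(25); // stand-in for the API call

        long apiCallLatencyNanos = System.nanoTime() - start; // a timer: elapsed time
        Instant apiCallEndTime = Instant.now();               // another timestamp

        System.out.printf("start=%s end=%s latency=%dms%n",
                apiCallStartTime, apiCallEndTime, apiCallLatencyNanos / 1_000_000);
    }
}
```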

* By default, SDK creates and uses only CloudWatch publisher with default options (Default credential chain
* and region chain).
* To use CloudWatch publisher with custom options or any other publishers, create a
* #PublisherConfiguration object and set it in the ClientOverrideConfiguration on the client.
Contributor: If I have an application that uses various libraries that in turn use the SDK, is there a way to configure metrics consistently for all uses of the SDK without needing each library to explicitly do the right thing or configure it?

We currently do it using an ExecutionInterceptor, which can be loaded automatically if it is on the classpath. This has been handy for getting consistent instrumentation for all uses in an app without needing to worry about how the client was set up in the various libraries that are pulled in.

Contributor: Yes, you can set environment variables or system properties that globally enable metrics.
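
A minimal sketch of that global-toggle idea, assuming a system property is consulted before an environment variable; the property and variable names here are hypothetical, not the SDK's:

```java
// Sketch of the global-enable idea in the reply above: metrics are off by
// default and flipped on by a system property or environment variable, with
// no code changes in the libraries that build the clients. The property and
// variable names below are hypothetical, not the SDK's final ones.
public final class MetricsToggle {
    private static final String PROPERTY = "aws.javasdk2x.metrics.enabled"; // hypothetical
    private static final String ENV_VAR  = "AWS_JAVA_SDK_METRICS_ENABLED";  // hypothetical

    static boolean metricsEnabled() {
        String fromProperty = System.getProperty(PROPERTY);
        if (fromProperty != null) {
            return Boolean.parseBoolean(fromProperty); // property wins if set
        }
        return Boolean.parseBoolean(System.getenv(ENV_VAR)); // null parses as false
    }

    public static void main(String[] args) {
        System.out.println("metrics enabled: " + metricsEnabled());
    }
}
```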

* A representation of data collected
* Metric is uniquely represented by its name
* Metric can be one of the following types: Constant, Counter, Gauge, Timer
* Metric can have tags. A Tag represent the category it belongs to (like Default, HttpClient, Streaming etc)
Contributor: Tag -> tag
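
As a rough illustration of the Metric concept in this excerpt (name as identity, a type, and category tags), using illustrative stand-in types; the ConnectionPoolSize name is made up for the example:

```java
import java.util.Set;

// Illustrative stand-ins for the Metric concept quoted above, not the SDK's
// final types: a metric is uniquely identified by its name and carries a
// type plus category tags.
public final class MetricConceptSketch {

    enum MetricType { CONSTANT, COUNTER, GAUGE, TIMER }

    /** Name identifies the metric; tags give its category (Default, HttpClient, ...). */
    record Metric(String name, MetricType type, Set<String> tags) { }

    public static void main(String[] args) {
        Metric apiCallLatency =
                new Metric("ApiCallLatency", MetricType.TIMER, Set.of("Default"));
        Metric connectionPoolSize = // hypothetical metric name, for illustration only
                new Metric("ConnectionPoolSize", MetricType.GAUGE, Set.of("HttpClient"));
        System.out.println(apiCallLatency + " " + connectionPoolSize);
    }
}
```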

* SDK provides implementations to publish metrics to services like [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/), [Client Side Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK Metrics for Enterprise Support)
* Customers can implement the interface and register the custom implementation to publish metrics to a platform not supported in the SDK.
* MetricPublishers can have different behaviors in terms of list of metrics to publish, publishing frequency, configuration needed to publish etc.
* Metrics can be explicitly published to the platform by calling publish() method. This can be useful in scenarios when the application fails
Contributor: or the application is short-lived
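
A minimal sketch of the custom-publisher idea from this excerpt: the interface below is a simplified, hypothetical stand-in for the design doc's MetricPublisher (which takes a MetricRegistry), with an explicit publish() for the failure and short-lived-application cases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a publisher for a platform the SDK doesn't support out of the
// box. The interface is a simplified, hypothetical stand-in for the design
// doc's MetricPublisher, not the final API.
public final class LoggingMetricPublisherSketch {

    interface MetricPublisher {
        void registerMetric(String name, Object value); // simplified stand-in
        void publish();                                 // explicit, on-demand flush
    }

    /** Buffers metrics and writes them to stdout when publish() is called. */
    static final class LoggingMetricPublisher implements MetricPublisher {
        private final Map<String, Object> buffered = new LinkedHashMap<>();

        @Override public void registerMetric(String name, Object value) {
            buffered.put(name, value);
        }

        @Override public void publish() {
            buffered.forEach((name, value) -> System.out.println(name + " = " + value));
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        MetricPublisher publisher = new LoggingMetricPublisher();
        publisher.registerMetric("ApiCallAttemptCount", 1);
        publisher.registerMetric("ApiCallLatency", "120 ms");
        publisher.publish(); // e.g. just before a short-lived application exits
    }
}
```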




@varunnvs92 varunnvs92 force-pushed the varunkn/metrics-docs branch from ac9a527 to 1db31f2 Compare August 19, 2019 07:06
@millems millems mentioned this pull request Sep 13, 2019
@swaranga left a comment
This design does not account for flexible request level metrics logging. I have added some details as to why that is very much desirable and request it to be taken into consideration.



* Reporting is transferring the collected metrics to Publishers.
* To report metrics to a publisher, call the registerMetrics(MetricRegistry) method on the MetricPublisher.
* There is no requirement for Publisher to publish the reported metrics immediately after calling this method.

Contributor: Will there be a way to publish metric reports separately for each request? This is so that I can tie the metrics logging to my parent unit of work.

This is useful, for instance, when I am serving a request for my callers and, as part of that, need to call an AWS API: I would like to log metrics for that particular call along with the other metrics for the incoming request to my service. For this to work, I would need to be able to attach a publisher at the request level. This publisher would contain other custom metrics and, at the end of my incoming request processing, would flush the collected metrics to my custom destination (a log file, or some destination over HTTP).

Also, if a publisher is configured at the request level and another one is configured at the SDK level, only one should be invoked, with the request-level publisher taking precedence.

I had mentioned more details in #23 (comment)
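
A rough sketch of the request-level flow being asked for here, using hypothetical local stand-ins for the design doc's MetricRegistry and MetricPublisher types: metrics reported via registerMetrics() are buffered and flushed when the caller's unit of work completes, and a request-level publisher, when present, wins over the SDK-level one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Rough sketch of request-scoped reporting. MetricRegistry and
// MetricPublisher are hypothetical stand-ins for the design doc's types;
// the precedence rule mirrors the comment above.
public final class RequestLevelReportingSketch {

    record MetricRegistry(String apiName, long latencyMillis) { }

    interface MetricPublisher {
        void registerMetrics(MetricRegistry registry); // may buffer; need not publish immediately
        void publish();                                // explicit flush
    }

    static final class BufferingPublisher implements MetricPublisher {
        private final String destination;
        private final List<MetricRegistry> buffered = new ArrayList<>();

        BufferingPublisher(String destination) { this.destination = destination; }

        @Override public void registerMetrics(MetricRegistry registry) { buffered.add(registry); }

        @Override public void publish() {
            buffered.forEach(r -> System.out.println(
                    destination + ": " + r.apiName() + " took " + r.latencyMillis() + " ms"));
            buffered.clear();
        }
    }

    /** Request-level publisher, if present, takes precedence over the SDK-level one. */
    static MetricPublisher choosePublisher(Optional<MetricPublisher> requestLevel,
                                           MetricPublisher sdkLevel) {
        return requestLevel.orElse(sdkLevel);
    }

    public static void main(String[] args) {
        MetricPublisher sdkLevel = new BufferingPublisher("sdk-default");
        MetricPublisher requestLevel = new BufferingPublisher("request-log");

        MetricPublisher chosen = choosePublisher(Optional.of(requestLevel), sdkLevel);
        chosen.registerMetrics(new MetricRegistry("DynamoDb.GetItem", 12));
        chosen.publish(); // flushed at the end of the parent unit of work
    }
}
```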

@swaranga commented Oct 4, 2019

Any updates on this?

@dagnir (Contributor) commented Jan 14, 2020

Will be merging this as-is for now and will open PRs for further edits and amendments as we ramp up work on this feature again.

@sonarqubecloud: Kudos, SonarCloud Quality Gate passed!

- Bugs: A (0 bugs)
- Vulnerabilities: A (0 vulnerabilities, 0 security hotspots to review)
- Code smells: A (0 code smells)
- No coverage information
- No duplication information

@dagnir dagnir merged commit 13d63eb into master Jan 15, 2020
@zoewangg zoewangg deleted the varunkn/metrics-docs branch August 19, 2020 22:46