Update docs based on latest discussions #1854

Merged
merged 1 commit into from
May 20, 2020
207 changes: 60 additions & 147 deletions docs/design/core/metrics/Design.md
@@ -1,41 +1,62 @@
# SDK Metrics System
## Concepts
### Metric
* A representation of data collected
* Metric can be one of the following types: Counter, Gauge, Timer
* Metric can be associated to a category. Some of the metric categories are Default, HttpClient, Streaming etc
* A measure of some aspect of the SDK. Examples include request latency, number
of pooled connections and retries executed.

### MetricRegistry
* A metric is associated with a category. Some of the metric categories are
`Default`, `HttpClient` and `Streaming`. This lets customers enable metrics
only for the categories they are interested in.

* A MetricRegistry represent an interface to store the collected metric data. It can hold different types of Metrics
described above
* MetricRegistry is generic and not tied to specific category (ApiCall, HttpClient etc) of metrics.
* Each API call has it own instance of a MetricRegistry. All metrics collected in the ApiCall lifecycle are stored in
that instance.
* A MetricRegistry can store other instances of same type. This can be used to store metrics for each Attempt in an Api
Call.
* [Interface prototype](prototype/MetricRegistry.java)
Refer to the [Metrics List](./MetricsList.md) document for a complete list of
standard metrics collected by the SDK.

### Metric Collector

* `MetricCollector` is a typesafe aggregator of metrics. This is the primary
interface through which other SDK components report the metrics they emit,
using the `reportMetric(SdkMetric,Object)` method.

* `MetricCollector` objects allow for nesting. This enables metrics to be
collected in the context of other metric events. For example, for a single
API call, there may be multiple request attempts if there are retries. Each
attempt's associated metric events can be stored in their own
`MetricCollector`, all of which are children of another collector that
represents metrics for the entire API call.

A child of a collector is created by calling its `createChild(String)`
method.

* The `collect()` method returns a `MetricCollection`. This class is
essentially an immutable version of the tree formed by the collector and its
children, which are also represented by `MetricCollection` objects.

Note that calling `collect()` implies that child collectors are also
collected.

* Each collector has a name. Often this will be used to describe the class of
metrics that it collects; e.g. `"ApiCall"` and `"ApiCallAttempt"`.

* [Interface prototype](prototype/MetricCollector.java)
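
The nesting described above can be illustrated with a small in-memory sketch. This is not the SDK implementation: `SketchMetric`, `SketchCollection`, `SketchCollector`, and the metric names `HttpStatusCode` and `RetryCount` are hypothetical stand-ins, assumed here only to show how an API call collector and its per-attempt children might form a tree that `collect()` turns into a read-only snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SdkMetric<T>: a typed key that identifies a metric.
final class SketchMetric<T> {
    final String name;

    SketchMetric(String name) {
        this.name = name;
    }
}

// Hypothetical read-only snapshot, loosely modeled on MetricCollection.
final class SketchCollection {
    final String name;
    final List<SketchCollection> children;
    private final Map<SketchMetric<?>, List<Object>> values;

    SketchCollection(String name, Map<SketchMetric<?>, List<Object>> values,
                     List<SketchCollection> children) {
        this.name = name;
        this.values = values;
        this.children = Collections.unmodifiableList(children);
    }

    // Returns every value reported for the metric, or an empty list if none.
    @SuppressWarnings("unchecked")
    <T> List<T> metricValues(SketchMetric<T> metric) {
        List<T> result = new ArrayList<>();
        for (Object value : values.getOrDefault(metric, Collections.emptyList())) {
            result.add((T) value);
        }
        return Collections.unmodifiableList(result);
    }
}

// Hypothetical in-memory collector, loosely modeled on MetricCollector.
final class SketchCollector {
    private final String name;
    private final Map<SketchMetric<?>, List<Object>> values = new LinkedHashMap<>();
    private final List<SketchCollector> children = new ArrayList<>();

    SketchCollector(String name) {
        this.name = name;
    }

    <T> void reportMetric(SketchMetric<T> metric, T value) {
        values.computeIfAbsent(metric, key -> new ArrayList<>()).add(value);
    }

    SketchCollector createChild(String childName) {
        SketchCollector child = new SketchCollector(childName);
        children.add(child);
        return child;
    }

    // Copies this collector's values and recursively collects its children.
    SketchCollection collect() {
        Map<SketchMetric<?>, List<Object>> copy = new LinkedHashMap<>();
        values.forEach((metric, reported) -> copy.put(metric, new ArrayList<>(reported)));
        List<SketchCollection> collectedChildren = new ArrayList<>();
        for (SketchCollector child : children) {
            collectedChildren.add(child.collect());
        }
        return new SketchCollection(name, copy, collectedChildren);
    }
}

final class NestedCollectorDemo {
    // Hypothetical metrics used only for this illustration.
    static final SketchMetric<Integer> HTTP_STATUS = new SketchMetric<>("HttpStatusCode");
    static final SketchMetric<Integer> RETRY_COUNT = new SketchMetric<>("RetryCount");

    // One API call with two attempts: the first fails, the second succeeds.
    static SketchCollection run() {
        SketchCollector apiCall = new SketchCollector("ApiCall");
        apiCall.createChild("ApiCallAttempt").reportMetric(HTTP_STATUS, 500);
        apiCall.createChild("ApiCallAttempt").reportMetric(HTTP_STATUS, 200);
        apiCall.reportMetric(RETRY_COUNT, 1);
        return apiCall.collect();
    }

    public static void main(String[] args) {
        SketchCollection collected = run();
        System.out.println(collected.name + " retries: " + collected.metricValues(RETRY_COUNT));
        for (SketchCollection attempt : collected.children) {
            System.out.println("  " + attempt.name + " status: " + attempt.metricValues(HTTP_STATUS));
        }
    }
}
```

Keeping each attempt in its own child means per-attempt values such as the status code never mix with call-level values such as the retry count, while a single `collect()` on the root still yields the whole tree.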

### MetricPublisher

* A MetricPublisher represent an interface to publish the collected metrics to a external source.
* SDK provides implementations to publish metrics to services like [Amazon
CloudWatch](https://aws.amazon.com/cloudwatch/), [Client Side
Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html) (also known as AWS SDK
Metrics for Enterprise Support)
* Customers can implement the interface and register the custom implementation to publish metrics to a platform not
supported in the SDK.
* MetricPublishers can have different behaviors in terms of list of metrics to publish, publishing frequency,
configuration needed to publish etc.
* Metrics can be explicitly published to the platform by calling publish() method. This can be useful in scenarios when
the application fails and customer wants to flush metrics before exiting the application.
* [Interface prototype](prototype/MetricPublisher.java)
* A `MetricPublisher` publishes collected metrics to one or more systems
outside of the SDK. It takes a `MetricCollection` object, potentially
transforms the data into richer metrics, and converts it into the format the
receiver expects.

### Reporting
* By default, the SDK will provide implementations to publish metrics to [Amazon
CloudWatch](https://aws.amazon.com/cloudwatch/) and [Client Side
Monitoring](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/sdk-metrics.html)
(also known as AWS SDK Metrics for Enterprise Support).

* Reporting is transferring the collected metrics to Publishers.
* To report metrics to a publisher, call the registerMetrics(MetricRegistry) method on the MetricPublisher.
* There is no requirement for Publisher to publish the reported metrics immediately after calling this method.
* Metric publishers are pluggable within the SDK, allowing customers to
provide their own custom implementations.

* Metric publishers can differ in which metrics they publish, how often they
publish, and what configuration they need in order to publish.

* [Interface prototype](prototype/MetricPublisher.java)
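
As a rough illustration of a custom publisher, the sketch below flattens a collected tree into text lines and hands the work to a background thread. The `MetricPublisher` prototype is not reproduced in this document, so everything here is an assumption: `CollectedNode` is a simplified stand-in for `MetricCollection`, `SketchPublisher` and its `publish`/`close` methods are hypothetical, and the in-memory `sink` list stands in for an external system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for MetricCollection: a named node with
// "metric=value" entries and child nodes, all read-only.
final class CollectedNode {
    final String name;
    final List<String> entries;
    final List<CollectedNode> children;

    CollectedNode(String name, List<String> entries, List<CollectedNode> children) {
        this.name = name;
        this.entries = Collections.unmodifiableList(new ArrayList<>(entries));
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }
}

// Hypothetical publisher contract; method names are assumptions of this sketch.
interface SketchPublisher {
    void publish(CollectedNode collection);

    void close();
}

// Converts the tree into the line format the (pretend) receiver expects and
// writes it on a background thread so the caller is not blocked.
final class AsyncLinePublisher implements SketchPublisher {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> sink; // stands in for an external system

    AsyncLinePublisher(List<String> sink) {
        this.sink = sink;
    }

    @Override
    public void publish(CollectedNode collection) {
        executor.execute(() -> flatten(collection, "", sink));
    }

    private static void flatten(CollectedNode node, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? node.name : prefix + "/" + node.name;
        for (String entry : node.entries) {
            out.add(path + " " + entry);
        }
        for (CollectedNode child : node.children) {
            flatten(child, path, out);
        }
    }

    // Stops accepting work and waits briefly for queued publishes to finish.
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class PublisherDemo {
    static List<String> run() {
        CollectedNode attempt = new CollectedNode("ApiCallAttempt",
                Arrays.asList("HttpStatusCode=200"), Collections.emptyList());
        CollectedNode apiCall = new CollectedNode("ApiCall",
                Arrays.asList("RetryCount=0"), Arrays.asList(attempt));

        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        AsyncLinePublisher publisher = new AsyncLinePublisher(sink);
        publisher.publish(apiCall);
        publisher.close();
        return new ArrayList<>(sink);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Because the publisher only ever sees a read-only snapshot, handing it to another thread needs no extra copying or locking on the collection itself.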

## Enabling Metrics

@@ -155,126 +176,18 @@ New modules are created to support metrics feature.
* Customers have to **explicitly add a dependency** on these modules to use the SDK-provided publishers


## Sequence Diagram

<b>Metrics Collection</b>

<div style="text-align: center;">

![Metrics Collection](images/MetricCollection.jpg)

</div>

<b>MetricPublisher</b>

<div style="text-align: center;">

![MetricPublisher fig.align="left"](images/MetricPublisher.jpg)

</div>

1. Client enables metrics feature through MetricConfigurationProvider and configure publishers through
MetricPublisherConfiguration.
2. For each API call, a new MetricRegistry object is created and stored in the ExecutionAttributes. If metrics are not
enabled, a NoOpMetricRegistry is used.
3. At each metric collection point, the metric is registered in the MetricRegistry object if its category is enabled in
MetricConfigurationProvider.
4. The metrics that are collected once for a Api Call execution are stored in the METRIC_REGISTRY ExecutionAttribute.
5. The metrics that are collected per Api Call attempt are stored in new MetricRegistry instances which are part of the
ApiCall MetricRegistry. These MetricRegistry instance for the current attempt is also accessed through
ATTEMPT_METRIC_REGISTRY ExecutionAttribute.
6. At end of API call, report the MetricRegistry object to MetricPublishers by calling registerMetrics(MetricRegistry)
method. This is done in an ExecutionInterceptor.
7. Steps 2 to 6 are repeated for each API call
8. MetricPublisher calls publish() method to report metrics to external sources. The frequency of publish() method call
is unique to Publisher implementation.
9. Client has access to all registered publishers and it can call publish() method explicitly if desired.


<b>CloudWatch MetricPublisher</b>

<div style="text-align: center;">

![CloudWatch MetricPublisher](images/CWMetricPublisher.jpg)

</div>

## Implementation Details
Few important implementation details are discussed in this section.

SDK modules can be organized as shown in this image.

<div style="text-align: center;">

![Module Hierarchy](images/MetricsModulesHierarchy.png)

</div>

* Core modules - Modules in the core directory while have access to ExecutionContext and ExecutionAttributes
* Downstream modules - Modules where execution occurs after core modules. For example, http-clients is downstream module
as the request is transferred from core to http client for further execution.
* Upstream modules - Modules that live in layers above core. Examples are High Level libraries (HLL) or Applications
that use SDK. Execution goes from Upstream modules to core modules.

### Core Modules
* SDK will use ExecutionAttributes to pass the MetricConfigurationProvider information through out the core module where
core request-response metrics are collected.
* Instead of checking whether metrics is enabled at each metric collection point, SDK will use the instance of
NoOpMetricRegistry (if metrics are disabled) and DefaultMetricRegistry (if metrics are enabled).
* The NoOpMetricRegistry class does not collect or store any metric data. Instead of creating a new NoOpMetricRegistry
instance for each request, use the same instance for every request to avoid additional object creation.
* The DefaultMetricRegistry class will only collect metrics if they belong to the MetricCategory list provided in the
MetricConfigurationProvider. To support this, DefaultMetricRegistry is decorated by another class to filter metric
categories that are not set in MetricConfigurationProvider.

### Downstream Modules
* The MetricRegistry object and other required metric configuration details will be passed to the classes in downstream
modules.
* For example, HttpExecuteRequest for sync http client, AsyncExecuteRequest for async http client.
* Downstream modules record the metric data directly into the given MetricRegistry object.
* As we use same MetricRegistry object for core and downstream modules, both metrics will be reported to the Publisher
together.

### Upstream Modules
* As MetricRegistry object is created after the execution is passed from Upstream modules, these modules won't be able
to modify/add to the core metrics.
* If upstream modules want to report additional metrics using the registered publishers, they would need to create
MetricRegistry instances and explicitly call the methods on the Publishers.
* It would be useful to get the low-level API metrics in these modules, so SDK will expose APIs to get an immutable
version of the MetricRegistry object so that upstream classes can use that information in their metric calculation.

### Reporting
* Collected metrics are reported to the configured publishers at the end of each Api Call by calling
`registerMetrics(MetricRegistry)` method on MetricPublisher.
* The MetricRegistry argument in the registerMetrics method will have data on the entire Api Call including retries.
* This reporting is done in `MetricsExecutionInterceptor` via `afterExecution()` and `onExecutionFailure()` methods.
* `MetricsExecutionInterceptor` will always be the last configured ExecutionInterceptor in the interceptor chain


## Performance
One of the main tenet for metrics is “Enabling default metrics should have minimal impact on the application
performance". The following design choices are made to ensure enabling metrics does not effect performance
significantly.
* When collecting metrics, a NoOpRegistry is used if metrics are disabled. All methods in this registry are no-op and
return immediately. This also has the additional benefit of avoid metricsEnabled check at each metric collection
point.
* Metric publisher implementations can involve network calls and impact latency if done in blocking way. So all SDK
publisher implementation will process the metrics asynchronously and does not block the actual request.


## Testing

To ensure performance is not impacted due to metrics, tests should be written with various scenarios and a baseline for
overhead should be created. These tests should be run regularly to catch regressions.

### Test Cases

SDK will be tested under load for each of these test cases using the load testing framework we already have. Each of
these test case results should be compared with metrics feature disabled & enabled, and then comparing the results.

1. Enable each metrics publisher (CloudWatch, CSM) individually.
2. Enable all metrics publishers.
3. Individually enable each metric category to find overhead for each MetricCategory.
One of the main tenets for metrics is "Enabling default metrics should have
minimal impact on the application performance". The following design choices are
made to ensure enabling metrics does not affect performance significantly.

* When collecting metrics, a no-op metric collector is used if metrics are
disabled. All methods in this collector are no-ops and return immediately.

* Metric publisher implementations can involve network calls and impact latency
if done in a blocking way. Therefore, all SDK publisher implementations will
process the metrics asynchronously so as not to block the request thread.

* Performance tests will be written and run with each release to ensure that the
SDK performs well even when metrics are enabled and being collected and
published.
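
The no-op collector choice can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: `TinyCollector`, `NoOpCollector`, `RecordingCollector`, and `TinyCollectors` are hypothetical names, metrics are reduced to plain strings, and sharing a single no-op instance is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, pared-down collector contract (metric names are plain strings).
interface TinyCollector {
    void reportMetric(String metricName, Object value);

    TinyCollector createChild(String childName);

    List<String> reported();
}

// Shared no-op instance: every method returns immediately and stores nothing.
enum NoOpCollector implements TinyCollector {
    INSTANCE;

    @Override
    public void reportMetric(String metricName, Object value) {
        // Intentionally empty.
    }

    @Override
    public TinyCollector createChild(String childName) {
        return this; // children of a no-op collector are the same no-op
    }

    @Override
    public List<String> reported() {
        return Collections.emptyList();
    }
}

// Collector used when metrics are enabled; children write into the same list.
final class RecordingCollector implements TinyCollector {
    private final String path;
    private final List<String> reported;

    RecordingCollector(String path, List<String> reported) {
        this.path = path;
        this.reported = reported;
    }

    @Override
    public void reportMetric(String metricName, Object value) {
        reported.add(path + "." + metricName + "=" + value);
    }

    @Override
    public TinyCollector createChild(String childName) {
        return new RecordingCollector(path + "/" + childName, reported);
    }

    @Override
    public List<String> reported() {
        return Collections.unmodifiableList(reported);
    }
}

final class TinyCollectors {
    // The enabled/disabled decision is made once, when the collector is created.
    static TinyCollector create(String name, boolean metricsEnabled) {
        return metricsEnabled
                ? new RecordingCollector(name, new ArrayList<>())
                : NoOpCollector.INSTANCE;
    }
}

final class NoOpDemo {
    // Instrumented code reports unconditionally; it never checks a flag.
    static void instrumentedCall(TinyCollector collector) {
        collector.reportMetric("RetryCount", 0);
        collector.createChild("ApiCallAttempt").reportMetric("HttpStatusCode", 200);
    }

    public static void main(String[] args) {
        TinyCollector enabled = TinyCollectors.create("ApiCall", true);
        TinyCollector disabled = TinyCollectors.create("ApiCall", false);
        instrumentedCall(enabled);
        instrumentedCall(disabled);
        System.out.println("enabled:  " + enabled.reported());
        System.out.println("disabled: " + disabled.reported());
    }
}
```

With this shape, the cost of disabled metrics is limited to a few empty method calls, and collection points stay free of enabled/disabled branching.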
41 changes: 41 additions & 0 deletions docs/design/core/metrics/prototype/MetricCollection.java
@@ -0,0 +1,41 @@
/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

/**
* An immutable collection of metrics.
*/
public interface MetricCollection extends SdkIterable<MetricRecord<?>> {
/**
* @return The name of this metric collection.
*/
String name();

/**
* Return all the values of the given metric. An empty list is returned if
* there are no reported values for the given metric.
*
* @param metric The metric.
* @param <T> The type of the value.
* @return All of the values of this metric.
*/
<T> List<T> metricValues(SdkMetric<T> metric);

/**
* Returns the child metric collections. An empty list is returned if there
* are no children.
* @return The child metric collections.
*/
List<MetricCollection> children();
}
58 changes: 58 additions & 0 deletions docs/design/core/metrics/prototype/MetricCollector.java
@@ -0,0 +1,58 @@
/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

/**
* Used to collect metrics reported by the SDK.
* <p>
* Collectors are allowed to nest, allowing metrics to be collected within the
* context of other metrics.
*/
@NotThreadSafe
@SdkPublicApi
public interface MetricCollector {
/**
* @return The name of this collector.
*/
String name();

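/**
* Report a metric.
*
* @param metric The metric being reported.
* @param value The value of the metric.
* @param <T> The type of the value.
*/
<T> void reportMetric(SdkMetric<T> metric, T value);

/**
* Create a child of this collector.
*
* @param name The name of the child collector.
* @return The child collector.
*/
MetricCollector createChild(String name);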

/**
* Return the collected metrics. The returned {@code MetricCollection} must
* preserve the children of this collector; in other words the tree formed
* by this collector and its children should be identical to the tree formed
* by the returned {@code MetricCollection} and its child collections.
* <p>
* Calling {@code collect()} prevents further invocations of {@link
* #reportMetric(SdkMetric, Object)}.
*
* @return The collected metrics.
*/
MetricCollection collect();

static MetricCollector create(String name) {
return DefaultMetricCollector.create(name);
}
}