forked from open-telemetry/opentelemetry-specification
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Metrics prototype scenario (open-telemetry#146)
* metrics prototype scenario * rename * fix typo * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Cijo Thomas <[email protected]> * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Cijo Thomas <[email protected]> * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Cijo Thomas <[email protected]> * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Cijo Thomas <[email protected]> * clarify GA/stable * add example to exemplar * adjust the proposed timeline considering most folks will be on vacation in Dec. * fix nits * adjust the ToC * fix nits * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Leighton Chen <[email protected]> * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Leighton Chen <[email protected]> * adjust wording * fix typo * address review comment * address comments * Update text/metrics/0146-metrics-prototype-scenarios.md Co-authored-by: Cijo Thomas <[email protected]> Co-authored-by: Cijo Thomas <[email protected]> Co-authored-by: Leighton Chen <[email protected]>
- Loading branch information
1 parent
0ccf798
commit edc0bd7
Showing
1 changed file
with
222 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,222 @@ | ||
# Scenarios for Metrics API/SDK Prototyping | ||
|
||
With the stable release of the tracing specification, the OpenTelemetry | ||
community is willing to spend more energy on metrics API/SDK. The goal is to get | ||
the metrics API/SDK specification to | ||
[`Experimental`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental) | ||
state by end of 5/2021, and make it | ||
[`Stable`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable) | ||
before end of 2021: | ||
|
||
* By end of 5/31/2021, we should have a good confidence that we can recommend | ||
language client owners to work on metrics preview release. This means starting | ||
from 6/1/2021 the specification should not have major surprises or big | ||
changes. We will then start recommending client maintainers to implement it. | ||
We might introduce additional features but there should be a high bar. | ||
|
||
* By end of 9/30/2021, we should mark the metrics API/SDK specification as | ||
[`Feature-freeze`](https://github.com/open-telemetry/opentelemetry-specification/blob/1afab39e5658f807315abf2f3256809293bfd421/specification/document-status.md#feature-freeze), | ||
and focus on bug fixing or editorial changes. | ||
|
||
* By end of 11/30/2021, we want to have a stable release of metrics API/SDK | ||
specification, with multiple language SIGs providing RC (release candidate) or | ||
[stable](https://github.com/open-telemetry/opentelemetry-specification/blob/9047c91412d3d4b7f28b0f7346d8c5034b509849/specification/versioning-and-stability.md#stable) | ||
clients. | ||
|
||
In this document, we will focus on two scenarios that we use for prototyping | ||
metrics API/SDK. The goal is to have two scenarios which clearly capture the | ||
major requirements, so we can work with language client SIGs to prototype, | ||
gather the learnings, determine the scopes and stages. Later the scenarios can | ||
be used as examples and test cases for all the language clients. | ||
|
||
Here are the languages we've agreed to use during the prototyping: | ||
|
||
* C# | ||
* Java | ||
* Python | ||
|
||
In order to not undertake such an enormous task at once, we will need to have an incremental approach and divide the work into multiple | ||
stages: | ||
|
||
1. Do the end-to-end prototype to get the overall understanding of the problem | ||
domain. We should also clarify the scope and be able to articulate it | ||
precisely during this stage, here are some examples: | ||
|
||
* Why do we want to introduce brand new metrics APIs versus taking a well | ||
established API (e.g. Prometheus and Micrometer), what makes OpenTelemetry | ||
metrics API different (e.g. Baggage)? | ||
* Do we need to consider OpenCensus Stats API shim, or this is out of scope? | ||
|
||
2. Focus on a core subset of API, cover the end-to-end library instrumentation | ||
scenario. At this stage we don't expect to cover all the APIs as some of them | ||
might be very similar (e.g. if we know how to record an integer, we don't | ||
have to work on float/double as we can add them later by replicating what | ||
we've done for integer). | ||
|
||
3. Focus on a core subset of SDK. This would help us to get the end-to-end | ||
application. | ||
|
||
4. Replicate stage 2 to cover the complete set of APIs. | ||
|
||
5. Replicate stage 3 to cover the complete set of SDKs. | ||
|
||
## Scenario 1: Grocery | ||
|
||
The **Grocery** scenario covers how a developer could use metrics API and SDK in | ||
a final application. It is a self-contained application which covers: | ||
|
||
* How to instrument the code in a vendor agnostic way | ||
* How to configure the SDK and exporter | ||
|
||
Considering there might be multiple grocery stores, the metrics we collect will | ||
have the store name as a dimension - which is fairly static (not changing while | ||
the store is running). | ||
|
||
The store has plenty supply of potato and tomato, with the following price: | ||
|
||
* Potato: $1.00 / ea | ||
* Tomato: $3.00 / ea | ||
|
||
Each customer has a unique name (e.g. customerA, customerB), a customer could | ||
come to the store multiple times. Here goes the Python snippet: | ||
|
||
```python | ||
store = GroceryStore("Portland") | ||
store.process_order("customerA", {"potato": 2, "tomato": 3}) | ||
store.process_order("customerB", {"tomato": 10}) | ||
store.process_order("customerC", {"potato": 2}) | ||
store.process_order("customerA", {"tomato": 1}) | ||
``` | ||
|
||
We will need the following metrics every minute: | ||
|
||
**Order info:** | ||
|
||
| Store | Customer | Number of Orders | Amount (USD) | | ||
| -------- | --------- | ---------------- | ------------ | | ||
| Portland | customerA | 2 | 14.00 | | ||
| Portland | customerB | 1 | 30.00 | | ||
| Portland | customerC | 1 | 2.00 | | ||
|
||
**Items sold:** | ||
|
||
| Store | Customer | Item | Count | | ||
| -------- | --------- | ------ | ----- | | ||
| Portland | customerA | potato | 2 | | ||
| Portland | customerA | tomato | 4 | | ||
| Portland | customerB | tomato | 10 | | ||
| Portland | customerC | potato | 2 | | ||
|
||
## Scenario 2: HTTP Server | ||
|
||
The _HTTP Server_ scenario covers how a library developer X could use metrics | ||
API to instrument a library, and how the application developer Y can configure | ||
the library to use OpenTelemetry SDK in a final application. X and Y are working | ||
for different companies and they don't communicate. The demo has two parts - the | ||
library (HTTP lib and ClimateControl lib owned by X) and the server app (owned by Y): | ||
|
||
* How developer X could instrument the library code in a vendor agnostic way | ||
* Performance is critical for X | ||
* X doesn't know which metrics and which dimensions will Y pick | ||
* X doesn't know the aggregation time window, nor the final destination of the | ||
metrics | ||
* X would like to provide some default recommendation (e.g. default | ||
dimensions, aggregation time window, histogram buckets) so consumers of his | ||
library can have a better onboarding experience. | ||
* How developer Y could configure the SDK and exporter | ||
* How should Y hook up the metrics SDK with the library | ||
* How should Y configure the time window(s) and destination(s) | ||
* How should Y pick the metrics and the dimensions | ||
|
||
### Library Requirements | ||
|
||
The library developer (developer X) will provide two libraries: | ||
|
||
* Server climate control library - a library which monitors and controls the | ||
temperature and humidity of the server. | ||
* HTTP server library - a library which provides HTTP service. | ||
|
||
Both libraries will provide out-of-box metrics, the metrics have two categories: | ||
|
||
* Push metrics - the value is reported (via the API) when it is available, and | ||
collected (via the SDK) based on the ask from consumer(s). If there is no ask | ||
from the consumer, the API will be no-op and the data will be dropped on the | ||
floor. | ||
* Pull metrics - the value is always available, and is only reported and | ||
collected based on the ask from consumer(s). If there is no ask from the | ||
consumer, the value will not be reported at all (e.g. there is no API call to | ||
fetch the temperature unless someone is asking for the temperature). | ||
|
||
#### Server Climate Control Library | ||
|
||
Note: the **Host Name** should leverage [`OpenTelemetry | ||
Resource`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md), | ||
so it should be covered by the metrics SDK rather than API, and strictly | ||
speaking it is not considered as a "dimension" from the SDK perspective. | ||
|
||
**Server temperature:** | ||
|
||
| Host Name | Temperature (F) | | ||
| --------- | --------------- | | ||
| MachineA | 65.3 | | ||
|
||
**Server humidity:** | ||
|
||
| Host Name | Humidity (%) | | ||
| --------- | ------------ | | ||
| MachineA | 21 | | ||
|
||
#### HTTP Server Library | ||
|
||
**Received HTTP requests:** | ||
|
||
Note: the **Client Type** is passed in via the [`OpenTelemetry | ||
Baggage`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md), | ||
strictly speaking it is not part of the metrics API, but it is considered as a | ||
"dimension" from the metrics SDK perspective. | ||
|
||
| Host Name | Process ID | Client Type | HTTP Method | HTTP Host | HTTP Flavor | Peer IP | Peer Port | Host IP | Host Port | | ||
| --------- | ---------- | ----------- | ----------- | --------- | ----------- | --------- | --------- | --------- | --------- | | ||
| MachineA | 1234 | Android | GET | otel.org | 1.1 | 127.0.0.1 | 51327 | 127.0.0.1 | 80 | | ||
| MachineA | 1234 | Android | POST | otel.org | 1.1 | 127.0.0.1 | 51328 | 127.0.0.1 | 80 | | ||
| MachineA | 1234 | iOS | PUT | otel.org | 1.1 | 127.0.0.1 | 51329 | 127.0.0.1 | 80 | | ||
|
||
**HTTP server request duration:** | ||
|
||
Note: the server duration is only available for **finished HTTP requests**. | ||
|
||
| Host Name | Process ID | Client Type | HTTP Method | HTTP Host | HTTP Status Code | HTTP Flavor | Peer IP | Peer Port | Host IP | Host Port | Duration (ms) | | ||
| --------- | ---------- | ----------- | ----------- | --------- | ---------------- | ----------- | --------- | --------- | --------- | --------- | ------------- | | ||
| MachineA | 1234 | Android | GET | otel.org | 200 | 1.1 | 127.0.0.1 | 51327 | 127.0.0.1 | 80 | 8.5 | | ||
| MachineA | 1234 | Android | POST | otel.org | 304 | 1.1 | 127.0.0.1 | 51328 | 127.0.0.1 | 80 | 100.0 | | ||
|
||
### Application Requirements | ||
|
||
The application owner (developer Y) would only want the following metrics: | ||
|
||
* Server temperature - reported every 5 seconds | ||
* Server humidity - reported every minute | ||
* HTTP server request duration, reported every 5 seconds, with a subset of the | ||
dimensions: | ||
* Host Name | ||
* HTTP Method | ||
* HTTP Host | ||
* HTTP Status Code | ||
* Client Type | ||
* 90%, 95%, 99% and 99.9% server duration | ||
* HTTP request counters, reported every 5 seconds: | ||
* Total number of received HTTP requests | ||
* Total number of finished HTTP requests | ||
* Number of currently-in-flight HTTP requests (concurrent HTTP requests) | ||
|
||
| Host Name | Process ID | HTTP Host | Received Requests | Finished Requests | Concurrent Requests | | ||
| --------- | ---------- | --------- | ----------------- | ----------------- | ------------------- | | ||
| MachineA | 1234 | otel.org | 630 | 601 | 29 | | ||
| MachineA | 5678 | otel.org | 1005 | 1001 | 4 | | ||
* Exception samples (exemplar) - in case HTTP 5xx happened, developer Y would | ||
want to see a sample request with trace id, span id and all the dimensions | ||
(IP, Port, etc.) | ||
|
||
| Trace ID | Span ID | Host Name | Process ID | Client Type | HTTP Method | HTTP Host | HTTP Status Code | HTTP Flavor | Peer IP | Peer Port | Host IP | Host Port | Exception | | ||
| -------------------------------- | ---------------- | --------- | ---------- | ----------- | ----------- | --------- | ---------------- | ----------- | --------- | --------- | --------- | --------- | -------------------- | | ||
| 8389584945550f40820b96ce1ceb9299 | 745239d26e408342 | MachineA | 1234 | iOS | PUT | otel.org | 500 | 1.1 | 127.0.0.1 | 51329 | 127.0.0.1 | 80 | SocketException(...) | |