Enforce lint, link checks, fix errors. Remove "approved" status. (open-telemetry#103)

* Enforce lint, link checks, fix errors. Remove status.

Signed-off-by: Bogdan Drutu <[email protected]>

* Update text/0002-remove-spandata.md

Co-authored-by: Armin Ruech <[email protected]>

* Update text/0006-sampling.md

Co-authored-by: Armin Ruech <[email protected]>

* Update text/0016-named-tracers.md

Co-authored-by: Armin Ruech <[email protected]>

* Update text/0016-named-tracers.md

Co-authored-by: Armin Ruech <[email protected]>

* Update text/0092-logs-vision.md

Co-authored-by: Armin Ruech <[email protected]>

* Allow ? in the headers

Signed-off-by: Bogdan Drutu <[email protected]>

* Disable enforce link because of DDOS protection on github.com

Signed-off-by: Bogdan Drutu <[email protected]>

Co-authored-by: Armin Ruech <[email protected]>
bogdandrutu and arminru authored May 4, 2020
1 parent 6d50d35 commit bd44a75
Showing 19 changed files with 505 additions and 468 deletions.
24 changes: 12 additions & 12 deletions oteps/0001-telemetry-without-manual-instrumentation.md
@@ -1,7 +1,5 @@
# (Open) Telemetry Without Manual Instrumentation

**Status:** `approved`

_Cross-language requirements for automated approaches to extracting portable telemetry data with zero source code modification._

## Motivation
@@ -25,29 +23,28 @@ Many people have correctly observed that “agent” design is highly language-d
### Requirements

Without further ado, here are a set of requirements for “official” OpenTelemetry efforts to accomplish zero-source-code-modification instrumentation (i.e., “OpenTelemetry agents”) in any given language:

* _Manual_ source code modifications are "very strongly discouraged", with an exception for languages or environments that leave no credible alternatives. Any code changes must be trivial and `O(1)` per source file (rather than per-function, etc).
* Licensing must be permissive (e.g., ASL / BSD)
* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
* That is, vendors do not want to require users to comprehend both an OpenTelemetry package and a vendor-specific package
* That is, vendors do not want to require users to comprehend both an OpenTelemetry package and a vendor-specific package
* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
* If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)
* Relatedly, there also must be a way to discover and avoid potential conflicts/overlap/redundancy between explicit whitebox instrumentation and blackbox instrumentation of the same libraries/packages
* That is, if a developer has already added the “official” OpenTelemetry plugin for, say, gRPC, then when the blackbox instrumentation effort adds gRPC support, it should *not* “double-instrument” it and create a mess of extra spans/etc
* If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)
* Relatedly, there also must be a way to discover and avoid potential conflicts/overlap/redundancy between explicit whitebox instrumentation and blackbox instrumentation of the same libraries/packages
* That is, if a developer has already added the “official” OpenTelemetry plugin for, say, gRPC, then when the blackbox instrumentation effort adds gRPC support, it should *not* “double-instrument” it and create a mess of extra spans/etc
* From the standpoint of the actual telemetry being gathered, the same standards and expectations (about tagging, metadata, and so on) apply to "whitebox" instrumentation and automatic instrumentation
* The code in the OpenTelemetry package must not take a hard dependency on any particular vendor/vendors (that sort of functionality should work via a plugin or registry mechanism)
* Further, the code in the OpenTelemetry package must be isolated to avoid possible conflicts with the host application (e.g., shading in Java, etc)

* Further, the code in the OpenTelemetry package must be isolated to avoid possible conflicts with the host application (e.g., shading in Java, etc)

### Nice-to-have properties

* Run-time integration (vs compile-time integration)
* Automated and modular testing of individual library/package plugins
* Note that this also makes it easy to test against multiple different versions of any given library
* Note that this also makes it easy to test against multiple different versions of any given library
* A fully pluggable architecture, where plugins can be registered at runtime without requiring changes to the central repo at github.com/open-telemetry
* E.g., for ops teams that want to write a plugin for a proprietary piece of legacy software they are unable to recompile
* E.g., for ops teams that want to write a plugin for a proprietary piece of legacy software they are unable to recompile
* Augmentation of whitebox instrumentation by blackbox instrumentation (or, perhaps, vice versa). That is, not only can the trace context be shared by these different flavors of instrumentation, but even things like in-flight Span objects can be shared and co-modified (e.g., to use runtime interposition to grab local variables and attach them to a manually-instrumented span).


## Trade-offs and mitigations

Approaching a problem this language-specific at the cross-language altitude is intrinsically challenging since "different languages are different" – e.g., in Go there is no way to perform the kind of runtime interpositioning that's possible in Python, Ruby, or even Java.
@@ -59,6 +56,7 @@ There is also a school of thought that we should only be focusing on the bits an
### What is our desired end state for OpenTelemetry end-users?

To reiterate much of the above:

* First and foremost, **portable OpenTelemetry instrumentation can be installed without manual source code modification**
* There’s one “clear winner” when it comes to portable, automatic instrumentation; just like with OpenTracing and OpenCensus, this is a situation where choice is not necessarily a good thing. End-users who wish to contribute instrumentation plugins should not have their enthusiasm and generosity diluted across competing projects.
* As much as such a thing is possible, consistency across languages
@@ -72,7 +70,7 @@ Given the desired end state, the Datadog tracers seem like the closest-fit, perm

### The overarching (technical) process, per-language

* Start with [the Datadog `dd-trace-foo` tracers](https://github.com/DataDog?utf8=✓&q=dd-trace&type=source&language=)
* Start with [the Datadog `dd-trace-foo` tracers](https://github.com/DataDog)
* For each language:
* Fork the Datadog `datadog/dd-trace-foo` repo into an `open-telemetry/auto-instr-foo` OpenTelemetry repo (exact naming TBD)
* In parallel:
@@ -102,12 +100,14 @@ Each `auto-instr-foo` repository must have at least one [Maintainer](https://git
## Prior art and alternatives

There are many proprietary APM language agents – no need to survey them all here. There is a much smaller list of "APM agents" (or other auto-instrumentation efforts) that are already permissively-licensed OSS. For instance, when we met to discuss options for JVM (longer notes [here](https://docs.google.com/document/d/1ix0WtzB5j-DRj1VQQxraoqeUuvgvfhA6Sd8mF5WLNeY/edit#heading=h.kjctiyv4rxup)), we came away with the following list:

* [Honeycomb's Java beeline](https://github.com/honeycombio/beeline-java)
* [Datadog's Java tracer](https://github.com/datadog/dd-trace-java)
* [Glowroot](https://glowroot.org/)
* [SpecialAgent](https://github.com/opentracing-contrib/java-specialagent)

The most obvious "alternative approach" would be to choose "starting points" independently in each language. This has several problems:

* Higher likelihood of "hard forks": we want to avoid an end state where two projects (the OpenTelemetry version, and the original version) evolve – and diverge – independently
* Higher likelihood of "concept divergence" across languages: while each language presents unique requirements and challenges, the Datadog auto-instrumentation libraries were written by a single organization with some common concepts and architectural requirements (they were also written to be OpenTracing-compatible, which greatly increases our odds of success given the similarities to OpenTelemetry)
* Datadog would also like a uniform strategy here, and this donation requires their consent (unless we want to do a hard fork, which is suboptimal for everyone). So starting with the Datadog libraries in "all but one" (or "all but two", etc) languages makes this less palatable for them
6 changes: 2 additions & 4 deletions oteps/0002-remove-spandata.md
@@ -1,7 +1,5 @@
# Remove SpanData

**Status:** `approved`

Remove and replace SpanData by adding span start and end options.

## Motivation
@@ -24,7 +22,7 @@ I'd like to propose getting rid of SpanData and `tracer.recordSpanData()` and re

## Trade-offs and mitigations

From https://github.com/open-telemetry/opentelemetry-specification/issues/71: If the underlying SDK automatically adds tags to spans such as thread-id, stacktrace, and cpu-usage when a span is started, they would be incorrect for out of band spans as the tracer would not know the difference between in and out of band spans. This can be mitigated by indicating that the span is out of band to prevent attaching incorrect information, possibly with an `isOutOfBand()` option on `startSpan()`.
From <https://github.com/open-telemetry/opentelemetry-specification/issues/71>: If the underlying SDK automatically adds tags to spans such as thread-id, stacktrace, and cpu-usage when a span is started, they would be incorrect for out of band spans as the tracer would not know the difference between in and out of band spans. This can be mitigated by indicating that the span is out of band to prevent attaching incorrect information, possibly with an `isOutOfBand()` option on `startSpan()`.
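
One way to picture the mitigation is the minimal sketch below. It is not the OpenTelemetry API: `start_span`, the `out_of_band` flag, and the `thread.id` attribute key are hypothetical names chosen for illustration, standing in for the `isOutOfBand()` option on `startSpan()` mentioned above.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    """Hypothetical span record; not the real OpenTelemetry Span type."""
    name: str
    start_time_ns: int
    end_time_ns: Optional[int] = None
    attributes: Dict[str, object] = field(default_factory=dict)

    def end(self, end_time_ns: Optional[int] = None) -> None:
        # An explicit end timestamp lets a caller report a span that already finished.
        self.end_time_ns = time.time_ns() if end_time_ns is None else end_time_ns


def start_span(name: str,
               start_time_ns: Optional[int] = None,
               out_of_band: bool = False) -> Span:
    """Start a span, optionally with an explicit start timestamp (assumed option)."""
    span = Span(name, time.time_ns() if start_time_ns is None else start_time_ns)
    if not out_of_band:
        # Ambient data such as the thread id only describes the code path that is
        # running right now, so it is skipped for out-of-band spans.
        span.attributes["thread.id"] = threading.get_ident()
    return span
```

A caller replaying a span observed elsewhere would pass both timestamps and set the flag, for example `start_span("replayed", start_time_ns=1000, out_of_band=True)` followed by `span.end(end_time_ns=2000)`, so no ambient attributes are attached.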

## Prior art and alternatives

@@ -38,7 +36,7 @@ There also seems to be some hidden dependency between SpanData and the sampler A

We might want to include attributes as a start option to give the underlying sampler more information to sample with. We also might want to include optional events, which would be for bulk adding events with explicit timestamps.

We will also want to ensure, assuming the span or subtrace is being created in the same process, that the timestamps use the same precision and are monotonic.
We will also want to ensure, assuming the span or subtrace is being created in the same process, that the timestamps use the same precision and are monotonic.

## Related Issues

18 changes: 8 additions & 10 deletions oteps/0003-measure-metric-type.md
@@ -1,12 +1,10 @@
# Consolidate pre-aggregated and raw metrics APIs

**Status:** `approved`

# Foreword
## Foreword

A working group convened on 8/21/2019 to discuss and debate the two metrics RFCs (0003 and 0004) and several surrounding concerns. This document has been revised with related updates that were agreed upon during this working session. See the [meeting notes](https://docs.google.com/document/d/1d0afxe3J6bQT-I6UbRXeIYNcTIyBQv4axfjKF4yvAPA/edit#).

# Overview
## Overview

Introduce a `Measure` kind of metric object that supports a `Record` API method. Like existing `Gauge` and `Cumulative` metrics, the new `Measure` metric supports pre-defined labels. A new `RecordBatch` measurement API is introduced for recording multiple metric observations simultaneously.

@@ -18,7 +16,7 @@ Since this document will be read in the future after the proposal has been writt

The preceding specification used the term `TimeSeries` to describe an instrument bound with a set of pre-defined labels. In this document, [the term "Handle" is used to describe an instrument with bound labels](0009-metric-handles.md). In a future OTEP this will be again changed to "Bound instrument". The term "Handle" is used throughout this document to refer to a bound instrument.

# Motivation
## Motivation

In the preceding `Metric.GetOrCreateTimeSeries` API for Gauges and Cumulatives, the caller obtains a `TimeSeries` handle for repeatedly recording metrics with certain pre-defined label values set. This enables an important optimization for exporting pre-aggregated metrics, since the implementation is able to compute the aggregate summary "entry" using a pointer or fast table lookup. The efficiency gain requires that the aggregation keys be a subset of the pre-defined labels.

@@ -28,7 +26,7 @@ The preceding raw statistics API did not specify support for pre-defined labels.

The preceding raw statistics API supported all-or-none recording for interdependent measurements using a common label set. This RFC introduces a `RecordBatch` API to support recording batches of measurements in a single API call, where a `Measurement` is now defined as a pair of `MeasureMetric` and `Value` (integer or floating point).

# Explanation
## Explanation

The common use for `MeasureMetric`, like the preceding raw statistics API, is for reporting information about rates and distributions over structured, numerical event data. Measure metrics are the most general-purpose of metrics. Informally, the individual metric event has a logical format expressed as one primary key=value (the metric name and a numerical value) and any number of secondary key=values (the labels, resources, and context).

@@ -72,7 +70,7 @@ Metric instrument Handles combine a metric instrument with a set of pre-defined
By separation of API and implementation in OpenTelemetry, we know that an implementation is free to do _anything_ in response to a metric API call. By the low-level interpretation defined above, all metric events have the same structural representation, only their logical interpretation varies according to the metric definition. Therefore, we select metric kinds based on two primary concerns:

1. What should be the default implementation behavior? Unless configured otherwise, how should the implementation treat this metric variable?
1. How will the program source code read? Each metric uses a different verb, which helps convey meaning and describe default behavior. Cumulatives have an `Add()` method. Gauges have a `Set()` method. Measures have a `Record()` method.
2. How will the program source code read? Each metric uses a different verb, which helps convey meaning and describe default behavior. Cumulatives have an `Add()` method. Gauges have a `Set()` method. Measures have a `Record()` method.
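
To make the "different verb per kind" point concrete, here is a small sketch with three toy classes. The class and method names follow the verbs in the text, but the bodies are assumptions for illustration only; a real implementation is free to aggregate differently.

```python
class Cumulative:
    """Toy cumulative instrument: each Add() contributes to a running total."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0.0

    def add(self, value: float) -> None:
        self.total += value


class Gauge:
    """Toy gauge instrument: each Set() overwrites the previous value."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last = None

    def set(self, value: float) -> None:
        self.last = value


class Measure:
    """Toy measure instrument: each Record() is one event in a distribution."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 0
        self.sum = 0.0

    def record(self, value: float) -> None:
        # Count and sum are the cheapest summary; quantiles would need more state.
        self.count += 1
        self.sum += value


# The verb at each call site signals how the value is meant to be interpreted.
bytes_sent = Cumulative("bytes_sent")
bytes_sent.add(512)

queue_depth = Gauge("queue_depth")
queue_depth.set(7)

latency_ms = Measure("latency_ms")
latency_ms.record(12.5)
```

Reading the call sites alone (`add`, `set`, `record`) is enough to tell which default behavior the author expected, which is the second concern listed above.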

To guide the user in selecting the right kind of metric for an application, we'll consider the following questions about the primary intent of reporting given data. We use "of primary interest" here to mean information that is almost certainly useful in understanding system behavior. Consider these questions:

@@ -106,7 +104,7 @@ For gauge metrics, the default OpenTelemetry implementation exports the last val
Measure metrics express a distribution of measured values. This kind of metric should be used when the count or rate of events is meaningful and either:

1. The sum is of interest in addition to the count (rate)
1. Quantile information is of interest.
2. Quantile information is of interest.

The key property of a measure metric event is that computing quantiles and/or summarizing a distribution (e.g., via a histogram) may be expensive. Not only will implementations have various capabilities and algorithms for this task, users may wish to control the quality and cost of aggregating measure metrics.

@@ -135,7 +133,7 @@ Applications sometimes want to act upon multiple metric instruments in a single

A single measurement is defined as:

- Instrument: the measure instrument (not a Handle)
- Instrument: the measure instrument (not a Handle)
- Value: the recorded floating point or integer data

The batch measurement API uses a language-specific method name (e.g., `RecordBatch`). The entire batch of measurements takes place within a (implicit or explicit) context.
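
The shape of a batch call can be sketched as follows. `Meter`, `record_batch`, and the `events` list are hypothetical names, and validating the whole batch before recording anything is an assumption that mirrors the all-or-none behavior described for the preceding raw statistics API.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]


class Measure:
    """Toy measure instrument, identified only by name."""

    def __init__(self, name: str) -> None:
        self.name = name


# A single measurement pairs the instrument (not a Handle) with a value.
Measurement = Tuple[Measure, Number]


class Meter:
    """Hypothetical meter that stores recorded events in memory."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Number, Dict[str, str]]] = []

    def record_batch(self, labels: Dict[str, str],
                     measurements: List[Measurement]) -> None:
        # Validate everything first so a bad entry leaves no partial batch behind.
        for instrument, value in measurements:
            if not isinstance(instrument, Measure):
                raise TypeError("each measurement needs a Measure instrument")
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError("each measurement needs an int or float value")
        # All measurements in the batch share the same label set.
        for instrument, value in measurements:
            self.events.append((instrument.name, value, dict(labels)))
```

A call such as `meter.record_batch({"route": "/home"}, [(latency, 12.5), (size, 2048)])` then records both values under one label set, or neither if any entry is malformed.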
@@ -148,7 +146,7 @@ Prometheus supports the notion of vector metrics, which are those that support p

### `GetHandle` argument ordering

Argument ordering has been proposed as the way to pass pre-defined label values in `GetHandle`. The argument list must match the parameter list exactly, and if it doesn't we generally find out at runtime or not at all. This model has more optimization potential, but is easier to misuse than the alternative. The alternative approach is to always pass label:value pairs to `GetOrCreateTimeseries`, as opposed to an ordered list of values.
Argument ordering has been proposed as the way to pass pre-defined label values in `GetHandle`. The argument list must match the parameter list exactly, and if it doesn't we generally find out at runtime or not at all. This model has more optimization potential, but is easier to misuse than the alternative. The alternative approach is to always pass label:value pairs to `GetOrCreateTimeseries`, as opposed to an ordered list of values.
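
The trade-off can be illustrated with a toy metric that offers both styles. The method names `get_handle_ordered` and `get_handle_by_name` are invented for this sketch, and a "handle" is reduced to a plain dictionary of bound labels.

```python
from typing import Dict, Sequence


class Metric:
    """Toy metric with a fixed, ordered list of pre-defined label keys."""

    def __init__(self, name: str, label_keys: Sequence[str]) -> None:
        self.name = name
        self.label_keys = tuple(label_keys)

    def get_handle_ordered(self, *values: str) -> Dict[str, str]:
        # Only the number of values can be checked; their order cannot.
        if len(values) != len(self.label_keys):
            raise ValueError("expected %d label values" % len(self.label_keys))
        return dict(zip(self.label_keys, values))

    def get_handle_by_name(self, **labels: str) -> Dict[str, str]:
        # Keys are checked by name, so argument order no longer matters.
        if set(labels) != set(self.label_keys):
            raise KeyError("labels must be exactly %s" % (self.label_keys,))
        return {key: labels[key] for key in self.label_keys}


requests = Metric("requests", ["method", "status"])

# Swapped positional values are accepted silently and bind the wrong labels.
wrong = requests.get_handle_ordered("200", "GET")

# Named pairs bind correctly regardless of the order they are written in.
right = requests.get_handle_by_name(status="200", method="GET")
```

The ordered form can skip key lookups, which is where its optimization potential comes from, while the named form catches mismatches that the ordered form cannot see.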

### `RecordBatch` argument ordering

2 changes: 1 addition & 1 deletion oteps/0005-global-init.md
@@ -1,6 +1,6 @@
# Global SDK initialization

*Status: proposed*
**Status**: proposed

Specify the behavior of OpenTelemetry APIs and implementations at startup.
