From 7cfc0b981dd3d0d5ca4753adbf7d5b90f6e7b4a4 Mon Sep 17 00:00:00 2001 From: Alex Boten Date: Fri, 21 Oct 2022 10:19:24 -0700 Subject: [PATCH] [docs] update design.md doc (#6359) * [docs] update design.md doc Started looking at various documents under the `docs` directory that are somewhat outdatted. This PR updates the design document. * Apply suggestions from code review Co-authored-by: Anthony Mirabella Co-authored-by: Anthony Mirabella --- docs/design.md | 105 ++++++++++++++++++------------------------------- 1 file changed, 38 insertions(+), 67 deletions(-) diff --git a/docs/design.md b/docs/design.md index ca2f9ec3829b..64df8e11ad68 100644 --- a/docs/design.md +++ b/docs/design.md @@ -1,27 +1,33 @@ # OpenTelemetry Collector Architecture -This document describes the architecture design and implementation of +This document describes the architecture design and implementation of the OpenTelemetry Collector. ## Summary -OpenTelemetry Collector is an executable that allows to receive telemetry data, optionally transform it and send the data further. +OpenTelemetry Collector is an executable that can receive telemetry data, optionally process it, and export it further. -The Collector supports several popular open-source protocols for telemetry data receiving and sending as well as offering a pluggable architecture for adding more protocols. +The Collector supports several popular open-source protocols for receiving and sending telemetry data as well as offering a pluggable architecture for adding more protocols. -Data receiving, transformation and sending is done using Pipelines. The Collector can be configured to have one or more Pipelines. Each Pipeline includes a set of Receivers that receive the data, a series of optional Processors that get the data from receivers and transform it and a set of Exporters which get the data from the Processors and send it further outside the Collector. The same receiver can feed data to multiple Pipelines and multiple pipelines can feed data into the same Exporter. +Data receiving, processing, and exporting is done using [Pipelines](#pipelines). The Collector can be configured to have one or more pipelines. Each pipeline includes: + +- a set of [Receivers](#receivers) that receive the data +- a series of optional [Processors](#processors) that get the data from receivers and process it +- a set of [Exporters](#exporters) which get the data from processors and send it further outside the Collector. + +The same receiver can be included in multiple pipelines and multiple pipelines can include the same Exporter. ## Pipelines -Pipeline defines a path the data follows in the Collector starting from reception, then further processing or modification and finally exiting the Collector via exporters. +A pipeline defines a path the data follows in the Collector starting from reception, then further processing or modification and finally exiting the Collector via exporters. -Pipelines can operate on 3 telemetry data types: traces, metrics and logs. The data type is a property of the pipeline defined by its configuration. Receivers, exporters and processors used in a pipeline must support the particular data type otherwise `ErrDataTypeIsNotSupported` will be reported when the configuration is loaded. A pipeline can be depicted the following way: +Pipelines can operate on 3 telemetry data types: traces, metrics, and logs. The data type is a property of the pipeline defined by its configuration. Receivers, processors, and exporters used in a pipeline must support the particular data type otherwise `ErrDataTypeIsNotSupported` will be reported when the configuration is loaded. A pipeline can be depicted the following way: ![Pipelines](images/design-pipelines.png) -There can be one or more receivers in a pipeline. Data from all receivers is pushed to the first processor, which performs a processing on it and then pushes it to the next processor (or it may drop the data, e.g. if it is a “sampling” processor) and so on until the last processor in the pipeline pushes the data to the exporters. Each exporter gets a copy of each data element. The last processor uses a `FanOutConnector` to fan out the data to multiple exporters. +There can be one or more receivers in a pipeline. Data from all receivers is pushed to the first processor, which performs a processing on it and then pushes it to the next processor (or it may drop the data, e.g. if it is a “sampling” processor) and so on until the last processor in the pipeline pushes the data to the exporters. Each exporter gets a copy of each data element. The last processor uses a `fanoutconsumer` to fan out the data to multiple exporters. -The pipeline is constructed during Collector startup based on pipeline definition in the config file. +The pipeline is constructed during Collector startup based on pipeline definition in the configuration. A pipeline configuration typically looks like this: @@ -44,42 +50,44 @@ Receivers typically listen on a network port and receive telemetry data. Usually ```yaml receivers: - opencensus: - endpoint: "0.0.0.0:55678" + otlp: + protocols: + grpc: + endpoint: localhost:4317 service: pipelines: traces: # a pipeline of “traces” type - receivers: [opencensus] + receivers: [otlp] processors: [memory_limiter, batch] exporters: [jaeger] traces/2: # another pipeline of “traces” type - receivers: [opencensus] + receivers: [otlp] processors: [batch] exporters: [opencensus] ``` -In the above example “opencensus” receiver will send the same data to pipeline “traces” and to pipeline “traces/2”. (Note: the configuration uses composite key names in the form of `type[/name]` as defined in [this document](https://docs.google.com/document/d/1NeheFG7DmcUYo_h2vLtNRlia9x5wOJMlV4QKEK05FhQ/edit#)). +In the above example `otlp` receiver will send the same data to pipeline `traces` and to pipeline `traces/2`. (Note: the configuration uses composite key names in the form of `type[/name]` as defined in [this document](https://docs.google.com/document/d/1NeheFG7DmcUYo_h2vLtNRlia9x5wOJMlV4QKEK05FhQ/edit#)). When the Collector loads this config the result will look like this (part of processors and exporters are omitted from the diagram for brevity): ![Receivers](images/design-receivers.png) -Important: when the same receiver is referenced in more than one pipeline the Collector will create only one receiver instance at runtime that will send the data to `FanOutConnector` which in turn will send the data to the first processor of each pipeline. The data propagation from receiver to `FanOutConnector` and then to processors is via synchronous function call. This means that if one processor blocks the call the other pipelines that are attached to this receiver will be blocked from receiving the same data and the receiver itself will stop processing and forwarding newly received data. +Important: when the same receiver is referenced in more than one pipeline the Collector will create only one receiver instance at runtime that will send the data to a fan out consumer, which in turn will send the data to the first processor of each pipeline. The data propagation from receiver to the fan out consumer and then to processors is done via a synchronous function call. This means that if one processor blocks the call the other pipelines that are attached to this receiver will be blocked from receiving the same data and the receiver itself will stop processing and forwarding newly received data. ### Exporters -Exporters typically forward the data they get to a destination on a network (but they can also send it elsewhere, e.g “logging” exporter writes the telemetry data to a local file). +Exporters typically forward the data they get to a destination on a network (but they can also send it elsewhere, e.g `logging` exporter writes the telemetry data to the logging destination). -The configuration allows to have multiple exporters of the same type, even in the same pipeline. For example one can have 2 “opencensus” exporters defined each one sending to a different opencensus endpoint, e.g.: +The configuration allows to have multiple exporters of the same type, even in the same pipeline. For example one can have 2 `otlp` exporters defined each one sending to a different OTLP endpoint, e.g.: ```yaml exporters: - opencensus/1: - endpoint: "example.com:14250" - opencensus/2: - endpoint: "0.0.0.0:14250" + otlp/1: + endpoint: example.com:4317 + otlp/2: + endpoint: localhost:14317 ``` Usually an exporter gets the data from one pipeline, however it is possible to configure multiple pipelines to send data to the same exporter, e.g.: @@ -89,7 +97,7 @@ exporters: jaeger: protocols: grpc: - endpoint: "0.0.0.0:14250" + endpoint: localhost:14250 service: pipelines: @@ -103,7 +111,7 @@ service: exporters: [jaeger] ``` -In the above example “jaeger” exporter will get data from pipeline “traces” and from pipeline “traces/2”. When the Collector loads this config the result will look like this (part of processors and receivers are omitted from the diagram for brevity): +In the above example `jaeger` exporter will get data from pipeline `traces` and from pipeline `traces/2`. When the Collector loads this config the result will look like this (part of processors and receivers are omitted from the diagram for brevity): ![Exporters](images/design-exporters.png) @@ -112,9 +120,9 @@ In the above example “jaeger” exporter will get data from pipeline “traces A pipeline can contain sequentially connected processors. The first processor gets the data from one or more receivers that are configured for the pipeline, the last processor sends the data to one or more exporters that are configured for the pipeline. All processors between the first and last receive the data strictly only from one preceding processor and send data strictly only to the succeeding processor. -Processors can transform the data before forwarding it (i.e. add or remove attributes from spans), they can drop the data simply by deciding not to forward it (this is for example how “sampling” processor works), they can also generate new data (this is how for example how a “persistent-queue” processor can work after Collector restarts by reading previously saved data from a local file and forwarding it on the pipeline). +Processors can transform the data before forwarding it (i.e. add or remove attributes from spans), they can drop the data simply by deciding not to forward it (this is for example how the `probabilisticsampler` processor works), they can also generate new data. This is how a `spanmetrics` processor can produce metrics for spans processed by the pipeline. -The same name of the processor can be referenced in the “processors” key of multiple pipelines. In this case the same configuration will be used for each of these processors however each pipeline will always get its own instance of the processor. Each of these processors will have its own state, the processors are never shared between pipelines. For example if “batch” processor is used in several pipelines each pipeline will have its own batch processor (although the batch processor will be configured exactly the same way if the reference the same key in the config file). As an example, given the following config: +The same name of the processor can be referenced in the `processors` key of multiple pipelines. In this case the same configuration will be used for each of these processors however each pipeline will always get its own instance of the processor. Each of these processors will have its own state, the processors are never shared between pipelines. For example if `batch` processor is used in several pipelines each pipeline will have its own batch processor (although each batch processor will be configured exactly the same way if they reference the same key in the configuration). As an example, given the following configuration: ```yaml processors: @@ -139,13 +147,13 @@ When the Collector loads this config the result will look like this: ![Processors](images/design-processors.png) -Note that each “batch” processor is an independent instance, although both are configured the same way, i.e. each have a send_batch_size of 10000. +Note that each `batch` processor is an independent instance, although both are configured the same way, i.e. each have a `send_batch_size` of 10000. ## Running as an Agent On a typical VM/container, there are user applications running in some processes/pods with OpenTelemetry Library (Library). Previously, Library did -all the recording, collecting, sampling and aggregation on spans/stats/metrics, +all the recording, collecting, sampling and aggregation on traces/metrics/logs, and exported them to other persistent storage backends via the Library exporters, or displayed them on local zpages. This pattern has several drawbacks, for example: @@ -166,7 +174,7 @@ drawbacks, for example: To resolve the issues above, you can run OpenTelemetry Collector as an Agent. The Agent runs as a daemon in the VM/container and can be deployed independent of Library. Once Agent is deployed and running, it should be able to retrieve -spans/stats/metrics from Library, export them to other backends. We MAY also +traces/metrics/logs from Library, export them to other backends. We MAY also give Agent the ability to push configurations (e.g sampling probability) to Library. For those languages that cannot do stats aggregation in process, they should also be able to send raw measurements and have Agent do the aggregation. @@ -175,13 +183,13 @@ should also be able to send raw measurements and have Agent do the aggregation. ![agent-architecture](images/design-collector-agent.png) For developers/maintainers of other libraries: Agent can also -accept spans/stats/metrics from other tracing/monitoring libraries, such as +accept traces/metrics/logs from other tracing/monitoring libraries, such as Zipkin, Prometheus, etc. This is done by adding specific receivers. See [Receivers](#receivers) for details. -## Running as a Standalone Collector +## Running as a Gateway -The OpenTelemetry Collector can run as a Standalone instance and receives spans +The OpenTelemetry Collector can run as a Gateway instance and receives spans and metrics exported by one or more Agents or Libraries, or by tasks/agents that emit in one of the supported protocols. The Collector is configured to send data to the configured exporter(s). The following figure @@ -193,40 +201,3 @@ summarizes the deployment architecture: The OpenTelemetry Collector can also be deployed in other configurations, such as receiving data from other agents or clients in one of the formats supported by its receivers. - - -### OpenCensus Protocol - -TODO: move this section somewhere else since this document is intended to describe non-protocol specific functionality. - -OpenCensus Protocol uses a bi-directional gRPC -stream. Sender should initiate the connection, since there’s only one -dedicated port for Agent, while there could be multiple instrumented processes. By default, the Collector is available on port 55678. - -#### Protocol Workflow - -1. Sender will try to directly establish connections for Config and Export - streams. -2. As the first message in each stream, Sender must send its identifier. Each - identifier should uniquely identify Sender within the VM/container. If - there is no identifier in the first message, Collector should drop the whole - message and return an error to the client. In addition, the first message - MAY contain additional data (such as `Span`s). As long as it has a valid - identifier associated, Collector should handle the data properly, as if they - were sent in a subsequent message. Identifier is no longer needed once the - streams are established. -3. On Sender side, if connection to Collector failed, Sender should retry - indefinitely if possible, subject to available/configured memory buffer size. - (Reason: consider environments where the running applications are already - instrumented with OpenTelemetry Library but Collector is not deployed yet. - Sometime in the future, we can simply roll out the Collector to those - environments and Library would automatically connect to Collector with - indefinite retries. Zero changes are required to the applications.) - Depending on the language and implementation, retry can be done in either - background or a daemon thread. Retry should be performed at a fixed - frequency (rather than exponential backoff) to have a deterministic expected - connect time. -4. On Collector side, if an established stream were disconnected, the identifier of - the corresponding Sender would be considered expired. Sender needs to - start a new connection with a unique identifier (MAY be different than the - previous one).