Data Prepper ingests Trace Analytics into OpenSearch and Amazon OpenSearch Service. Data Prepper is a last mile server-side component which collects telemetry data from AWS Distro OpenTelemetry collector or OpenTelemetry collector and transforms it for OpenSearch. The transformed trace data is the visualized using the Trace Analytics OpenSearch Dashboards plugin, which provides at-a-glance visibility into your application performance, along with the ability to drill down on individual traces.
Here is how all the components work in trace analytics:
In your service environment you will have to run OpenTelemetry Collector. You can run it as a sidecar or daemonset for EKS, a sidecar for ECS, or an agent on EC2. You should configure the collector to export trace data to Data Prepper. You will then have to deploy Data Prepper as an intermediate component and configure it to send the enriched trace data to your OpenSearch cluster or Amazon OpenSearch Service domain. Then using OpenSearch Dashboards you can visualize and detect problems in your distributed applications.
To achieve trace analytics in Data Prepper we have three pipelines otel-trace-pipeline
, raw-trace-pipeline
and service-map-pipeline
The OpenTelemetry source accepts trace data from the OpenTelemetry collector. The source depends on OpenTelemetry Protocol. The source officially support transport over gRPC. The source also supports industry-standard encryption (TLS/HTTPS).
We have two processor for the Trace Analytics feature,
- otel_trace_raw - This is a processor that receives collection of Span records sent from otel-trace-source, does stateful processing on extracting and filling-in trace group related fields.
- otel_trace_group - This is a processor that fills in the missing trace group related fields in the collection of Span records by looking up the opensearch backend.
- service_map_stateful - This processor performs the required preprocessing on the trace data and build metadata to display the service-map OpenSearch Dashboards dashboards.
We have a generic sink that writes the data to OpenSearch as the destination. The opensearch sink has configuration options related to OpenSearch cluster like endpoint, SSL/Username, index name, index template, index state management, etc. For the trace analytics feature, the sink has specific configurations which enables the sink to use indices and index templates specific to this feature. Trace analytics specific OpenSearch indices are,
- otel-v1-apm-span - This index stores the output from otel-trace-raw-processor.
- otel-v1-apm-service-map - This index stores the output from the service-map-processor.
Example otel-trace-source
with SSL and Basic Authentication enabled. Note that you will have to change your otel-collector-config.yaml
accordingly:
source:
otel_trace_source:
ssl: true
sslKeyCertChainFile: "/full/path/to/certfile.crt"
sslKeyFile: "/full/path/to/keyfile.key"
authentication:
http_basic:
username: "my-user"
password: "my_s3cr3t"
Example pipeline.yaml
without SSL and Basic Authentication for the otel-trace-source
:
otel-trace-pipeline:
# workers is the number of threads processing data in each pipeline.
# We recommend same value for all pipelines.
# default value is 1, set a value based on the machine you are running Data Prepper
workers: 8
# delay in milliseconds is how often the worker threads should process data.
# Recommend not to change this config as we want the otel-trace-pipeline to process as quick as possible
# default value is 3_000 ms
delay: "100"
source:
otel_trace_source:
ssl: false # Change this to enable encryption in transit
authentication:
unauthenticated:
buffer:
bounded_blocking:
# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
# We recommend to keep the same buffer_size for all pipelines.
# Make sure you configure sufficient heap
# default value is 512
buffer_size: 512
# This is the maximum number of request each worker thread will process within the delay.
# Default is 8.
# Make sure buffer_size >= workers * batch_size
batch_size: 8
sink:
- pipeline:
name: "raw-pipeline"
- pipeline:
name: "service-map-pipeline"
raw-pipeline:
# Configure same as the otel-trace-pipeline
workers: 8
# We recommend using the default value for the raw-pipeline.
delay: "3000"
source:
pipeline:
name: "otel-trace-pipeline"
buffer:
bounded_blocking:
# Configure the same value as in otel-trace-pipeline
# Make sure you configure sufficient heap
# default value is 512
buffer_size: 512
# The raw processor does bulk request to your OpenSearch sink, so configure the batch_size higher.
# If you use the recommended otel-collector setup each ExportTraceRequest could contain max 50 spans. https://github.com/opensearch-project/data-prepper/tree/v0.7.x/deployment/aws
# With 64 as batch size each worker thread could process upto 3200 spans (64 * 50)
batch_size: 64
processor:
- otel_trace_raw:
- otel_trace_group:
hosts: [ "https://localhost:9200" ]
# Change to your credentials
username: "admin"
password: "admin"
# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
#cert: /path/to/cert
# If you are connecting to an Amazon OpenSearch Service domain without
# Fine-Grained Access Control, enable these settings. Comment out the
# username and password above.
#aws_sigv4: true
#aws_region: us-east-1
sink:
- opensearch:
hosts: [ "https://localhost:9200" ]
index_type: trace-analytics-raw
# Change to your credentials
username: "admin"
password: "admin"
# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
#cert: /path/to/cert
# If you are connecting to an Amazon OpenSearch Service domain without
# Fine-Grained Access Control, enable these settings. Comment out the
# username and password above.
#aws_sigv4: true
#aws_region: us-east-1
service-map-pipeline:
workers: 8
delay: "100"
source:
pipeline:
name: "otel-trace-pipeline"
processor:
- service_map_stateful:
# The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluvate service-map relationships.
# The default is 3 minutes, this means we can detect relationships between services from spans reported in last 3 minutes.
# Set higher value if your applications have higher latency.
window_duration: 180
buffer:
bounded_blocking:
# buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memeory.
# We recommend to keep the same buffer_size for all pipelines.
# Make sure you configure sufficient heap
# default value is 512
buffer_size: 512
# This is the maximum number of request each worker thread will process within the delay.
# Default is 8.
# Make sure buffer_size >= workers * batch_size
batch_size: 8
sink:
- opensearch:
hosts: [ "https://localhost:9200" ]
index_type: trace-analytics-service-map
# Change to your credentials
username: "admin"
password: "admin"
# Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
#cert: /path/to/cert
# If you are connecting to an Amazon OpenSearch Service domain without
# Fine-Grained Access Control, enable these settings. Comment out the
# username and password above.
#aws_sigv4: true
#aws_region: us-east-1
You will need to modify the configuration above for your OpenSearch cluster. Note that it has two
opensearch
sinks which need to be modified.
The main changes you will need to make are:
hosts
- Set to your hostsusername
- Provide the OpenSearch usernamepassword
- Provide your OpenSearch passwordaws_sigv4
- If you are Amazon OpenSearch Service with AWS signing, set this totrue
. It will sign requests with the default AWS credentials provider.aws_region
- If you are Amazon OpenSearch Service with AWS signing, set this value to your region.
The the Data Prepper OpenSearch Sink documents other configurations available for OpenSearch.
You will have to run OpenTelemetry collector in your service environment. You can find the installation guide of OpenTelemetry collector here. Please ensure you that you configure the collector with an exporter configured to your Data Prepper. Below is an example otel-collector-config.yaml
that receives data from various instrumentations and export it to Data Prepper.
receivers:
jaeger:
protocols:
grpc:
otlp:
protocols:
grpc:
zipkin:
processors:
batch/traces:
timeout: 1s
send_batch_size: 50
exporters:
otlp/data-prepper:
endpoint: localhost:21890
tls:
insecure: true
service:
pipelines:
traces:
receivers: [jaeger, otlp, zipkin]
processors: [batch/traces]
exporters: [otlp/data-prepper]
Configure your application to use the OpenTelemetry Collector.
You will normally run the OpenTelemetry Collector alongside your application.
The OpenSearch Trace Analytics documentation provides additional details on configuring OpenSearch for viewing trace analytics. In particular, it documents how to use OpenSearch Dashboards.
The Trace Tuning page has information to help you tune and scale Data Prepper for trace analytics use cases.