Blog: Blog post for scaling OpenTelemetry Collectors using Ansible #4140

Closed
5f481b6
Create scaling-opentelemetry-collectors.md
ishanjainn Mar 12, 2024
87e7f5a
update for lint failure
ishanjainn Mar 12, 2024
c40a304
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
983f5da
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
714cb43
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
270ff5e
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
a640df7
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
6561ace
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
0faa4c0
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
41e7613
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
c38add7
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
c172a11
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
2482bbd
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
d3e6b35
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
7d65f4a
indent
ishanjainn Mar 13, 2024
91c3d6d
Merge branch 'main' into patch-4
ishanjainn Mar 13, 2024
9a5360b
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
6c037f0
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
ecb5e26
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
9e8c6f2
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
1c2c85a
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
bcb7d03
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
174e230
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
c3de73c
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
eb75dd3
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 13, 2024
4df80d8
ansible prerequisite
ishanjainn Mar 13, 2024
72be345
partial fix, Config yet to be tested
ishanjainn Mar 13, 2024
631be4e
fix config
ishanjainn Mar 13, 2024
80ccad5
update Inventory
ishanjainn Mar 13, 2024
12a622f
ansible-config updates
ishanjainn Mar 13, 2024
77c776e
update doc with grafana steps
ishanjainn Mar 13, 2024
e44d39b
cspell fixes
ishanjainn Mar 13, 2024
bad97df
linter
ishanjainn Mar 13, 2024
ff5947a
`npm run fix:format`
ishanjainn Mar 13, 2024
b840872
npm run fix:format again
ishanjainn Mar 13, 2024
936eb0f
Merge branch 'main' into patch-4
ishanjainn Mar 13, 2024
9a209f4
Merge branch 'main' into patch-4
ishanjainn Mar 15, 2024
b77c266
remove canonical URL
patcher9 Mar 19, 2024
db991fe
Auto-update registry versions (90129530fc2097d9835f54da9df1ad09ff26ee…
opentelemetrybot Mar 15, 2024
89fb41e
Add Causely to vendors.yaml (#4158)
esara Mar 16, 2024
8ed906d
Auto-update registry versions (0ca8a0a58be7073c4a72f032f44b5b249615d7…
opentelemetrybot Mar 16, 2024
3fd8461
spring starter can now use all sdk autoconfig properties (#4167)
zeitlinger Mar 16, 2024
7fd62b6
Update opentelemetry-java-instrumentation version to v2.2.0 (#4164)
opentelemetrybot Mar 16, 2024
051e607
Add paymentServiceFailure & paymentServiceUnreachable featureflags to…
EislM0203 Mar 16, 2024
b3191c5
Update opentelemetry-specification version to v1.31.0 (#4157)
opentelemetrybot Mar 16, 2024
04d5cad
Troubleshooting added to check available components in the collector …
nerudadhich Mar 16, 2024
faf8784
Demo Docs: Update the name of recommendation cache feature flag (#4159)
cthain Mar 16, 2024
2b3708d
Update kubernetes-deployment.md (#4160)
julianocosta89 Mar 16, 2024
4a8e80c
Update architecture.md (#4162)
julianocosta89 Mar 16, 2024
a89d5be
spring boot build info resource detector (#3999)
zeitlinger Mar 16, 2024
8aeb024
how to enable Resource Providers that are disabled by default (#4138)
zeitlinger Mar 16, 2024
12dcb6d
Bump @opentelemetry/auto-instrumentations-web from 0.36.0 to 0.37.0 (…
dependabot[bot] Mar 16, 2024
ccc2dd3
Create opentelemetry-announced-support-for-profiling (#4173)
cartermp Mar 18, 2024
ac41348
Update author (it was austin, I am just his sockpuppet) (#4175)
cartermp Mar 18, 2024
33995b8
Adds Chronosphere to vendors list (#4179)
subvocal Mar 19, 2024
c24a755
Auto-update registry versions (2f7aab49799161a841fae7f82b536e6df0759a…
opentelemetrybot Mar 19, 2024
db6ea32
Remove mention of spanmetrics processor (#4178)
tiffany76 Mar 19, 2024
212 changes: 212 additions & 0 deletions content/en/blog/2024/scaling-opentelemetry-collectors.md
@@ -0,0 +1,212 @@
---
title:
Manage OpenTelemetry Collectors at Scale with Ansible
linkTitle: Ansible role for OpenTelemetry Collector
date: 2024-03-12
author: '[Ishan Jain](https://github.com/ishanjainn) (Grafana)'

draft: true # TODO: remove this line once your post is ready to be published
# canonical_url: http://somewhere.else/ # This will be added in future
---

This guide shows how to use Ansible to deploy and scale the OpenTelemetry Collector across multiple Linux hosts, where it can run as an agent, a gateway, or both within your observability architecture. In either capacity, the Collector gathers metrics, traces, and logs and forwards them to analysis and visualization platforms such as Grafana Cloud.

The sections below outline how to deploy and manage Collector instances throughout your infrastructure with Ansible, and how to view the resulting data in Grafana Cloud.

## Before You Begin

To follow this guide, ensure you have:

- Linux hosts.
- SSH access to each of these Linux hosts.
- Account permissions to install and configure the OpenTelemetry Collector on these hosts.

## Install the Grafana Ansible collection

The OpenTelemetry Collector role used in this guide, `grafana.grafana.opentelemetry_collector`, ships in the [Grafana Ansible collection](https://github.com/grafana/grafana-ansible-collection).

To install the Grafana Ansible collection, run this command:

```sh
ansible-galaxy collection install grafana.grafana
```

## Create an Ansible inventory file

Next, you will set up your hosts and create an inventory file.

1. Create your hosts and add public SSH keys to them.

This example uses eight Linux hosts: two Ubuntu hosts, two CentOS hosts, two Fedora hosts, and two Debian hosts.

1. Create an Ansible inventory file.

The Ansible inventory, which resides in a file named `inventory`, looks similar to this:

```
146.190.208.216 # hostname = ubuntu-01
146.190.208.190 # hostname = ubuntu-02
137.184.155.128 # hostname = centos-01
146.190.216.129 # hostname = centos-02
198.199.82.174 # hostname = debian-01
198.199.77.93 # hostname = debian-02
143.198.182.156 # hostname = fedora-01
143.244.174.246 # hostname = fedora-02
```

> **Note**: If you are copying the above file, remove the comments (#).

1. Create an `ansible.cfg` file within the same directory as `inventory`, with the following values:

```
[defaults]
# Path to the inventory file
inventory = inventory
# Path to your private SSH key
private_key_file = ~/.ssh/id_rsa
remote_user = root
```
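
The inline `#` comments in the inventory above have to be removed before use, which also drops the hostnames. As an alternative sketch, Ansible also accepts YAML inventories, where each host can carry an alias and its address in `ansible_host`. The file name `inventory.yml` is an assumption made for illustration; if you use it, point the `inventory` setting in `ansible.cfg` at that file.

```yaml
# inventory.yml (hypothetical file name): YAML equivalent of the inventory above.
# Host aliases mirror the hostnames given in the comments of the original file.
all:
  hosts:
    ubuntu-01:
      ansible_host: 146.190.208.216
    ubuntu-02:
      ansible_host: 146.190.208.190
    centos-01:
      ansible_host: 137.184.155.128
    centos-02:
      ansible_host: 146.190.216.129
    debian-01:
      ansible_host: 198.199.82.174
    debian-02:
      ansible_host: 198.199.77.93
    fedora-01:
      ansible_host: 143.198.182.156
    fedora-02:
      ansible_host: 143.244.174.246
```

With aliases in place, Ansible output refers to hosts by name (for example `ubuntu-01`) rather than by IP address.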

## Use the OpenTelemetry Collector Ansible Role

Next, you'll define an Ansible playbook that applies the OpenTelemetry Collector role to your hosts.

Create a file named `deploy-opentelemetry.yml` in the same directory as your `ansible.cfg` and `inventory`.

```yaml
- name: Install OpenTelemetry Collector
  hosts: all
  become: true

  vars:
    grafana_cloud_api_key: <Your Grafana.com API Key> # Example - eyxxxxxxxx
    metrics_username: <prometheus-username> # Example - 825019
    logs_username: <loki-username> # Example - 411478
    prometheus_url: <prometheus-push-url> # Example - https://prometheus-us-central1.grafana.net/api/prom/push
    loki_url: <loki-push-url> # Example - https://logs-prod-017.grafana.net/loki/api/v1/push
    tempo_url: <tempo-push-url> # Example - tempo-prod-04-prod-us-east-0.grafana.net:443
    traces_username: <tempo-username> # Example - 411478

  tasks:
    - name: Install OpenTelemetry Collector
      ansible.builtin.include_role:
        name: grafana.grafana.opentelemetry_collector
      vars:
        otel_collector_extensions:
          basicauth/grafana_cloud_tempo:
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
            client_auth:
              username: "{{ traces_username }}"
              password: "{{ grafana_cloud_api_key }}"
          basicauth/grafana_cloud_prometheus:
            client_auth:
              # Use the Prometheus username here, not the push URL
              username: "{{ metrics_username }}"
              password: "{{ grafana_cloud_api_key }}"
          basicauth/grafana_cloud_loki:
            client_auth:
              username: "{{ logs_username }}"
              password: "{{ grafana_cloud_api_key }}"

        otel_collector_receivers:
          otlp:
            # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
            protocols:
              grpc:
              http:
          hostmetrics:
            collection_interval: 60s
            scrapers:
              cpu: {}
              disk: {}
              load: {}
              filesystem: {}
              memory: {}
              network: {}
              paging: {}
              process:
                mute_process_name_error: true
                mute_process_exe_error: true
                mute_process_io_error: true
              processes: {}

        otel_collector_processors:
          batch:
            # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
          resourcedetection:
            detectors: [env, system] # Before system detector, include ec2 for AWS, gcp for GCP and azure for Azure.
            # Using OTEL_RESOURCE_ATTRIBUTES envvar, env detector adds custom labels.
            timeout: 2s
            system:
              hostname_sources: [os] # alternatively, use [dns,os] for setting FQDN as host.name and os as fallback
          transform/add_resource_attributes_as_metric_attributes:
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
            error_mode: ignore
            metric_statements:
              - context: datapoint
                statements:
                  - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
                  - set(attributes["service.version"], resource.attributes["service.version"])

        otel_collector_exporters:
          otlp/grafana_cloud_traces:
            # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
            endpoint: "{{ tempo_url }}"
            auth:
              authenticator: basicauth/grafana_cloud_tempo
            # Review comment (Member): Replace this with jaeger, or use a debug exporter
            # Reply (Contributor Author): removed logs and traces and made the config focused on host metrics

          loki/grafana_cloud_logs:
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
            endpoint: "{{ loki_url }}"
            auth:
              authenticator: basicauth/grafana_cloud_loki
            # Review comment (Member): replace this with a debug exporter
            # Reply (Contributor Author): removed logs and traces and made the config focused on host metrics

          prometheusremotewrite/grafana_cloud_metrics:
            # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
            endpoint: "{{ prometheus_url }}"
            add_metric_suffixes: false
            auth:
              authenticator: basicauth/grafana_cloud_prometheus

        otel_collector_service:
          extensions: [basicauth/grafana_cloud_tempo, basicauth/grafana_cloud_prometheus, basicauth/grafana_cloud_loki]
          pipelines:
            traces:
              receivers: [otlp]
              processors: [resourcedetection, batch]
              exporters: [otlp/grafana_cloud_traces]
            metrics:
              receivers: [otlp, hostmetrics]
              processors: [resourcedetection, transform/add_resource_attributes_as_metric_attributes, batch]
              exporters: [prometheusremotewrite/grafana_cloud_metrics]
            logs:
              receivers: [otlp]
              processors: [resourcedetection, batch]
              exporters: [loki/grafana_cloud_logs]
```

> **Note:** You'll need to adjust the configuration to match the specific telemetry data you intend to collect and where you plan to forward it. The configuration snippet above is a basic example designed for traces, logs and metrics collection via OTLP and forwarding to Grafana Cloud.

The configuration above provisions the OpenTelemetry Collector on each Linux host to collect host metrics through the `hostmetrics` receiver, in addition to any telemetry sent to it over OTLP.
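
To try the role without Grafana Cloud credentials, one option is to swap the exporters for the Collector's `debug` exporter, which writes received telemetry to the Collector's own log output. The fragment below is a minimal sketch rather than a tested configuration. It reuses the `otel_collector_*` variable names from the playbook above, keeps only a host metrics pipeline, and assumes the Collector distribution installed by the role includes both the `hostmetrics` receiver and the `debug` exporter.

```yaml
# Sketch: a replacement for the `vars` block of the include_role task above.
# Assumption: the installed Collector build ships `hostmetrics` and `debug`.
vars:
  otel_collector_receivers:
    hostmetrics:
      collection_interval: 60s
      scrapers:
        cpu: {}
        memory: {}
        filesystem: {}

  otel_collector_processors:
    batch:

  otel_collector_exporters:
    debug:
      verbosity: detailed # print full data points instead of summary counts

  otel_collector_service:
    pipelines:
      metrics:
        receivers: [hostmetrics]
        processors: [batch]
        exporters: [debug]
```

Because no authentication extensions are referenced, the `vars` section of the play holding the Grafana Cloud credentials is not needed for this variant.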

## Running the Ansible Playbook

Deploy the OpenTelemetry Collector across your hosts by executing:

```sh
ansible-playbook deploy-opentelemetry.yml
```
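
After the playbook finishes, a quick sanity check is to confirm that the Collector on each host accepts connections on the OTLP gRPC port. The following is a sketch of a separate playbook; the file name `verify-opentelemetry.yml` is hypothetical, and the check assumes the `otlp` receiver listens on its default gRPC port, 4317, since the configuration above does not override the endpoint.

```yaml
# verify-opentelemetry.yml (hypothetical file name)
# Checks, from each managed host itself, that the OTLP gRPC port is open.
- name: Verify OpenTelemetry Collector is listening
  hosts: all
  gather_facts: false

  tasks:
    - name: Wait for the OTLP gRPC port (default 4317) to accept connections
      ansible.builtin.wait_for:
        port: 4317
        timeout: 30
```

Run it the same way as the deployment playbook, with `ansible-playbook verify-opentelemetry.yml`. A host where the port never opens is reported as failed once the 30-second timeout expires.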

## Verifying Data Ingestion into Grafana Cloud

Once you've deployed the OpenTelemetry Collector and configured it to forward data to Grafana Cloud, you can verify the ingestion:

- Log into your Grafana Cloud instance.
- Navigate to the **Explore** section.
- Select your Grafana Cloud Prometheus data source from the dropdown menu.
- Execute a query to confirm the reception of metrics, e.g., `{instance="ubuntu-01"}` for a specific host's metrics.

## Visualizing Metrics and Logs in Grafana

With data successfully ingested into Grafana Cloud, you can build custom dashboards to visualize the metrics, logs, and traces received from your OpenTelemetry Collectors. Use Grafana's query builder and visualization tools to explore the data.

- Consider creating dashboards that offer a comprehensive overview of your infrastructure's health and performance.
- Utilize Grafana's alerting features to proactively manage and respond to issues identified through the OpenTelemetry data.

This guide simplifies the deployment of the OpenTelemetry Collector across multiple Linux hosts using Ansible and illustrates how to visualize collected telemetry data in Grafana Cloud. Tailor the Ansible roles, OpenTelemetry Collector configurations, and Grafana dashboards to suit your specific monitoring and observability requirements.