Blog: Scaling OpenTelemetry Collectors using Ansible #4182

Merged
merged 41 commits into from
Apr 15, 2024
Commits
91d0c77
add blog for otelcol-ansible
ishanjainn Mar 19, 2024
e2bdd24
Merge branch 'main' into otel-ansible
ishanjainn Mar 19, 2024
175ff3a
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
bf58bb0
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
0b75fed
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
0b103f7
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
883cd02
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
63b8bfc
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
ab6aa24
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
3743fa6
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
3de9b49
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
df72dba
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
10e80dc
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
bb0ba96
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
1250d2e
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
c3ce486
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
676a8e7
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
40e03d6
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
1321e7d
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
39723a5
Update content/en/blog/2024/scaling-opentelemetry-collectors.md
ishanjainn Mar 19, 2024
f1f4693
Merge branch 'main' into otel-ansible
ishanjainn Mar 19, 2024
0cedfc3
update blog content
ishanjainn Mar 19, 2024
79f6ff3
remove setup hosts step
ishanjainn Mar 19, 2024
c138901
Update scaling-opentelemetry-collectors.md
theletterf Mar 19, 2024
35fd956
Typo
theletterf Mar 19, 2024
58acec5
Results from /fix:all
opentelemetrybot Mar 19, 2024
1477e4a
add link to grafan for flow
ishanjainn Mar 20, 2024
36a57b3
rephrase the ansible inventory section
ishanjainn Mar 20, 2024
68f3d70
find:fix
ishanjainn Mar 20, 2024
62c9425
Merge branch 'main' into otel-ansible
theletterf Mar 26, 2024
197716c
Results from /fix:all
opentelemetrybot Mar 26, 2024
8873b13
Merge branch 'main' into otel-ansible
ishanjainn Apr 2, 2024
3061cd4
update blog according to suggestions
ishanjainn Apr 10, 2024
b7df8c1
Merge branch 'main' into otel-ansible
chalin Apr 12, 2024
588c459
Results from /fix:all
opentelemetrybot Apr 12, 2024
e029d58
Merge branch 'main' into otel-ansible
chalin Apr 12, 2024
4ef8ce6
Apply suggestions from code review
svrnm Apr 15, 2024
85c3a1c
Merge branch 'main' into otel-ansible
svrnm Apr 15, 2024
bbec3c9
Set date
svrnm Apr 15, 2024
5ad02fb
Results from /fix:dict
opentelemetrybot Apr 15, 2024
206dbc5
Merge branch 'main' into otel-ansible
svrnm Apr 15, 2024
222 changes: 222 additions & 0 deletions content/en/blog/2024/scaling-collectors.md
@@ -0,0 +1,222 @@
---
title: Manage OpenTelemetry Collectors at scale with Ansible
linkTitle: Collectors at scale with Ansible
date: 2024-04-15
author: '[Ishan Jain](https://github.com/ishanjainn) (Grafana)'
cSpell:ignore: ansible associated Ishan ishanjainn Jain
---

You can use [Ansible](https://www.ansible.com/) to scale the deployment of the
[OpenTelemetry Collector](/docs/collector/deployment/) across multiple Linux
hosts, where instances can function as both
[gateways](/docs/collector/deployment/gateway/) and
[agents](/docs/collector/deployment/agent/) within your observability
architecture. Using the OpenTelemetry Collector in this dual capacity enables
robust collection and forwarding of metrics, traces, and logs to analysis and
visualization platforms.

This post outlines a strategy for deploying and managing OpenTelemetry Collector
instances at scale throughout your infrastructure using Ansible. The following
example sends metrics to Prometheus and uses [Grafana](https://grafana.com/) to
visualize them.

## Prerequisites

Before we begin, make sure you meet the following requirements:

- Ansible installed on your base system
- SSH access to two or more Linux hosts
- A Prometheus instance that can receive your metrics through remote write

## Install the Grafana Ansible collection

The
[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
is provided through the
[Grafana Ansible collection](https://docs.ansible.com/ansible/latest/collections/grafana/grafana/)
as of release 4.0.

To install the Grafana Ansible collection, run this command:

```sh
ansible-galaxy collection install grafana.grafana
```

## Create an Ansible inventory file

Next, gather the IP addresses and URLs associated with your Linux hosts and
create an inventory file.

1. Create an Ansible inventory file.

An Ansible inventory, which resides in a file named `inventory`, lists each
host IP on a separate line, like this (8 hosts shown):

```properties
10.0.0.1 # hostname = ubuntu-01
10.0.0.2 # hostname = ubuntu-02
10.0.0.3 # hostname = centos-01
10.0.0.4 # hostname = centos-02
10.0.0.5 # hostname = debian-01
10.0.0.6 # hostname = debian-02
10.0.0.7 # hostname = fedora-01
10.0.0.8 # hostname = fedora-02
```

2. Create an `ansible.cfg` file within the same directory as `inventory`, with
the following values:

```toml
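[defaults]
# Comments sit on their own lines: in ansible.cfg an inline "#" is read as
# part of the value.
# Path to the inventory file
inventory = inventory
# Path to the private SSH key
private_key_file = ~/.ssh/id_rsa
remote_user = root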
```
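
Before running anything against the hosts, it can help to confirm that the
inventory lists as many hosts as you expect. The following is a minimal sketch,
not part of the Grafana collection: `count_inventory_hosts` is a hypothetical
helper that assumes the flat, one-host-per-line layout shown above (no group
headers).

```sh
# Hypothetical helper: count the host entries in a flat inventory file.
# Usage: count_inventory_hosts FILE
count_inventory_hosts() {
  # 1st expression: strip "# ..." comments (full-line or trailing).
  # 2nd expression: delete lines that are now empty or whitespace only.
  # tr removes the padding that some wc implementations add.
  sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' "$1" | wc -l | tr -d ' '
}
```

Running `count_inventory_hosts inventory` against the example file should report
8. To also test SSH connectivity, `ansible all -m ansible.builtin.ping` from the
same directory is a common check.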

## Use the OpenTelemetry Collector Ansible role

Next, define an Ansible playbook that applies the OpenTelemetry Collector role
across your hosts.

Create a file named `deploy-opentelemetry.yml` in the same directory as your
`ansible.cfg` and `inventory` files:

```yaml
- name: Install OpenTelemetry Collector
  hosts: all
  become: true

  tasks:
    - name: Install OpenTelemetry Collector
      ansible.builtin.include_role:
        # Role shipped in the grafana.grafana collection installed earlier
        name: grafana.grafana.opentelemetry_collector
      vars:
        otel_collector_receivers:
          hostmetrics:
            collection_interval: 60s
            scrapers:
              cpu: {}
              disk: {}
              load: {}
              filesystem: {}
              memory: {}
              network: {}
              paging: {}
              process:
                mute_process_name_error: true
                mute_process_exe_error: true
                mute_process_io_error: true
              processes: {}

        otel_collector_processors:
          batch:
          resourcedetection:
            detectors: [env, system]
            timeout: 2s
            system:
              hostname_sources: [os]
          transform/add_resource_attributes_as_metric_attributes:
            error_mode: ignore
            metric_statements:
              - context: datapoint
                statements:
                  - set(attributes["deployment.environment"],
                    resource.attributes["deployment.environment"])
                  - set(attributes["service.version"],
                    resource.attributes["service.version"])

        otel_collector_exporters:
          prometheusremotewrite:
            endpoint: https://<prometheus-url>/api/prom/push
            headers:
              Authorization: 'Basic <base64-encoded-username:password>'

        otel_collector_service:
          pipelines:
            metrics:
              receivers: [hostmetrics]
              processors:
                [
                  resourcedetection,
                  transform/add_resource_attributes_as_metric_attributes,
                  batch,
                ]
              exporters: [prometheusremotewrite]
```

{{% alert title="Note" %}}

Adjust the configuration to match the specific telemetry you intend to collect
and the destination you plan to forward it to. This configuration snippet is a
basic example designed for collecting host metrics that get forwarded to
Prometheus.

{{% /alert %}}

The preceding configuration provisions the OpenTelemetry Collector to collect
host metrics from each Linux host in the inventory.
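
The `Authorization` header in the exporter configuration expects the Base64
encoding of `username:password`. As a minimal sketch, the value can be produced
in a shell like this. The function name `basic_auth_value` and the credentials
`alice` / `s3cret` are placeholders chosen for illustration, not values from
this setup.

```sh
# Build the value that follows "Basic " in an HTTP Authorization header.
# Usage: basic_auth_value USERNAME PASSWORD
basic_auth_value() {
  # printf emits no trailing newline, so only "user:pass" gets encoded.
  # tr removes line wraps that some base64 implementations insert.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Placeholder credentials for illustration only.
basic_auth_value 'alice' 's3cret'   # prints YWxpY2U6czNjcmV0
echo
```

Paste the resulting string after `Basic ` in the playbook. For anything beyond
a test setup, consider keeping the credential out of the playbook file, for
example with Ansible Vault.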

## Run the Ansible playbook

Deploy the OpenTelemetry Collector across your hosts by running the following
command:

```sh
ansible-playbook deploy-opentelemetry.yml
```
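
When many hosts are involved, one cautious approach is to validate the playbook
and try it on a single host before rolling it out everywhere. The sketch below
assumes that workflow. `deploy_staged` is a hypothetical wrapper, while
`--syntax-check` and `--limit` are standard `ansible-playbook` options.

```sh
# Hypothetical wrapper for a staged rollout.
# Usage: deploy_staged PLAYBOOK CANARY_HOST
deploy_staged() {
  playbook="$1"
  canary="$2"
  # 1. Parse the playbook without contacting any host.
  # 2. Apply it to one host from the inventory.
  # 3. Apply it to every host. Each step runs only if the previous succeeded.
  ansible-playbook "$playbook" --syntax-check &&
    ansible-playbook "$playbook" --limit "$canary" &&
    ansible-playbook "$playbook"
}
```

For the inventory in this post, that could be invoked as
`deploy_staged deploy-opentelemetry.yml 10.0.0.1`.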

## Check your metrics in the backend

After your OpenTelemetry Collectors start sending metrics to Prometheus, follow
these steps to visualize them in Grafana:

### Set up Grafana

1. **Install Docker**: Make sure Docker is installed on your system.

2. **Run Grafana Docker Container**: Start a Grafana server with the following
command, which fetches the latest Grafana image:

```sh
docker run -d -p 3000:3000 --name=grafana grafana/grafana
```

3. **Access Grafana**: Open <http://localhost:3000> in your web browser. The
default login username and password are both `admin`.

4. **Change the password** when prompted on first login, and pick a secure one.

For other installation methods and more detailed instructions, refer to the
[official Grafana documentation](https://grafana.com/docs/grafana/latest/#installing-grafana).

### Add Prometheus as a data source

1. In Grafana, navigate to **Connections** > **Data Sources**.
2. Click **Add data source** and select **Prometheus**.
3. In the settings, enter your Prometheus URL, for example,
`http://<your_prometheus_host>`, along with any other necessary details.
4. Select **Save & Test**.

### Explore your metrics

1. Go to the **Explore** page.
2. In the Query editor, select your data source and enter the following query:

```PromQL
100 - (avg by (cpu) (irate(system_cpu_time{state="idle"}[5m])) * 100)
```

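This query calculates, for each CPU core, the percentage of CPU time not spent
in the "idle" state. Because it uses `irate`, the rate comes from the two most
recent samples within the 5-minute window, and `avg by (cpu)` averages cores
that share a `cpu` label across all reporting hosts.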

3. Explore other metrics and create dashboards to gain insights into your
system's performance.
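
To make the arithmetic behind the CPU query concrete, here is a small worked
example for a single core. It is only a sketch: the sample values (an idle
counter going from 1000 to 1054 seconds over a 60-second interval) are invented,
and `busy_percent` is a hypothetical helper that mirrors the
`100 - rate * 100` step of the query.

```sh
# Usage: busy_percent IDLE_PREV IDLE_LAST SECONDS_BETWEEN_SAMPLES
busy_percent() {
  awk -v prev="$1" -v last="$2" -v dt="$3" 'BEGIN {
    idle_rate = (last - prev) / dt   # idle seconds per second, between 0 and 1
    printf "%.1f\n", 100 - idle_rate * 100
  }'
}

# Idle time grew by 54 s in 60 s, so the core was idle 90% of the time.
busy_percent 1000 1054 60   # prints 10.0
```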

This blog post illustrated how you can configure and deploy multiple
OpenTelemetry Collectors across various Linux hosts with the help of Ansible, as
well as visualize collected telemetry in Grafana. If you found this useful, see
the GitHub repository for the
[OpenTelemetry Collector role](https://github.com/grafana/grafana-ansible-collection/tree/main/roles/opentelemetry_collector)
for detailed configuration options. If you have questions, you can reach me
through the contact details on my GitHub profile
[@ishanjainn](https://github.com/ishanjainn).
22 changes: 19 additions & 3 deletions static/refcache.json
@@ -811,6 +811,10 @@
"StatusCode": 206,
"LastSeen": "2024-01-30T16:07:39.690877-05:00"
},
"https://docs.ansible.com/ansible/latest/collections/grafana/grafana/": {
"StatusCode": 206,
"LastSeen": "2024-03-19T11:21:52.991213698Z"
},
"https://docs.appdynamics.com/latest/en/application-monitoring/appdynamics-for-opentelemetry": {
"StatusCode": 200,
"LastSeen": "2024-01-18T08:51:22.195056-05:00"
@@ -2595,6 +2599,10 @@
"StatusCode": 200,
"LastSeen": "2024-01-30T16:14:36.112572-05:00"
},
"https://github.com/ishanjainn": {
"StatusCode": 200,
"LastSeen": "2024-03-19T11:21:47.871135724Z"
},
"https://github.com/jack-berg": {
"StatusCode": 200,
"LastSeen": "2024-01-18T20:04:54.949867-05:00"
@@ -4489,15 +4497,19 @@
},
"https://grafana.com/docs/alloy/latest/": {
"StatusCode": 200,
"LastSeen": "2024-04-10T00:09:47.949842+02:00"
"LastSeen": "2024-04-12T20:40:28.798266582Z"
},
"https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/dotnet/": {
"StatusCode": 200,
"LastSeen": "2024-04-10T00:09:50.125651+02:00"
"LastSeen": "2024-04-12T20:40:30.368448693Z"
},
"https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/setup/instrument/java/": {
"StatusCode": 200,
"LastSeen": "2024-04-10T00:09:55.400731+02:00"
"LastSeen": "2024-04-12T20:40:34.652514906Z"
},
"https://grafana.com/docs/grafana/latest/#installing-grafana": {
"StatusCode": 200,
"LastSeen": "2024-04-12T20:40:33.435682362Z"
},
"https://grafana.com/oss/opentelemetry/": {
"StatusCode": 200,
@@ -7811,6 +7823,10 @@
"StatusCode": 200,
"LastSeen": "2024-01-19T09:04:05.862693+01:00"
},
"https://www.ansible.com/": {
"StatusCode": 200,
"LastSeen": "2024-03-19T11:21:48.883430689Z"
},
"https://www.apollographql.com/docs/federation/": {
"StatusCode": 206,
"LastSeen": "2024-01-18T19:55:56.349642-05:00"