Merge pull request #218 from sysdiglabs/staging
Staging to Prod Y22W20
daviddetorres authored May 26, 2022
2 parents 3de3b53 + 361ecc6 commit 217882e
Showing 25 changed files with 2,587 additions and 2 deletions.
16 changes: 16 additions & 0 deletions apps/fluentd.yaml
@@ -0,0 +1,16 @@
---
apiVersion: v1
kind: App
name: "Fluentd"
keywords:
- Observability
- Logging
- Available
availableVersions:
- '1.12.4'
shortDescription: "Fluentd is an open source data collector for a unified logging layer."
description: |
  Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/fluentd.png
website: https://www.fluentd.org/
available: true
Binary file added apps/images/fluentd.png
Binary file added apps/images/ntp.png
15 changes: 15 additions & 0 deletions apps/ntp.yaml
@@ -0,0 +1,15 @@
---
apiVersion: v1
kind: App
name: "NTP"
keywords:
- Network
- Available
availableVersions:
- '4'
shortDescription: "The Network Time Protocol (NTP) is a networking protocol for clock synchronization between computer systems"
description: |
  The Network Time Protocol (NTP) is a networking protocol for clock synchronization between computer systems over packet-switched, variable-latency data networks. In operation since before 1985, NTP is one of the oldest Internet protocols in current use. NTP was designed by David L. Mills of the University of Delaware.
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/ntp.png
website: http://www.ntp.org/
available: true
28 changes: 28 additions & 0 deletions resources/fluentd/ALERTS.md
@@ -0,0 +1,28 @@
# Alerts
## No Input From Container
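The rate of input records from a container (grouped by `input_namespace` and `input_container`) has been zero for 5 minutes.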

## High Error Ratio
The ratio of errors to emits for an output plugin has been above 5% for 5 minutes.

## High Retry Ratio
The ratio of retries to emits for an output plugin has been above 5% for 5 minutes.

## High Retry Wait
The retry wait of an output plugin has been above the threshold of 60 for 5 minutes.

## Low Buffer Available Space
The available buffer space ratio of an output plugin has been below the threshold of 10 for 5 minutes.

## Buffer Queue Length Increasing
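The average buffer queue length over the last 5 minutes has been higher than the average over the preceding 5 minutes, and this has held for 5 minutes.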

## Buffer Total Bytes Increasing
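The average buffer size in bytes over the last 5 minutes has been higher than the average over the preceding 5 minutes, and this has held for 15 minutes.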
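
The two "increasing" rules in `alerts.yaml` compare `avg_over_time(...[5m])` with the same average taken at `offset 5m` and fire when the difference is positive. The Python sketch below illustrates only that comparison, assuming the samples of each window are already available as plain lists. The function name `is_increasing` and the sample values are hypothetical, and the `for:` hold time of the rule is not modeled.

```python
from __future__ import annotations


def is_increasing(previous_window: list[float], current_window: list[float]) -> bool:
    """Return True when the current window's average exceeds the previous one's.

    Illustrates: avg_over_time(m[5m]) - avg_over_time(m[5m] offset 5m) > 0
    """
    if not previous_window or not current_window:
        # A window without samples produces no result in PromQL, so nothing fires.
        return False
    current_avg = sum(current_window) / len(current_window)
    previous_avg = sum(previous_window) / len(previous_window)
    return current_avg - previous_avg > 0


# Hypothetical queue-length samples for two consecutive 5-minute windows.
print(is_increasing([2, 2, 3], [3, 4, 5]))
print(is_increasing([5, 5, 5], [5, 5, 5]))
```

A flat or shrinking buffer never satisfies the comparison, so the rule only stays active while the buffer keeps growing from one window to the next.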

## High Slow Flush Ratio
The ratio of slow flushes to emits for an output plugin has been above 5% for 5 minutes.

## No Output Records From Plugin
The rate of records emitted by an output plugin has been zero for 5 minutes.
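
The three ratio alerts (error, retry, and slow flush) share one shape in `alerts.yaml`: the 5-minute rate of a counter divided by the 5-minute rate of `fluentd_output_status_emit_count`, compared against 0.05. The Python sketch below works through that arithmetic for a single evaluation. It is a simplification, not PromQL itself: `per_second_rate` and `ratio_alert_fires` are hypothetical names, counter resets and extrapolation are ignored, and the `for: 5m` hold time is not modeled.

```python
from __future__ import annotations


def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Simplified stand-in for rate(): counter increase divided by the window length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Assumes the counter did not reset inside the window.
    return (last - first) / window_seconds


def ratio_alert_fires(
    events: tuple[float, float],
    emits: tuple[float, float],
    window_seconds: float = 300.0,
    threshold: float = 0.05,
) -> bool:
    """Evaluate rate(events) / rate(emits) > threshold for one point in time."""
    event_rate = per_second_rate(events[0], events[1], window_seconds)
    emit_rate = per_second_rate(emits[0], emits[1], window_seconds)
    if emit_rate == 0:
        # In PromQL 0/0 is NaN (comparison is false) and x/0 is +Inf (comparison is true).
        return event_rate > 0
    return event_rate / emit_rate > threshold


# Hypothetical counter values at the start and end of a 5-minute window.
print(ratio_alert_fires(events=(100, 130), emits=(1000, 1300)))  # 30 errors per 300 emits
print(ratio_alert_fires(events=(100, 106), emits=(1000, 1300)))  # 6 errors per 300 emits
```

Dividing by the emit rate keeps the threshold meaningful for both busy and quiet plugins, since the same number of errors matters more when few chunks are emitted.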

30 changes: 30 additions & 0 deletions resources/fluentd/INSTALL.md
@@ -0,0 +1,30 @@
# Prerequisites
Fluentd is instrumented with Prometheus metrics, and its pods are annotated with Prometheus annotations.

For Fluentd to expose Prometheus metrics, the following plugins need to be enabled:
- 'prometheus' input plugin
- 'prometheus_monitor' input plugin
- 'prometheus_output_monitor' input plugin

As described in the official plugin documentation (https://github.com/fluent/fluent-plugin-prometheus/blob/master/README.md), they can be enabled with the following configuration:
```
<source>
  @type prometheus
  @id in_prometheus
  bind "0.0.0.0"
  port 24231
  metrics_path "/metrics"
</source>
<source>
  @type prometheus_monitor
  @id in_prometheus_monitor
</source>
<source>
  @type prometheus_output_monitor
  @id in_prometheus_output_monitor
</source>
```

If you deploy Fluentd with the official Helm chart (https://github.com/fluent/helm-charts/tree/main/charts/fluentd), its default configuration already enables these plugins, so no additional action is needed.
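
One way to confirm that the plugins are active is to inspect the text returned by the metrics endpoint (port 24231, path `/metrics` in the configuration above). The Python sketch below is only an illustration: it parses a payload in the Prometheus text exposition format and reports which metric names are absent. `EXPECTED_METRICS`, `exposed_metric_names`, `missing_metrics`, and the sample payload are hypothetical; the metric names are a subset of those queried by the alert rules in this commit.

```python
from __future__ import annotations

# Assumption: a subset of the metric names used by the alert rules in alerts.yaml.
EXPECTED_METRICS = {
    "fluentd_output_status_emit_count",
    "fluentd_output_status_num_errors",
    "fluentd_output_status_retry_count",
    "fluentd_output_status_emit_records",
}


def exposed_metric_names(exposition_text: str) -> set[str]:
    """Return the metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as '# HELP' and '# TYPE'.
        if not line or line.startswith("#"):
            continue
        # A sample line is 'name{labels} value' or 'name value'.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text: str) -> set[str]:
    """Return the expected metric names that the payload does not expose."""
    return EXPECTED_METRICS - exposed_metric_names(exposition_text)


# Hypothetical payload; in practice this would be the body served at /metrics.
SAMPLE = """\
# sample payload with made-up values
fluentd_output_status_emit_count{plugin_id="out_es",type="elasticsearch"} 120
fluentd_output_status_num_errors{plugin_id="out_es",type="elasticsearch"} 3
"""

print(sorted(missing_metrics(SAMPLE)))
```

An empty result suggests the expected metrics are being exported; any names that are printed point at a plugin that is not enabled or has not produced samples yet.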
12 changes: 12 additions & 0 deletions resources/fluentd/README.md
@@ -0,0 +1,12 @@
# Fluentd
Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.


# Prometheus and exporters
Fluentd already exposes all of its metrics through a Prometheus endpoint on port 24231. In Kubernetes, the pod is already annotated, so the Sysdig agent can scrape the endpoint right away.

# Metrics
- Fluentd internal statistics

# Attributions
Configuration files, dashboards, and alerts are maintained by the [Sysdig team](https://sysdig.com/).
85 changes: 85 additions & 0 deletions resources/fluentd/alerts.yaml
@@ -0,0 +1,85 @@
apiVersion: v1
kind: Alert
app: Fluentd
version: 1.0.0
appVersion:
- '1.12.4'
descriptionFile: ALERTS.md
configurations:
- kind: Prometheus
  data: |-
    groups:
    - name: Fluentd
      rules:
      - alert: '[Fluentd] No Input From Container'
        expr: |
          sum by (input_namespace, input_container)(rate(fluentd_input_status_num_records_total[5m])) == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          description: No Input From Container.
      - alert: '[Fluentd] High Error Ratio'
        expr: |
          sum by (type, plugin_id)(rate(fluentd_output_status_num_errors[5m])) / sum by (type, plugin_id)(rate(fluentd_output_status_emit_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          description: High Error Ratio.
      - alert: '[Fluentd] High Retry Ratio'
        expr: |
          sum by (type, plugin_id)(rate(fluentd_output_status_retry_count[5m])) / sum by (type, plugin_id)(rate(fluentd_output_status_emit_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          description: High Retry Ratio.
      - alert: '[Fluentd] High Retry Wait'
        expr: |
          sum by (type, plugin_id)(max_over_time(fluentd_output_status_retry_wait[5m])) > 60
        for: 5m
        labels:
          severity: critical
        annotations:
          description: High Retry Wait.
      - alert: '[Fluentd] Low Buffer Available Space'
        expr: |
          fluentd_output_status_buffer_available_space_ratio < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          description: Low Buffer Available Space.
      - alert: '[Fluentd] Buffer Queue Length Increasing'
        expr: |
          avg_over_time(fluentd_output_status_buffer_queue_length[5m]) - avg_over_time(fluentd_output_status_buffer_queue_length[5m] offset 5m) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          description: Buffer Queue Length Increasing.
      - alert: '[Fluentd] Buffer Total Bytes Increasing'
        expr: |
          avg_over_time(fluentd_output_status_buffer_total_bytes[5m]) - avg_over_time(fluentd_output_status_buffer_total_bytes[5m] offset 5m) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          description: Buffer Total Bytes Increasing.
      - alert: '[Fluentd] High Slow Flush Ratio'
        expr: |
          sum by (type, plugin_id)(rate(fluentd_output_status_slow_flush_count[5m])) / sum by (type, plugin_id)(rate(fluentd_output_status_emit_count[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          description: High Slow Flush Ratio.
      - alert: '[Fluentd] No Output Records From Plugin'
        expr: |
          rate(fluentd_output_status_emit_records[5m]) == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          description: No Output Records From Plugin.
16 changes: 16 additions & 0 deletions resources/fluentd/dashboards.yaml
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Dashboard
app: Fluentd
version: 1.0.0
appVersion:
- '1.12.4'
configurations:
- name: Fluentd
  kind: Sysdig
  image: fluentd/images/fluentd.png
  description: |
    This dashboard offers information on:
    * Input/Output
    * Buffer
    * Flush
  file: include/Fluentd.json
7 changes: 7 additions & 0 deletions resources/fluentd/description.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Description
app: Fluentd
version: 1.0.0
appVersion:
- '1.12.4'
descriptionFile: README.md
Binary file added resources/fluentd/images/fluentd.png