Merge pull request #236 from sysdiglabs/staging
Release from Staging Y22W32
daviddetorres authored Aug 11, 2022
2 parents 7e416e1 + 6dc399c commit cf509b4
Showing 32 changed files with 5,923 additions and 42 deletions.
Binary file added apps/images/kafka.png
12 changes: 6 additions & 6 deletions apps/kafka.yaml
@@ -4,12 +4,12 @@ kind: App
name: "kafka"
keywords:
- Message-broker
- Coming soon
- Available
availableVersions:
- '2.4'
shortDescription: "Apache Kafka is an open-source stream-processing software platform"
- '2.7'
shortDescription: "Apache Kafka is an open-source stream-processing software platform."
description: |
Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
icon: https://upload.wikimedia.org/wikipedia/commons/0/05/Apache_kafka.svg
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/kafka.png
website: https://kafka.apache.org/
available: false
available: true
17 changes: 17 additions & 0 deletions apps/openshift-state-metrics.yaml
@@ -0,0 +1,17 @@
---
apiVersion: v1
kind: App
name: "openshift-state-metrics"
keywords:
- Platform
- OpenShift
- Kubernetes
- Available
availableVersions:
- '4.7'
shortDescription: "Specific metrics for OpenShift"
description: |
openshift-state-metrics expands upon kube-state-metrics by adding metrics for OpenShift-specific resources.
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/openshift.png
website: https://github.com/openshift/openshift-state-metrics
available: true
39 changes: 39 additions & 0 deletions resources/kafka/ALERTS.md
@@ -0,0 +1,39 @@
# Alerts
## No Leader
There is no ActiveController or 'leader' in the Kafka cluster.

## Too Many Leaders
There is more than one ActiveController or 'leader' in the Kafka cluster.

## Offline Partitions
There are one or more Offline Partitions. These partitions don’t have an active leader and are hence not writable or readable.

## Under Replicated Partitions
There are one or more Under Replicated Partitions.

## Under In-Sync Replicated Partitions
There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.

## ConsumerGroup Lag Not Decreasing
The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.

## ConsumerGroup Without Members
The ConsumerGroup doesn't have any members.

## Producer High ThrottleTime By Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Producer High ThrottleTime By User
The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Producer High ThrottleTime By User And Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.

## Consumer High ThrottleTime By Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Consumer High ThrottleTime By User
The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Consumer High ThrottleTime By User And Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
96 changes: 96 additions & 0 deletions resources/kafka/INSTALL.md
@@ -0,0 +1,96 @@
# Prerequisites

# Installation of the JMX-Exporter as a sidecar
The JMX-Exporter can be easily installed in two steps.

First, deploy the ConfigMap that contains the Kafka JMX configuration. The following example is for a Kafka cluster that exposes the JMX port 9010:
```
helm repo add promcat-charts https://sysdiglabs.github.io/integrations-charts
helm repo update
helm -n kafka install kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateJMXConfigMap=true
```

Then generate a patch file and apply it to your workload (your Kafka Deployment/StatefulSet/DaemonSet). The following example is for a Kafka cluster that exposes the JMX port 9010 and is deployed as a StatefulSet called 'kafka-cp-kafka':
```
helm template kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateSidecarPatch=true > sidecar-patch.yaml
kubectl -n kafka patch sts kafka-cp-kafka --patch-file sidecar-patch.yaml
```
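For reference, the generated sidecar patch is roughly shaped like the following sketch. This is illustrative only: the container name, image, and metrics port here are assumptions, and the authoritative content comes from the `helm template` command above.
```
# Illustrative sketch only; generate the real patch with the helm template command above.
spec:
  template:
    spec:
      containers:
      - name: jmx-exporter            # sidecar container name (assumed)
        image: <jmx-exporter-image>   # image and tag supplied by the chart
        ports:
        - name: metrics
          containerPort: 5556         # metrics port (assumed)
```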

# Create Secrets for Authentication for the Kafka-Exporter
Your Kafka cluster's external endpoints might be secured with client authentication (TLS, SASL+SCRAM, SASL+Kerberos).
If the Kafka-Exporter (which will be deployed in the next tab) will use these secured external endpoints, you'll need to create the Kubernetes Secrets described in this step.
If you prefer the Kafka-Exporter to connect to the Kafka cluster through an internal, non-secured (plaintext) endpoint, skip this step.

If using TLS, you'll need to create a Secret that contains the CA, the client certificate, and the client key. These files must be named "ca.crt", "tls.crt", and "tls.key"; the Secret itself can have any name you want. Example:
```
kubectl create secret generic kafka-exporter-certs --from-file=./tls.key --from-file=./tls.crt --from-file=./ca.crt --dry-run=client -o yaml | kubectl apply -f -
```
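The command above is equivalent to applying a Secret manifest like this sketch (the base64 payloads are placeholders, not real values):
```
apiVersion: v1
kind: Secret
metadata:
  name: kafka-exporter-certs
type: Opaque
data:
  ca.crt: <base64-encoded CA certificate>
  tls.crt: <base64-encoded client certificate>
  tls.key: <base64-encoded client key>
```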

If using SASL+SCRAM, you'll need to create a Secret that contains the files "username" and "password". Example:
```
echo -n 'admin' > username
echo -n '1f2d1e2e67df' > password
kubectl create secret generic kafka-exporter-sasl-scram --from-file=username --from-file=password --dry-run=client -o yaml | kubectl apply -f -
```

If using SASL+Kerberos, you'll need to create a Secret that contains the file "kerberos.conf". If the 'Kerberos Auth Type' is 'keytabAuth', it should also contain "kerberos.keytab". Example:
```
kubectl create secret generic kafka-exporter-sasl-kerberos --from-file=./kerberos.conf --from-file=./kerberos.keytab --dry-run=client -o yaml | kubectl apply -f -
```

# Installation of the Kafka-Exporter
The Kafka-Exporter can be installed with a single Helm command. The flags change depending on the authentication used in Kafka. You can find more information about the flags in the [Kafka Exporter chart values.yaml](https://github.com/sysdiglabs/integrations-charts/blob/main/charts/kafka-exporter/values.yaml).

Example of Kafka-Exporter without auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
--set namespaceName="kafka" \
--set workloadType="statefulset" \
--set workloadName="kafka" \
--set kafkaServer[0]=kafka-cp-kafka:9092
```

Example of Kafka-Exporter with TLS auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
--set namespaceName="kafka" \
--set workloadType="statefulset" \
--set workloadName="kafka" \
--set kafkaServer[0]=kafka-cp-kafka:9092 \
--set tls.enabled=true \
--set tls.insecureSkipVerify=false \
--set tls.serverName="kafkaServerName" \
--set tls.secretName="kafka-exporter-certs"
```

Example of Kafka-Exporter with SASL+SCRAM auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
--set namespaceName="kafka" \
--set workloadType="statefulset" \
--set workloadName="kafka" \
--set kafkaServer[0]=kafka-cp-kafka:9092 \
--set sasl.enabled=true \
--set sasl.handshake=true \
--set sasl.scram.enabled=true \
--set sasl.scram.mechanism="plain" \
--set sasl.scram.secretName="kafka-exporter-sasl-scram"
```

Example of Kafka-Exporter with SASL+Kerberos auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
--set namespaceName="kafka" \
--set workloadType="statefulset" \
--set workloadName="kafka" \
--set kafkaServer[0]=kafka-cp-kafka:9092 \
--set sasl.enabled=true \
--set sasl.handshake=true \
--set sasl.kerberos.enabled=true \
--set sasl.kerberos.serviceName="kerberos-service" \
--set sasl.kerberos.realm="kerberos-realm" \
--set sasl.kerberos.kerberosAuthType="keytabAuth" \
--set sasl.kerberos.secretName="kafka-exporter-sasl-kerberos"
```

Below you can find the ConfigMap with the JMX configuration for Kafka, a patch to deploy the JMX-Exporter as a sidecar, a Deployment for the Kafka-Exporter without authentication, and the Sysdig Agent ConfigMap with the Prometheus job to scrape both exporters.
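As a rough sketch of that last piece, a Prometheus scrape configuration targeting the two exporters could look like the following. The job names and the JMX-Exporter port are assumptions; 9308 is the kafka_exporter default listen port.
```
# Sketch of a Prometheus scrape configuration for both exporters.
# Job names and the JMX-Exporter port are assumptions, not the chart's authoritative values.
scrape_configs:
- job_name: kafka-jmx-exporter
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "5556"        # JMX-Exporter sidecar metrics port (assumed)
    action: keep
- job_name: kafka-exporter
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "9308"        # kafka_exporter default listen port
    action: keep
```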
47 changes: 47 additions & 0 deletions resources/kafka/README.md
@@ -0,0 +1,47 @@
# Kafka
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

# Prometheus and exporters
Since Kafka doesn't natively expose Prometheus metrics, exporters are needed. Here we're using the [jmx_exporter](https://github.com/prometheus/jmx_exporter) and the [kafka_exporter](https://github.com/danielqsj/kafka_exporter).

# Metrics

- kafka_brokers
- kafka_consumergroup_current_offset
- kafka_consumergroup_lag
- kafka_consumergroup_members
- kafka_controller_active_controller
- kafka_controller_offline_partitions
- kafka_log_size
- kafka_network_consumer_request_time_milliseconds
- kafka_network_fetch_follower_time_milliseconds
- kafka_network_producer_request_time_milliseconds
- kafka_server_bytes_in
- kafka_server_bytes_out
- kafka_server_consumer_client_byterate
- kafka_server_consumer_client_throttle_time
- kafka_server_consumer_user_byterate
- kafka_server_consumer_user_client_byterate
- kafka_server_consumer_user_client_throttle_time
- kafka_server_consumer_user_throttle_time
- kafka_server_messages_in
- kafka_server_partition_leader_count
- kafka_server_producer_client_byterate
- kafka_server_producer_client_throttle_time
- kafka_server_producer_user_byterate
- kafka_server_producer_user_client_byterate
- kafka_server_producer_user_client_throttle_time
- kafka_server_producer_user_throttle_time
- kafka_server_under_isr_partitions
- kafka_server_under_replicated_partitions
- kafka_server_zookeeper_auth_failures
- kafka_server_zookeeper_disconnections
- kafka_server_zookeeper_expired_sessions
- kafka_server_zookeeper_read_only_connections
- kafka_server_zookeeper_sasl_authentications
- kafka_server_zookeeper_sync_connections
- kafka_topic_partition_current_offset
- kafka_topic_partition_oldest_offset
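A couple of illustrative PromQL queries over the metrics listed above:

```
# Consumer group lag aggregated per group and topic
sum by (consumergroup, topic) (kafka_consumergroup_lag)

# Alert-style check: any under-replicated partitions in the cluster
sum(kafka_server_under_replicated_partitions) > 0
```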

# Attributions
Configuration files, dashboards, and alerts are maintained by the [Sysdig team](https://sysdig.com/).
119 changes: 119 additions & 0 deletions resources/kafka/alerts.yaml
@@ -0,0 +1,119 @@
apiVersion: v1
kind: Alert
app: kafka
version: 1.0.0
appVersion:
- '2.7'
descriptionFile: ALERTS.md
configurations:
- kind: Prometheus
data: |-
groups:
- name: Kafka
rules:
- alert: '[Kafka] No Leader'
expr: |
sum(kafka_controller_active_controller) < 1
for: 5m
labels:
severity: critical
annotations:
description: There is no ActiveController or 'leader' in the Kafka cluster.
- alert: '[Kafka] Too Many Leaders'
expr: |
sum(kafka_controller_active_controller) > 1
for: 10m
labels:
severity: critical
annotations:
description: There is more than one ActiveController or 'leader' in the Kafka cluster.
- alert: '[Kafka] Offline Partitions'
expr: |
sum(kafka_controller_offline_partitions) > 0
for: 5m
labels:
severity: critical
annotations:
description: There are one or more Offline Partitions. These partitions don’t have an active leader and are hence not writable or readable.
- alert: '[Kafka] Under Replicated Partitions'
expr: |
sum(kafka_server_under_replicated_partitions) > 0
for: 10m
labels:
severity: warning
annotations:
description: There are one or more Under Replicated Partitions.
- alert: '[Kafka] Under In-Sync Replicated Partitions'
expr: |
sum(kafka_server_under_isr_partitions) > 0
for: 10m
labels:
severity: warning
annotations:
description: There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.
- alert: '[Kafka] ConsumerGroup Lag Not Decreasing'
expr: |
(sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(kafka_consumergroup_lag) > 0)
and
(sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(delta(kafka_consumergroup_lag[2m])) >= 0)
for: 15m
labels:
severity: warning
annotations:
description: The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.
- alert: '[Kafka] ConsumerGroup Without Members'
expr: |
sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup)(kafka_consumergroup_members) == 0
for: 10m
labels:
severity: info
annotations:
description: The ConsumerGroup doesn't have any members.
- alert: '[Kafka] Producer High ThrottleTime By Client-Id'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_producer_client_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
- alert: '[Kafka] Producer High ThrottleTime By User'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_producer_user_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
- alert: '[Kafka] Producer High ThrottleTime By User And Client-Id'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_producer_user_client_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
- alert: '[Kafka] Consumer High ThrottleTime By Client-Id'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_consumer_client_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
- alert: '[Kafka] Consumer High ThrottleTime By User'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_consumer_user_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
- alert: '[Kafka] Consumer High ThrottleTime By User And Client-Id'
expr: |
max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_consumer_user_client_throttle_time) > 1000
for: 5m
labels:
severity: warning
annotations:
description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
19 changes: 19 additions & 0 deletions resources/kafka/dashboards.yaml
@@ -0,0 +1,19 @@
apiVersion: v1
kind: Dashboard
app: kafka
version: 1.0.0
appVersion:
- '2.7'
configurations:
- name: kafka
kind: Sysdig
image: kafka/images/kafka.png
description: |
This dashboard offers information on:
* Brokers
* Network
* Topics
* ConsumerGroups
* Quotas
* Zookeeper
file: include/Kafka.json
7 changes: 7 additions & 0 deletions resources/kafka/description.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Description
app: kafka
version: 1.0.0
appVersion:
- '2.7'
descriptionFile: README.md
Binary file added resources/kafka/images/kafka.png