Merge pull request #236 from sysdiglabs/staging
Release from Staging Y22W32
Showing 32 changed files with 5,923 additions and 42 deletions.
@@ -0,0 +1,17 @@
```yaml
---
apiVersion: v1
kind: App
name: "openshift-state-metrics"
keywords:
  - Platform
  - OpenShift
  - Kubernetes
  - Available
availableVersions:
  - '4.7'
shortDescription: "Specific metrics for OpenShift"
description: |
  openshift-state-metrics expands upon kube-state-metrics by adding metrics for OpenShift specific resources
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/openshift.png
website: https://github.com/openshift/openshift-state-metrics
available: true
```
@@ -0,0 +1,39 @@
# Alerts
## No Leader
There is no ActiveController or 'leader' in the Kafka cluster.

## Too Many Leaders
There is more than one ActiveController or 'leader' in the Kafka cluster.

## Offline Partitions
There are one or more Offline Partitions. These partitions don't have an active leader and are hence not writable or readable.

## Under Replicated Partitions
There are one or more Under Replicated Partitions.

## Under In-Sync Replicated Partitions
There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.

## ConsumerGroup Lag Not Decreasing
The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.

## ConsumerGroup Without Members
The ConsumerGroup doesn't have any members.

## Producer High ThrottleTime By Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Producer High ThrottleTime By User
The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Producer High ThrottleTime By User And Client-Id
The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.

## Consumer High ThrottleTime By Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.

## Consumer High ThrottleTime By User
The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.

## Consumer High ThrottleTime By User And Client-Id
The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
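The "Lag Not Decreasing" alert above fires only when two conditions hold at once: the lag is positive and it has not shrunk over the lookback window. A minimal Python sketch of that logic (the sample series are invented; in production this is evaluated by the PromQL expression, not by code like this):

```python
def lag_not_decreasing(samples, window=2):
    """Mimic the alert's idea: fire when the latest lag is positive
    AND the change over the last `window` samples is >= 0
    (i.e. consumers are not catching up)."""
    if len(samples) < window + 1:
        return False  # not enough history to compute a delta
    current = samples[-1]
    delta = samples[-1] - samples[-1 - window]
    return current > 0 and delta >= 0

# Consumers keeping up: lag shrinking, no alert.
print(lag_not_decreasing([120, 80, 40]))   # False
# Consumers stuck or too slow: lag positive and growing.
print(lag_not_decreasing([40, 80, 120]))   # True
```

Note that a lag of exactly zero never fires, matching the `> 0` guard in the real rule.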
@@ -0,0 +1,96 @@
# Prerequisites

# Installation of the JMX-Exporter as a sidecar
The JMX-Exporter can be installed in two steps.

First, deploy the ConfigMap that contains the Kafka JMX configuration. The following example is for a Kafka cluster that exposes JMX on port 9010:
```
helm repo add promcat-charts https://sysdiglabs.github.io/integrations-charts
helm repo update
helm -n kafka install kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateJMXConfigMap=true
```

Then generate a patch file and apply it to your workload (your Kafka Deployment/StatefulSet/DaemonSet). The following example is for a Kafka cluster that exposes JMX on port 9010 and is deployed as a StatefulSet called 'kafka-cp-kafka':
```
helm template kafka-jmx-exporter promcat-charts/jmx-exporter --set jmx_port=9010 --set integrationType=kafka --set onlyCreateSidecarPatch=true > sidecar-patch.yaml
kubectl -n kafka patch sts kafka-cp-kafka --patch-file sidecar-patch.yaml
```
# Create Secrets for Authentication for the Kafka-Exporter
Your Kafka cluster's external endpoints might be secured by requiring authentication from the clients that connect to it (TLS, SASL+SCRAM, SASL+Kerberos).
If the Kafka-Exporter (which will be deployed in the next tab) is going to use these secured external endpoints, you'll need to create Kubernetes Secrets in the following step.
If you prefer that the Kafka-Exporter connect to the Kafka cluster through an internal, unsecured (plaintext) endpoint, skip this step.

If using TLS, you'll need to create a Secret which contains the CA, the client certificate and the client key. The names of these files must be "ca.crt", "tls.crt" and "tls.key". The name of the Secret can be any name that you want. Example:
```
kubectl create secret generic kafka-exporter-certs --from-file=./tls.key --from-file=./tls.crt --from-file=./ca.crt --dry-run=client -o yaml | kubectl apply -f -
```

If using SASL+SCRAM, you'll need to create a Secret which contains the "username" and "password". Example:
```
echo -n 'admin' > username
echo -n '1f2d1e2e67df' > password
kubectl create secret generic kafka-exporter-sasl-scram --from-file=username --from-file=password --dry-run=client -o yaml | kubectl apply -f -
```

If using SASL+Kerberos, you'll need to create a Secret which contains the "kerberos.conf". If the 'Kerberos Auth Type' is 'keytabAuth', it should also contain the "kerberos.keytab". Example:
```
kubectl create secret generic kafka-exporter-sasl-kerberos --from-file=./kerberos.conf --from-file=./kerberos.keytab --dry-run=client -o yaml | kubectl apply -f -
```
# Installation of the Kafka-Exporter
The Kafka-Exporter can be installed with a single Helm command. The flags change depending on the authentication used in Kafka. You can find more information about the flags in the [Kafka Exporter chart values.yaml](https://github.com/sysdiglabs/integrations-charts/blob/main/charts/kafka-exporter/values.yaml).

Example of Kafka-Exporter without auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092
```

Example of Kafka-Exporter with TLS auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set tls.enabled=true \
  --set tls.insecureSkipVerify=false \
  --set tls.serverName="kafkaServerName" \
  --set tls.secretName="kafka-exporter-certs"
```

Example of Kafka-Exporter with SASL+SCRAM auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set sasl.enabled=true \
  --set sasl.handshake=true \
  --set sasl.scram.enabled=true \
  --set sasl.scram.mechanism="plain" \
  --set sasl.scram.secretName="kafka-exporter-sasl-scram"
```

Example of Kafka-Exporter with SASL+Kerberos auth:
```
helm -n kafka install kafka-exporter promcat-charts/kafka-exporter \
  --set namespaceName="kafka" \
  --set workloadType="statefulset" \
  --set workloadName="kafka" \
  --set kafkaServer[0]=kafka-cp-kafka:9092 \
  --set sasl.enabled=true \
  --set sasl.handshake=true \
  --set sasl.kerberos.enabled=true \
  --set sasl.kerberos.serviceName="kerberos-service" \
  --set sasl.kerberos.realm="kerberos-realm" \
  --set sasl.kerberos.kerberosAuthType="keytabAuth" \
  --set sasl.kerberos.secretName="kafka-exporter-sasl-kerberos"
```

Below you can find the ConfigMap with the JMX configuration for Kafka, a patch to run the JMX-Exporter as a sidecar, a Deployment of the Kafka-Exporter without auth, and the Sysdig Agent ConfigMap with the Prometheus job to scrape both exporters.
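Both exporters serve their metrics in the Prometheus text exposition format, which the agent's Prometheus job then scrapes. As a rough illustration of what such a scrape returns, this sketch filters an exposition payload down to the Kafka series (the sample payload is invented for the example; real payloads come from the exporters' `/metrics` endpoints):

```python
def parse_kafka_metrics(payload):
    """Parse Prometheus text-exposition lines and keep kafka_* samples.

    Returns {metric_name_with_labels: float_value}. Comment lines
    (# HELP / # TYPE) and non-Kafka series are skipped. This is a toy
    parser: it assumes no spaces inside label values and no timestamps.
    """
    out = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.startswith("kafka_"):
            out[name_and_labels] = float(value)
    return out

# Hypothetical scrape output, abbreviated.
sample = """\
# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 3
kafka_consumergroup_lag{consumergroup="cg1",topic="orders"} 42
go_goroutines 12
"""
print(parse_kafka_metrics(sample))
```

The `go_goroutines` line is dropped, showing why the metrics allowlist in the next tab matters: exporters emit many series beyond the Kafka ones.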
@@ -0,0 +1,47 @@
# Kafka
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

# Prometheus and exporters
Since Kafka isn't instrumented for Prometheus, exporters are needed. Here we're using the [jmx_exporter](https://github.com/prometheus/jmx_exporter) and the [kafka_exporter](https://github.com/danielqsj/kafka_exporter).

# Metrics

- kafka_brokers
- kafka_consumergroup_current_offset
- kafka_consumergroup_lag
- kafka_consumergroup_members
- kafka_controller_active_controller
- kafka_controller_offline_partitions
- kafka_log_size
- kafka_network_consumer_request_time_milliseconds
- kafka_network_fetch_follower_time_milliseconds
- kafka_network_producer_request_time_milliseconds
- kafka_server_bytes_in
- kafka_server_bytes_out
- kafka_server_consumer_client_byterate
- kafka_server_consumer_client_throttle_time
- kafka_server_consumer_user_byterate
- kafka_server_consumer_user_client_byterate
- kafka_server_consumer_user_client_throttle_time
- kafka_server_consumer_user_throttle_time
- kafka_server_messages_in
- kafka_server_partition_leader_count
- kafka_server_producer_client_byterate
- kafka_server_producer_client_throttle_time
- kafka_server_producer_user_byterate
- kafka_server_producer_user_client_byterate
- kafka_server_producer_user_client_throttle_time
- kafka_server_producer_user_throttle_time
- kafka_server_under_isr_partitions
- kafka_server_under_replicated_partitions
- kafka_server_zookeeper_auth_failures
- kafka_server_zookeeper_disconnections
- kafka_server_zookeeper_expired_sessions
- kafka_server_zookeeper_read_only_connections
- kafka_server_zookeeper_sasl_authentications
- kafka_server_zookeeper_sync_connections
- kafka_topic_partition_current_offset
- kafka_topic_partition_oldest_offset

# Attributions
Configuration files, dashboards and alerts are maintained by the [Sysdig team](https://sysdig.com/).
@@ -0,0 +1,119 @@
```yaml
apiVersion: v1
kind: Alert
app: kafka
version: 1.0.0
appVersion:
  - '2.7'
descriptionFile: ALERTS.md
configurations:
  - kind: Prometheus
    data: |-
      groups:
      - name: Kafka
        rules:
        - alert: '[Kafka] No Leader'
          expr: |
            sum(kafka_controller_active_controller) < 1
          for: 5m
          labels:
            severity: critical
          annotations:
            description: There is no ActiveController or 'leader' in the Kafka cluster.
        - alert: '[Kafka] Too Many Leaders'
          expr: |
            sum(kafka_controller_active_controller) > 1
          for: 10m
          labels:
            severity: critical
          annotations:
            description: There is more than one ActiveController or 'leader' in the Kafka cluster.
        - alert: '[Kafka] Offline Partitions'
          expr: |
            sum(kafka_controller_offline_partitions) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            description: There are one or more Offline Partitions. These partitions don't have an active leader and are hence not writable or readable.
        - alert: '[Kafka] Under Replicated Partitions'
          expr: |
            sum(kafka_server_under_replicated_partitions) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            description: There are one or more Under Replicated Partitions.
        - alert: '[Kafka] Under In-Sync Replicated Partitions'
          expr: |
            sum(kafka_server_under_isr_partitions) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            description: There are one or more Under In-Sync Replicated Partitions. These partitions will be unavailable to producers who use 'acks=all'.
        - alert: '[Kafka] ConsumerGroup Lag Not Decreasing'
          expr: |
            (sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(kafka_consumergroup_lag) > 0)
            and
            (sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup, topic)(delta(kafka_consumergroup_lag[2m])) >= 0)
          for: 15m
          labels:
            severity: warning
          annotations:
            description: The ConsumerGroup lag is not decreasing. The Consumers might be down, failing to process the messages and continuously retrying, or their consumption rate is lower than the production rate of messages.
        - alert: '[Kafka] ConsumerGroup Without Members'
          expr: |
            sum by(kube_cluster_name, kube_namespace_name, kube_workload_name, consumergroup)(kafka_consumergroup_members) == 0
          for: 10m
          labels:
            severity: info
          annotations:
            description: The ConsumerGroup doesn't have any members.
        - alert: '[Kafka] Producer High ThrottleTime By Client-Id'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_producer_client_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
        - alert: '[Kafka] Producer High ThrottleTime By User'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_producer_user_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Producer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
        - alert: '[Kafka] Producer High ThrottleTime By User And Client-Id'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_producer_user_client_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Producer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
        - alert: '[Kafka] Consumer High ThrottleTime By Client-Id'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, client_id)(kafka_server_consumer_client_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id-only quotas are being used.
        - alert: '[Kafka] Consumer High ThrottleTime By User'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user)(kafka_server_consumer_user_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Consumer has reached its quota and has high throttle time. Applicable when User-only quotas are being used.
        - alert: '[Kafka] Consumer High ThrottleTime By User And Client-Id'
          expr: |
            max by(kube_cluster_name, kube_namespace_name, kube_workload_name, user, client_id)(kafka_server_consumer_user_client_throttle_time) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            description: The Consumer has reached its quota and has high throttle time. Applicable when Client-Id + User quotas are being used.
```
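All six throttle-time rules share the same shape: take the maximum throttle time per grouping key (`max by(...)` in PromQL) and compare it against 1000 ms. A standalone sketch of that evaluation (the sample values and client ids are invented):

```python
from collections import defaultdict

def throttled_clients(samples, threshold_ms=1000):
    """Group (client_id, throttle_ms) samples by client_id, take the
    per-group max (as `max by(client_id)` does), and return the ids
    whose max exceeds the threshold, sorted for stable output."""
    groups = defaultdict(list)
    for client_id, value in samples:
        groups[client_id].append(value)
    return sorted(cid for cid, vals in groups.items()
                  if max(vals) > threshold_ms)

samples = [("producer-a", 120.0), ("producer-a", 1450.0),
           ("producer-b", 300.0)]
print(throttled_clients(samples))  # ['producer-a']
```

Aggregating with `max` rather than `avg` means a single heavily throttled sample per group is enough to fire, which is the intent of these quota alerts.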
@@ -0,0 +1,19 @@
```yaml
apiVersion: v1
kind: Dashboard
app: kafka
version: 1.0.0
appVersion:
  - '2.7'
configurations:
  - name: kafka
    kind: Sysdig
    image: kafka/images/kafka.png
    description: |
      This dashboard offers information on:
      * Brokers
      * Network
      * Topics
      * ConsumerGroups
      * Quotas
      * Zookeeper
    file: include/Kafka.json
```
@@ -0,0 +1,7 @@
```yaml
apiVersion: v1
kind: Description
app: kafka
version: 1.0.0
appVersion:
  - '2.7'
descriptionFile: README.md
```