From c0c9aa8028de476cf30036f3fe7e0ceaa52b38b9 Mon Sep 17 00:00:00 2001
From: Bruno Cadonna
Date: Wed, 26 Feb 2020 14:05:47 +0100
Subject: [PATCH 1/4] Adapt metrics docs of Streams according to KIP-444

---
 docs/ops.html | 521 +++++++++++++++++++-------------------------------
 1 file changed, 202 insertions(+), 319 deletions(-)

diff --git a/docs/ops.html b/docs/ops.html
index 0fd0ba18f4a17..a6c89059caf07 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -85,16 +85,16 @@

Graceful shutdown<

Balancing leadership

- Whenever a broker stops or crashes, leadership for that broker's partitions transfers to other replicas. When the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.
+ Whenever a broker stops or crashes leadership for that broker's partitions transfers to other replicas. This means that by default when the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.

- To avoid this imbalance, Kafka has a notion of preferred replicas. If the list of replicas for a partition is 1,5,9 then node 1 is preferred as the leader to either node 5 or 9 because it is earlier in the replica list. By default the Kafka cluster will try to restore leadership to the restored replicas. This behaviour is configured with:
+ To avoid this imbalance, Kafka has a notion of preferred replicas. If the list of replicas for a partition is 1,5,9 then node 1 is preferred as the leader to either node 5 or 9 because it is earlier in the replica list. You can have the Kafka cluster try to restore leadership to the restored replicas by running the command:
+

+  > bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
+  
+ Since running this command can be tedious you can also configure Kafka to do this automatically by setting the following configuration:
       auto.leader.rebalance.enable=true
-  
- You can also set this to false, but you will then need to manually restore leadership to the restored replicas by running the command: -
-  > bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
   

Balancing Replicas Across Racks

@@ -453,7 +453,7 @@

Limiting Bandwidth Usage during Da There are two interfaces that can be used to engage a throttle. The simplest, and safest, is to apply a throttle when invoking the kafka-reassign-partitions.sh, but kafka-configs.sh can also be used to view and alter the throttle values directly.

So for example, if you were to execute a rebalance, with the below command, it would move partitions at no more than 50MB/s.
- $ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute --reassignment-json-file bigger-cluster.json --throttle 50000000
+ $ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute --reassignment-json-file bigger-cluster.json —throttle 50000000
When you execute this script you will see the throttle engage:
   The throttle limit was set to 50000000 B/s
@@ -526,8 +526,8 @@ 
Safe usage of throttled replication

Some care should be taken when using throttled replication. In particular:

(1) Throttle Removal:

- The throttle should be removed in a timely manner once reassignment completes (by running kafka-reassign-partitions.sh
- --verify).
+ The throttle should be removed in a timely manner once reassignment completes (by running kafka-reassign-partitions
+ —verify).

(2) Ensuring Progress:

If the throttle is set too low, in comparison to the incoming write rate, it is possible for replication to not
@@ -1600,9 +1600,8 @@

Connect Monitoring

Streams Monitoring

- A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to streams.
- By default Kafka Streams has metrics with two recording levels: debug and info. The debug level records all metrics, while
- the info level records only the thread-level metrics.
+ A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to Streams.
+ By default Kafka Streams has metrics with two recording levels: debug and info.

Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics for each started
@@ -1662,121 +1661,113 @@

Task Metrics
-All the following metrics have a recording level of debug:
+All the following metrics have a recording level of debug,
+except for metrics dropped-records-(rate | total) which have a recording level
+of info:
@@ -1784,35 +1775,75 @@
+ + + + + + + + + + + + + + + + + + + + - - + + - - + + - - + + - + - - + + - - + + + + + + + + + + + + + + + + + + + + + +
Description Mbean name
process-latency-avgThe average execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-latency-maxThe maximum execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-rateThe average number of processed records per second across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
process-totalThe total number of processed records across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
commit-latency-avgThe average commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-latency-maxThe maximum commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The maximum execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-rateThe average number of commit calls per second. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average number of commit calls per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-total The total number of commit calls. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-avgThe average observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-maxThe max observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The max observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-rateThe average number of enforced processings per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-totalThe total number enforced processings.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-rateThe average number of records dropped within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-totalThe total number of records dropped within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
@@ -1826,111 +1857,25 @@
Description Mbean name - - process-latency-avg - The average process execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - process-latency-max - The maximum process execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - punctuate-latency-avg - The average punctuate execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - punctuate-latency-max - The maximum punctuate execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - create-latency-avg - The average create execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - create-latency-max - The maximum create execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - destroy-latency-avg - The average destroy execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - destroy-latency-max - The maximum destroy execution time in ns. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - process-rate - The average number of process operations per second. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The average number of records processed by a source processor node per second. + kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) process-total - The total number of process operations called. 
- kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - punctuate-rate - The average number of punctuate operations per second. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - punctuate-total - The total number of punctuate operations called. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - create-rate - The average number of create operations per second. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - create-total - The total number of create operations called. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - destroy-rate - The average number of destroy operations per second. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - destroy-total - The total number of destroy operations called. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - forward-rate - The average rate of records being forwarded downstream, from source nodes only, per second. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) - - - forward-total - The total number of of records being forwarded downstream, from source nodes only. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The total number of records processed by a source processor node per second. 
+ kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) suppression-emit-rate - - The rate at which records that have been emitted downstream from suppression operation nodes. - Compare with the process-rate metric to determine how many updates are being suppressed. - - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The rate at which records that have been emitted downstream from suppression operation nodes. + kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) suppression-emit-total - - The total number of records that have been emitted downstream from suppression operation nodes. - Compare with the process-total metric to determine how many updates are being suppressed. - - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The total number of records that have been emitted downstream from suppression operation nodes. + kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) @@ -1942,6 +1887,7 @@
Suppression Buffer Metrics
- All the following metrics have a recording level of debug: - - - - - - - + + + - - - + + + - - - - - - - - - - - - - - - - - - + + +
suppression-buffer-size-currentThe current total size, in bytes, of the buffered data.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-avgThe average cache hit ratio defined as the ratio of cache read hits over the total cache read requests.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
suppression-buffer-size-avgThe average total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-minThe mininum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
suppression-buffer-size-maxThe maximum total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-count-currentThe current number of records buffered.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-avgThe average number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-maxThe maximum number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-maxThe maximum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
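
Review note: the task-level rows added by this patch document MBean names of the form `kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)`. The following is a minimal sketch, not part of the patch, of how a monitoring script might apply that documented pattern to a name it has already obtained. The class name `TaskMetricName`, the helper `parse`, and the sample MBean name are hypothetical and chosen only for illustration; the dot in `kafka.streams` is escaped here, which the documented pattern does not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies the task-level MBean name pattern from the docs table.
class TaskMetricName {
    // Pattern taken from the "Mbean name" column; the literal dot in the domain is escaped.
    static final Pattern TASK_MBEAN = Pattern.compile(
        "kafka\\.streams:type=stream-task-metrics,thread-id=([-.\\w]+),task-id=([-.\\w]+)");

    // Returns {threadId, taskId} when the whole name matches the pattern, otherwise null.
    static String[] parse(String mbeanName) {
        Matcher m = TASK_MBEAN.matcher(mbeanName);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // Sample name is an assumption for illustration, not output from a real deployment.
        String[] ids = parse(
            "kafka.streams:type=stream-task-metrics,thread-id=app-1-StreamThread-1,task-id=0_1");
        System.out.println(ids[0] + " / " + ids[1]);
    }
}
```

A name that still carries the older `client-id` tag does not match this pattern, so `parse` returns null for it.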
From 8d2c2f95c2c1761f3151ac53a61d47755ab8f07a Mon Sep 17 00:00:00 2001
From: Bruno Cadonna
Date: Wed, 26 Feb 2020 19:48:22 +0100
Subject: [PATCH 2/4] Revert "Adapt metrics docs of Streams according to KIP-444"

This reverts commit c0c9aa8028de476cf30036f3fe7e0ceaa52b38b9.
---
 docs/ops.html | 521 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 319 insertions(+), 202 deletions(-)

diff --git a/docs/ops.html b/docs/ops.html
index a6c89059caf07..0fd0ba18f4a17 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -85,16 +85,16 @@

Graceful shutdown<

Balancing leadership

- Whenever a broker stops or crashes leadership for that broker's partitions transfers to other replicas. This means that by default when the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.
+ Whenever a broker stops or crashes, leadership for that broker's partitions transfers to other replicas. When the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.

- To avoid this imbalance, Kafka has a notion of preferred replicas. If the list of replicas for a partition is 1,5,9 then node 1 is preferred as the leader to either node 5 or 9 because it is earlier in the replica list. You can have the Kafka cluster try to restore leadership to the restored replicas by running the command:
-

-  > bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
-  
+ To avoid this imbalance, Kafka has a notion of preferred replicas. If the list of replicas for a partition is 1,5,9 then node 1 is preferred as the leader to either node 5 or 9 because it is earlier in the replica list. By default the Kafka cluster will try to restore leadership to the restored replicas. This behaviour is configured with:
- Since running this command can be tedious you can also configure Kafka to do this automatically by setting the following configuration:
       auto.leader.rebalance.enable=true
+  
+ You can also set this to false, but you will then need to manually restore leadership to the restored replicas by running the command: +
+  > bin/kafka-preferred-replica-election.sh --zookeeper zk_host:port/chroot
   

Balancing Replicas Across Racks

@@ -453,7 +453,7 @@

Limiting Bandwidth Usage during Da There are two interfaces that can be used to engage a throttle. The simplest, and safest, is to apply a throttle when invoking the kafka-reassign-partitions.sh, but kafka-configs.sh can also be used to view and alter the throttle values directly.

So for example, if you were to execute a rebalance, with the below command, it would move partitions at no more than 50MB/s.
- $ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute --reassignment-json-file bigger-cluster.json —throttle 50000000
+ $ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute --reassignment-json-file bigger-cluster.json --throttle 50000000
When you execute this script you will see the throttle engage:
   The throttle limit was set to 50000000 B/s
@@ -526,8 +526,8 @@ 
Safe usage of throttled replication

Some care should be taken when using throttled replication. In particular:

(1) Throttle Removal:

- The throttle should be removed in a timely manner once reassignment completes (by running kafka-reassign-partitions
- —verify).
+ The throttle should be removed in a timely manner once reassignment completes (by running kafka-reassign-partitions.sh
+ --verify).

(2) Ensuring Progress:

If the throttle is set too low, in comparison to the incoming write rate, it is possible for replication to not
@@ -1600,8 +1600,9 @@

Connect Monitoring

Streams Monitoring

- A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to Streams.
- By default Kafka Streams has metrics with two recording levels: debug and info.
+ A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to streams.
+ By default Kafka Streams has metrics with two recording levels: debug and info. The debug level records all metrics, while
+ the info level records only the thread-level metrics.

Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics for each started
@@ -1661,113 +1662,121 @@

Task Metrics
-All the following metrics have a recording level of debug,
-except for metrics dropped-records-(rate | total) which have a recording level
-of info:
+All the following metrics have a recording level of debug:
@@ -1775,75 +1784,35 @@
- - - - - - - - - - - - - - - - - - - - - - + + - - + + - - + + - + - - + + - - - - - - - - - - - - - - - - - - - - - - + +
Description Mbean name
process-latency-avgThe average execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-latency-maxThe maximum execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-rateThe average number of processed records per second across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
process-totalThe total number of processed records across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
commit-latency-avgThe average execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)The average commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
commit-latency-maxThe maximum execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)The maximum commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
commit-rateThe average number of commit calls per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)The average number of commit calls per second. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
commit-total The total number of commit calls. kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-avgThe average observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)The average observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-maxThe max observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-rateThe average number of enforced processings per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-totalThe total number enforced processings.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-rateThe average number of records dropped within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-totalThe total number of records dropped within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)The max observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)
@@ -1857,25 +1826,111 @@
Description Mbean name + + process-latency-avg + The average process execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + process-latency-max + The maximum process execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + punctuate-latency-avg + The average punctuate execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + punctuate-latency-max + The maximum punctuate execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + create-latency-avg + The average create execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + create-latency-max + The maximum create execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + destroy-latency-avg + The average destroy execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + destroy-latency-max + The maximum destroy execution time in ns. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + process-rate - The average number of records processed by a source processor node per second. - kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The average number of process operations per second. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) process-total - The total number of records processed by a source processor node per second. 
- kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + The total number of process operations called. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + punctuate-rate + The average number of punctuate operations per second. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + punctuate-total + The total number of punctuate operations called. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + create-rate + The average number of create operations per second. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + create-total + The total number of create operations called. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + destroy-rate + The average number of destroy operations per second. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + destroy-total + The total number of destroy operations called. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + forward-rate + The average rate of records being forwarded downstream, from source nodes only, per second. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + + forward-total + The total number of of records being forwarded downstream, from source nodes only. + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) suppression-emit-rate - The rate at which records that have been emitted downstream from suppression operation nodes. 
- kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + The rate at which records that have been emitted downstream from suppression operation nodes. + Compare with the process-rate metric to determine how many updates are being suppressed. + + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) suppression-emit-total - The total number of records that have been emitted downstream from suppression operation nodes. - kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) + + The total number of records that have been emitted downstream from suppression operation nodes. + Compare with the process-total metric to determine how many updates are being suppressed. + + kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) @@ -1887,7 +1942,6 @@
Suppression Buffer Metrics
+ All the following metrics have a recording level of debug: + + + + + + + - - - + + + - - - + + + + + + + + + + + + + + + + + +
suppression-buffer-size-currentThe current total size, in bytes, of the buffered data.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
hit-ratio-minThe mininum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)suppression-buffer-size-avgThe average total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
hit-ratio-maxThe maximum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)suppression-buffer-size-maxThe maximum total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-count-currentThe current number of records buffered.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-avgThe average number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-maxThe maximum number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
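
Review note: this revert switches the documented tag in the task-level MBean names back from `thread-id` to `client-id`. As a rough sketch, not part of the patch, a monitoring script that should tolerate either tag could rely on a JMX `ObjectName` property-list pattern instead of a fixed regular expression. The class name `TaskMBeanFilter`, its helpers, and the sample MBean names are hypothetical; only the standard `javax.management.ObjectName` API is assumed.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: recognises task-level Streams MBeans regardless of the client-id/thread-id tag.
class TaskMBeanFilter {
    // Property-list pattern: the type key must match, any additional keys are allowed.
    static final String QUERY = "kafka.streams:type=stream-task-metrics,*";

    // True when the given name belongs to the task-level metrics group.
    static boolean isTaskMetric(String mbeanName) {
        try {
            return new ObjectName(QUERY).apply(new ObjectName(mbeanName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid MBean name: " + mbeanName, e);
        }
    }

    // Extracts the task-id key property, or null when the name has no such key.
    static String taskId(String mbeanName) {
        try {
            return new ObjectName(mbeanName).getKeyProperty("task-id");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid MBean name: " + mbeanName, e);
        }
    }

    public static void main(String[] args) {
        // Sample names are assumptions for illustration only.
        String oldStyle = "kafka.streams:type=stream-task-metrics,client-id=app-1,task-id=0_1";
        String newStyle = "kafka.streams:type=stream-task-metrics,thread-id=app-1-StreamThread-1,task-id=0_1";
        System.out.println(isTaskMetric(oldStyle) + " " + isTaskMetric(newStyle) + " " + taskId(oldStyle));
    }
}
```

The same pattern string could also be passed to `MBeanServerConnection.queryNames` to list matching MBeans on a live JVM; that call is not exercised here because it needs a running Streams instance.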
From 7ba7b1429190d1359e577f71388770877c98b2b0 Mon Sep 17 00:00:00 2001
From: Bruno Cadonna
Date: Wed, 26 Feb 2020 19:55:47 +0100
Subject: [PATCH 3/4] Adapt docs about metrics of Streams according to KIP-444

---
 docs/ops.html | 518 ++++++++++++++++++++------------------------------
 1 file changed, 202 insertions(+), 316 deletions(-)

diff --git a/docs/ops.html b/docs/ops.html
index 0fd0ba18f4a17..5f9f8f561ba0b 100644
--- a/docs/ops.html
+++ b/docs/ops.html
@@ -1600,9 +1600,8 @@

Connect Monitoring

Streams Monitoring

- A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to streams.
- By default Kafka Streams has metrics with two recording levels: debug and info. The debug level records all metrics, while
- the info level records only the thread-level metrics.
+ A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to Streams.
+ By default Kafka Streams has metrics with two recording levels: debug and info.

Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics for each started
@@ -1617,7 +1616,7 @@

Streams Mo
metrics.recording.level="info"
Client Metrics
-All the following metrics have a recording level of info:
+All of the following metrics have a recording level of info:
@@ -1654,7 +1653,7 @@
Thread Metrics
-All the following metrics have a recording level of info:
+All of the following metrics have a recording level of info:
@@ -1662,121 +1661,112 @@
Task Metrics
-All the following metrics have a recording level of debug:
+All of the following metrics have a recording level of debug, except for metrics
+dropped-records-rate and dropped-records-total which have a recording level of info:
@@ -1784,41 +1774,83 @@
+ + + + + + + + + + + + + + + + + + + + - - + + - - + + - - + + - + - - + + - - + + + + + + + + + + + + + + + + + + + + + +
Description Mbean name
process-latency-avgThe average execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-latency-maxThe maximum execution time in ns, for processing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
process-rateThe average number of processed records per second across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
process-totalThe total number of processed records across all source processor nodes of this task.kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all
commit-latency-avgThe average commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-latency-maxThe maximum commit time in ns for this task. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The maximum execution time in ns, for committing.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-rateThe average number of commit calls per second. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average number of commit calls per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
commit-total The total number of commit calls. kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-avgThe average observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The average observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
record-lateness-maxThe max observed lateness of records.kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+)The max observed lateness of records (stream time - record timestamp).kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-rateThe average number of enforced processings per second.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
enforced-processing-totalThe total number of enforced processings.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-rateThe average number of records dropped per second within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
dropped-records-totalThe total number of records dropped within this task.kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+)
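As an illustration of the renamed tags, the sketch below checks whether a JMX object name matches the task-level pattern documented above (type stream-task-metrics with thread-id and task-id tags). It uses only the JDK's javax.management.ObjectName; the class name, the helper isTaskMetric, and the sample thread and task ids are assumptions made for this example and are not part of Kafka.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class TaskMetricNameSketch {

    // Hypothetical helper: true if the name carries the domain, type, and
    // tags that the task metrics table lists (thread-id and task-id).
    static boolean isTaskMetric(ObjectName name) {
        return "kafka.streams".equals(name.getDomain())
                && "stream-task-metrics".equals(name.getKeyProperty("type"))
                && name.getKeyProperty("thread-id") != null
                && name.getKeyProperty("task-id") != null;
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        // The thread-id and task-id values here are made up for illustration.
        ObjectName name = new ObjectName(
                "kafka.streams:type=stream-task-metrics,thread-id=app-StreamThread-1,task-id=0_0");
        System.out.println("task metric: " + isTaskMetric(name));
        System.out.println("task-id: " + name.getKeyProperty("task-id"));
    }
}
```

A name that still carries the old client-id tag instead of thread-id is rejected by this check, which can help when updating dashboards or alert rules that select MBeans by tag.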
Processor Node Metrics
- All the following metrics have a recording level of debug: + The following metrics are only available on certain types of nodes, i.e., process-rate and process-total are + only available for source processor nodes and suppression-emit-rate and suppression-emit-total are only available + for suppression operation nodes. All of the metrics have a recording level of debug: @@ -1826,126 +1858,43 @@
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + - - + + - - + +
Description Mbean name
process-latency-avgThe average process execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
process-latency-maxThe maximum process execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
punctuate-latency-avgThe average punctuate execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
punctuate-latency-maxThe maximum punctuate execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
create-latency-avgThe average create execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
create-latency-maxThe maximum create execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
destroy-latency-avgThe average destroy execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
destroy-latency-maxThe maximum destroy execution time in ns. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
process-rateThe average number of process operations per second. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)The average number of records processed by a source processor node per second.kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
process-totalThe total number of process operations called. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
punctuate-rateThe average number of punctuate operations per second. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
punctuate-totalThe total number of punctuate operations called. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
create-rateThe average number of create operations per second. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
create-totalThe total number of create operations called. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
destroy-rateThe average number of destroy operations per second. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
destroy-totalThe total number of destroy operations called. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
forward-rateThe average rate of records being forwarded downstream, from source nodes only, per second. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
forward-totalThe total number of of records being forwarded downstream, from source nodes only. kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)The total number of records processed by a source processor node.kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
suppression-emit-rate - The rate at which records that have been emitted downstream from suppression operation nodes. - Compare with the process-rate metric to determine how many updates are being suppressed. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)The rate at which records have been emitted downstream from suppression operation nodes.kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
suppression-emit-total - The total number of records that have been emitted downstream from suppression operation nodes. - Compare with the process-total metric to determine how many updates are being suppressed. - kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)The total number of records that have been emitted downstream from suppression operation nodes.kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+)
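Metrics with a recording level of debug are only recorded when the client is configured to record at that level. A minimal sketch, assuming the standard metrics.recording.level configuration key (whose default is INFO); the class name, the helper debugMetricsConfig, and the application id and bootstrap server values are placeholders invented for this example.

```java
import java.util.Properties;

public class RecordingLevelSketch {

    // Hypothetical helper: builds a configuration that asks for debug-level metrics.
    static Properties debugMetricsConfig() {
        Properties props = new Properties();
        // Placeholder values, not taken from the documentation above.
        props.put("application.id", "example-app");
        props.put("bootstrap.servers", "localhost:9092");
        // With the default level (INFO), the debug-level metrics are not recorded.
        props.put("metrics.recording.level", "DEBUG");
        return props;
    }

    public static void main(String[] args) {
        Properties props = debugMetricsConfig();
        System.out.println(props.getProperty("metrics.recording.level"));
    }
}
```

These properties would then be passed to the Streams application when it is constructed. Recording at debug level collects more metrics and may add overhead, so it is worth enabling deliberately.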
State Store Metrics
- All the following metrics have a recording level of debug. Note that the store-scope value is specified in StoreSupplier#metricsScope() for user's customized + All of the following metrics have a recording level of debug. Note that the store-scope value is specified in StoreSupplier#metricsScope() for user's customized state stores; for built-in state stores, currently we have:
  • in-memory-state
  • in-memory-lru-state
  • in-memory-window-state
  • +
  • in-memory-suppression (for suppression buffers)
  • rocksdb-state (for RocksDB backed key-value store)
  • rocksdb-window-state (for RocksDB backed window store)
  • rocksdb-session-state (for RocksDB backed session store)
+ Metrics suppression-buffer-size-avg, suppression-buffer-size-max, suppression-buffer-count-avg, and suppression-buffer-count-max + are only available for suppression buffers. All other metrics are not available for suppression buffers. @@ -1957,188 +1906,163 @@
RocksDB Metrics
- All the following metrics have a recording level of debug. + All of the following metrics have a recording level of debug. The metrics are collected every minute from the RocksDB state stores. If a state store consists of multiple RocksDB instances, as is the case for aggregations over time and session windows, each metric reports an aggregation over the RocksDB instances of the state store. @@ -2159,88 +2083,88 @@
Record Cache Metrics
- All the following metrics have a recording level of debug: + All of the following metrics have a recording level of debug:
@@ -2250,57 +2174,19 @@
Suppression Buffer Metrics
- All the following metrics have a recording level of debug: - -
- - - - - - - - - - - - - - - - - - - - + + + - - - + + + - - - + + +
suppression-buffer-size-currentThe current total size, in bytes, of the buffered data.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-avgThe average total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-size-maxThe maximum total size, in bytes, of the buffered data over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)
suppression-buffer-count-currentThe current number of records buffered.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-avgThe average cache hit ratio defined as the ratio of cache read hits over the total cache read requests.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
suppression-buffer-size-avgThe average number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-minThe minimum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
suppression-buffer-size-maxThe maximum number of records buffered over the sampling window.kafka.streams:type=stream-buffer-metrics,client-id=([-.\w]+),task-id=([-.\w]+),buffer-id=([-.\w]+)hit-ratio-maxThe maximum cache hit ratio.kafka.streams:type=stream-record-cache-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+)
From 2f49ab241a5113d20eccedf3c098014218332965 Mon Sep 17 00:00:00 2001 From: Bruno Cadonna Date: Wed, 26 Feb 2020 20:18:58 +0100 Subject: [PATCH 4/4] Include feedback --- docs/ops.html | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/ops.html b/docs/ops.html index 5f9f8f561ba0b..85699299c7c83 100644 --- a/docs/ops.html +++ b/docs/ops.html @@ -1673,12 +1673,12 @@
process-rate The average number of processed records per second across all source processor nodes of this task. - kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all + kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+) process-total The total number of processed records across all source processor nodes of this task. - kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=all + kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+) commit-latency-avg