Metrics
The term metrics is used to quantify a characteristic of the software. Metrics are collected for many reasons, including performance measurement, performance tuning, monitoring, and debugging.
We use Prometheus to collect metrics from monitored targets by scraping metrics HTTP endpoints on those targets at given intervals. Prometheus evaluates rule expressions, displays the results, and can trigger alerts when a condition is observed to be true.
Generally in Spark, MetricsSystem initializes the internal registries and counters. When created, MetricsSystem requests MetricsConfig to initialize. MetricsConfig uses metrics.properties as the default metrics configuration file, which can be changed via the spark.metrics.conf property. The file is first loaded from the given path directly, before falling back to Spark's CLASSPATH.
MetricsConfig also lets you configure the metrics using spark.metrics.conf.-prefixed Spark properties, as in the sketch below.
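For example, a minimal sketch of configuring a sink through spark.metrics.conf.-prefixed properties; the CSV sink and its settings here are illustrative, not this project's configuration:

```scala
import org.apache.spark.SparkConf

// Illustrative only: registers Spark's built-in CsvSink for all metric
// instances via spark.metrics.conf.*-prefixed properties instead of a
// metrics.properties file.
val conf = new SparkConf()
  .set("spark.metrics.conf.*.sink.csv.class", "org.apache.spark.metrics.sink.CsvSink")
  .set("spark.metrics.conf.*.sink.csv.period", "10")
  .set("spark.metrics.conf.*.sink.csv.unit", "seconds")
```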
Once we have the metrics created, we should serve them to Prometheus. Prometheus needs targets to scrape the metrics, or data, from. By default, Prometheus can be a target for itself and scrape its own metrics.
The Prometheus Pushgateway allows you to push time series from short-lived service-level batch jobs to an intermediary job which Prometheus can scrape. Combined with Prometheus's simple text-based exposition format, this makes it easy to instrument even shell scripts without a client library, as shown below.
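For instance, pushing a single sample from a shell script; the Pushgateway address, metric name, and job name are placeholders:

```sh
# Push one sample of a placeholder metric for a placeholder job; the
# Pushgateway address is an assumption.
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job
```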
We can serve the metrics to Prometheus once we have them created and ready at the target. In the prometheus.yml file, the job name is added as a label job=<job_name> to any time series scraped from that config. A scrape_config section specifies a set of targets and parameters describing how to scrape them. In the general case, one scrape configuration specifies a single job; in advanced configurations, this may change.
Targets may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service-discovery mechanisms. We need to specify the target information in the static_configs section of the prometheus.yml file. The URLs of the target hosts should be specified in this section so that Prometheus can scrape the metrics from the targets. The default configuration for Prometheus is available at prometheus.yml.
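A minimal prometheus.yml sketch with a statically configured target; the job name and target address are placeholders, not this project's actual values:

```yaml
# Placeholder job name and target address; adjust to where the metrics
# endpoint is actually exposed.
scrape_configs:
  - job_name: 'pstl'                # added as label job=<job_name> to scraped series
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9091'] # host:port of the metrics endpoint
```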
The main metrics class is VerticaMetrics. This class defines four metrics: ProducerHistogram, CopyDurationHistogram, CopyIncrementalCounter, and RowsLoadedCounter.
ProducerHistogram tracks the time spent producing data to Kafka within the Vertica sink, while CopyDurationHistogram tracks the time spent executing copy commands within the Vertica sink. CopyIncrementalCounter tracks the number of incremental copy commands triggered per micro-batch. The number of rows loaded per copy is tracked by RowsLoadedCounter.
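A minimal sketch of how such a class might be declared with the Prometheus Java client; the metric names, help strings, and client library choice are assumptions, not the project's actual code:

```scala
import io.prometheus.client.{Counter, Histogram}

// Assumed shape of VerticaMetrics using the Prometheus Java client;
// metric names and help strings are illustrative.
object VerticaMetrics {
  val ProducerHistogram: Histogram = Histogram.build()
    .name("vertica_sink_kafka_produce_seconds")
    .help("Time spent producing data to Kafka within the Vertica sink.")
    .register()

  val CopyDurationHistogram: Histogram = Histogram.build()
    .name("vertica_sink_copy_duration_seconds")
    .help("Time spent executing copy commands within the Vertica sink.")
    .register()

  val CopyIncrementalCounter: Counter = Counter.build()
    .name("vertica_sink_incremental_copies_total")
    .help("Incremental copy commands triggered per micro-batch.")
    .register()

  val RowsLoadedCounter: Counter = Counter.build()
    .name("vertica_sink_rows_loaded_total")
    .help("Number of rows loaded per copy command.")
    .register()
}
```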
Some examples of metrics used in the VerticaSink are:
- kafkaTimer
- copyTimer
kafkaTimer is a metric that starts a timer using ProducerHistogram from VerticaMetrics to measure the time taken to produce data to Kafka. copyTimer uses CopyDurationHistogram from the same VerticaMetrics to measure the time spent executing copy commands.
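A hedged sketch of how these timers might wrap the sink's operations; produceToKafka and runCopyCommand are hypothetical stand-ins for the sink's real calls:

```scala
// Hypothetical stand-ins for the sink's real operations.
def produceToKafka(): Unit = ()
def runCopyCommand(): Unit = ()

// kafkaTimer measures Kafka production time via ProducerHistogram.
val kafkaTimer = VerticaMetrics.ProducerHistogram.startTimer()
try produceToKafka() finally kafkaTimer.observeDuration()

// copyTimer measures copy-command execution via CopyDurationHistogram.
val copyTimer = VerticaMetrics.CopyDurationHistogram.startTimer()
try runCopyCommand() finally copyTimer.observeDuration()
```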
These timers help users understand how much time is spent purging the unwanted batches.
Metrics are captured from the PSTL SparkListener. Spark SQL provides SQL metrics for each operator within a stage. Consider a simple SQL query:
SELECT a, b FROM table WHERE c > 10
This query plans into SCAN -> FILTER -> PROJECT. Each of these operators provides tuple-level metrics, such as the number of bytes read/written and the number of input/output rows.
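As a minimal sketch (not the actual PSTL listener), these per-operator metrics surface as named accumulator updates reported when each task completes:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch only: SQL operator metrics arrive as named accumulator updates
// on each completed task.
class SqlMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    for (acc <- taskEnd.taskInfo.accumulables; name <- acc.name) {
      // e.g. name = "number of output rows" for a scan or project operator
      println(s"stage=${taskEnd.stageId} metric=$name value=${acc.update.getOrElse(0L)}")
    }
  }
}

// Registration, assuming an existing SparkSession named `spark`:
// spark.sparkContext.addSparkListener(new SqlMetricsListener)
```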
To serve these captured metrics to Prometheus, we need to specify the target host in the prometheus.yml file. By default, Prometheus itself listens on localhost:9090. For more information, follow Prometheus.
We can also have the Prometheus metrics served to Grafana. For more information, see Prometheus-Grafana.