
Metrics

The term metrics refers to quantified characteristics of the software. Metrics are collected for many reasons, including performance measurement, performance tuning, monitoring, and debugging. We use Prometheus to collect metrics from monitored targets by scraping metrics HTTP endpoints on those targets at given intervals; Prometheus evaluates rule expressions, displays the results, and can trigger alerts when a condition is observed to be true.

In Spark, MetricsSystem initializes the internal registries and counters. When created, MetricsSystem requests MetricsConfig to initialize. MetricsConfig uses metrics.properties as the default metrics configuration file, which can be changed using the spark.metrics.conf property. The file is first loaded from the path directly, before falling back to Spark's CLASSPATH. MetricsConfig also lets you configure metrics using spark.metrics.conf.-prefixed Spark properties.
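For example, a minimal metrics.properties that enables Spark's built-in console sink could look like the following (the sink class is Spark's own; the period values are illustrative):

    # metrics.properties -- enable the console sink for all instances
    *.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
    *.sink.console.period=10
    *.sink.console.unit=seconds

The same settings can be supplied as spark.metrics.conf.-prefixed Spark properties, e.g. spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink.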

Once the metrics are created, we serve them to Prometheus. Prometheus needs targets to scrape metrics, or data, from. By default, Prometheus can be a target for itself and scrape its own metrics. The Prometheus Pushgateway allows you to push time series from short-lived service-level batch jobs to an intermediary job which Prometheus can scrape. Combined with Prometheus's simple text-based exposition format, this makes it easy to instrument even shell scripts without a client library.
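As a sketch, a shell script could push a single sample to a Pushgateway like so (the host, job name, and metric are placeholders, not actual PSTL names):

    # Push one sample to a Pushgateway under the job "pstl_batch"
    echo "pstl_batch_duration_seconds 3.14" | \
      curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/pstl_batch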

We can serve the metrics to Prometheus once we have them created and ready at the target. In the prometheus.yml file, the job name is added as a label job=<job_name> to any time series scraped from that configuration. A scrape_config section specifies a set of targets and parameters describing how to scrape them. In the general case, one scrape configuration specifies a single job; in advanced configurations, this may change.

Targets may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service-discovery mechanisms. We need to specify the target information in the static_configs section of the prometheus.yml file. The URLs for the target hosts should be specified in this section so that Prometheus can scrape the metrics from the targets. The default configuration for Prometheus is available at prometheus.yml
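A minimal prometheus.yml illustrating job_name and static_configs might look like this (the PSTL target address is a placeholder):

    global:
      scrape_interval: 15s

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'pstl'
        static_configs:
          - targets: ['pstl-host.example.com:9091']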

The main metrics class is VerticaMetrics. This class has four metrics, namely ProducerHistogram, CopyDurationHistogram, CopyIncrementalCounter, and RowsLoadedCouter. ProducerHistogram tracks the time spent producing data to Kafka within the Vertica sink, while CopyDurationHistogram tracks the time spent executing COPY commands within the Vertica sink. CopyIncrementalCounter tracks the number of incremental COPY commands triggered per micro-batch, and RowsLoadedCouter tracks the number of rows loaded per COPY. Some examples of metrics used in the VerticaSink are

  • kafkaTimer
  • copyTimer

kafkaTimer starts a timer backed by the ProducerHistogram from VerticaMetrics to measure the time taken to produce data to Kafka.

copyTimer uses the CopyDurationHistogram from the same VerticaMetrics to measure the time spent executing COPY commands. These timers also help users understand the time taken to purge unwanted batches.
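As a rough sketch of how such timers could be implemented on top of Dropwizard metrics (the library underlying Spark's MetricsSystem) — the class, method, and metric names below mirror the descriptions above and are illustrative, not the actual PSTL source:

    import com.codahale.metrics.{Counter, Histogram, MetricRegistry}

    // Illustrative sketch only; names mirror the wiki's descriptions, not actual PSTL code.
    class VerticaMetrics(registry: MetricRegistry) {
      val producerHistogram: Histogram = registry.histogram("vertica.sink.kafka.produce.millis")
      val copyDurationHistogram: Histogram = registry.histogram("vertica.sink.copy.millis")
      val copyIncrementalCounter: Counter = registry.counter("vertica.sink.copy.incremental")
      val rowsLoadedCounter: Counter = registry.counter("vertica.sink.rows.loaded")

      // Time a block of work and record its duration (ms) into the given histogram.
      private def timed[A](histogram: Histogram)(body: => A): A = {
        val start = System.nanoTime()
        try body
        finally histogram.update((System.nanoTime() - start) / 1000000L)
      }

      // kafkaTimer: measures time spent producing data to Kafka.
      def kafkaTimer[A](body: => A): A = timed(producerHistogram)(body)

      // copyTimer: measures time spent executing COPY commands.
      def copyTimer[A](body: => A): A = timed(copyDurationHistogram)(body)
    }

A micro-batch would then wrap its produce and copy phases in kafkaTimer and copyTimer respectively, and the recorded histograms are what Prometheus ultimately scrapes.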

Sample Prometheus Metrics

Metrics are captured from the PSTL SparkListener. Spark SQL provides SQL metrics for each operator within a stage. If we look at a simple SQL query, SELECT a, b FROM table WHERE c > 10, we get a SCAN -> FILTER -> PROJECT plan. Each of these operators provides tuple-level metrics such as the number of bytes read/written and the number of input/output rows. To serve these captured metrics to Prometheus, we need to specify the target host in Prometheus's yml file. By default, Prometheus serves on localhost:9090. For more information, follow Prometheus
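For reference, metrics scraped by Prometheus use its text exposition format; a sample might look like the following (metric names, labels, and values are illustrative placeholders, not actual PSTL output):

    # HELP pstl_num_output_rows Number of output rows per operator
    # TYPE pstl_num_output_rows counter
    pstl_num_output_rows{operator="FILTER"} 1024
    pstl_num_output_rows{operator="PROJECT"} 1024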

Prometheus metrics can also be served to Grafana. For more information, see Prometheus-Grafana