A Vault node exposes telemetry information that can be used to monitor and alert on the health and performance of a Vault cluster.
By default, the Vault operator configures each Vault pod to publish statsd metrics. The operator also runs a statsd-exporter container inside each Vault pod to convert those metrics and expose them in Prometheus format.
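As a quick sanity check (the command below is only illustrative and relies on the app and vault_cluster labels that the operator sets on the pods, visible in the service selector further down), you can list the containers of each Vault pod and verify that a statsd-exporter container runs alongside the vault container:
$ kubectl -n default get pods -l app=vault,vault_cluster=example \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'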
Curl the /metrics endpoint on port 9102 of any Vault pod to get the Prometheus metrics:
$ VPOD=$(kubectl -n default get vault example -o jsonpath='{.status.vaultStatus.active}')
$ kubectl -n default exec -ti ${VPOD} --container=vault -- curl localhost:9102/metrics
# HELP vault_core_unseal Metric autogenerated by statsd_exporter.
# TYPE vault_core_unseal summary
vault_core_unseal{quantile="0.5"} NaN
vault_core_unseal{quantile="0.9"} NaN
vault_core_unseal{quantile="0.99"} NaN
vault_core_unseal_sum 2.077112
vault_core_unseal_count 1
. . .
The Vault operator also creates a service with the same name as the Vault cluster that exposes the /metrics endpoint of the Vault nodes via the prometheus port. So for a Vault cluster named example, the following service exists:
$ kubectl -n default get service example -o yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: vault
    vault_cluster: example
  name: example
  namespace: default
  ...
spec:
  ports:
  - name: vault-client
    port: 8200
    protocol: TCP
    targetPort: 8200
  - name: vault-cluster
    port: 8201
    protocol: TCP
    targetPort: 8201
  - name: prometheus
    port: 9102
    protocol: TCP
    targetPort: 9102
  selector:
    app: vault
    vault_cluster: example
  type: ClusterIP
...
The above service can be scraped to consume the Prometheus metrics for the Vault cluster.
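For a one-off check from outside the cluster, you can also port-forward the service and fetch the metrics locally; this is just an illustrative verification step, not required for the Prometheus setup below:
$ kubectl -n default port-forward service/example 9102:9102
$ curl localhost:9102/metrics   # run in a second terminal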
Consult the Prometheus operator docs on how to set up and configure Prometheus with a ServiceMonitor to consume the metrics of a target service. A ServiceMonitor with the following spec can be created to describe the above Vault service as a target for Prometheus.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  ...
spec:
  selector:
    matchLabels:
      app: vault
      vault_cluster: example
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - interval: 30s
    path: /metrics
    port: prometheus
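Once Prometheus has picked up the ServiceMonitor, a simple way to confirm that the Vault endpoints are being scraped is to query the up series for the service's job label, which the alert rules below assume is example (matching the service name):
up{job="example"}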
The following alert rules for some key metrics are provided as a guide to alerting best practices for Vault. The sample alert rules assume Prometheus is configured to monitor a Vault service named example.
alert: VaultLeadershipLoss
expr: sum(increase(vault_core_leadership_lost_count{job="example"}[1h])) > 5
for: 1m
labels:
  severity: critical
annotations:
  summary: High frequency of Vault leadership losses
  description: There have been more than 5 Vault leadership losses in the past 1h

alert: VaultLeadershipStepDowns
expr: sum(increase(vault_core_step_down_count{job="example"}[1h])) > 5
for: 1m
labels:
  severity: critical
annotations:
  summary: High frequency of Vault leadership step downs
  description: There have been more than 5 Vault leadership step downs in the past 1h

alert: VaultLeadershipSetupFailures
expr: sum(increase(vault_core_leadership_setup_failed{job="example"}[1h])) > 5
for: 1m
labels:
  severity: critical
annotations:
  summary: High frequency of Vault leadership setup failures
  description: There have been more than 5 Vault leadership setup failures in the past 1h
The queries and parameters of the above alert rules should be tuned for your particular use case. Read the Prometheus documentation on queries and alerting rules to learn how to write additional alerting rules as needed.
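If alerting rules are managed through the Prometheus operator, the sample rules above could be packaged as a PrometheusRule resource, assuming the operator version in use supports that CRD. The resource name and the labels used to match your Prometheus ruleSelector are placeholders in this sketch:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vault-example-alerts      # placeholder name
  namespace: default
  labels:
    role: alert-rules             # assumption: must match the ruleSelector of your Prometheus resource
spec:
  groups:
  - name: vault-example.rules
    rules:
    - alert: VaultLeadershipLoss
      expr: sum(increase(vault_core_leadership_lost_count{job="example"}[1h])) > 5
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: High frequency of Vault leadership losses
        description: There have been more than 5 Vault leadership losses in the past 1h
The other two sample alerts can be added to the same rules list in the same way.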