New Monitoring

New Roles

DeepSea now knows the prometheus and grafana roles and deploys the monitoring stack accordingly. This allows, for example, multiple Prometheus instances in an HA setup. The prometheus role also includes the Alertmanager.

DeepSea also removes Prometheus and Grafana from nodes that do not have the respective roles. If you have a Prometheus/Grafana installation on DeepSea minions that is managed outside of DeepSea, make sure to add rescind-[prometheus|grafana|alertmanager]: default-nop to your pillar. Otherwise DeepSea will remove your installation.
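
As a minimal sketch, the bracket notation above can be read as three separate pillar keys, one per component. Which keys you need depends on what you manage yourself, and placing them in /srv/pillar/ceph/stack/global.yml is an assumption based on the pillar locations described under "Pillar variables":

```yaml
# Assumed location: /srv/pillar/ceph/stack/global.yml
# Keep only the keys for components that are managed outside of DeepSea.
rescind-prometheus: default-nop
rescind-grafana: default-nop
rescind-alertmanager: default-nop
```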

Pillar variables

The pillar below is available to all nodes by default. To change a setting for all nodes, edit the global file /srv/pillar/ceph/stack/global.yml. To change it for a specific minion only, edit /srv/pillar/ceph/stack/<cluster_name>/minions/<host>. Refer to the stack pillar documentation for details.

monitoring:
  alertmanager:
    config: salt://path/to/config
    additional_flags: ''
  prometheus:
    rule_files: []
    scrape_interval:
      ceph: 10
      node_exporter: 10
      prometheus: 10
      grafana: 10
    relabel_config:
      ceph: ''
      node_exporter: ''
      prometheus: ''
      grafana: ''
    metric_relabel_config:
      ceph: ''
      node_exporter: ''
      prometheus: ''
      grafana: ''
    target_partition:
      ceph: '1/1'
      node_exporter: '1/1'
      prometheus: '1/1'
      grafana: '1/1'
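
A global override might look like the following sketch. It assumes that the stack pillar merges nested keys, so that only the values that differ from the defaults need to be listed. The value 30 is purely illustrative:

```yaml
# /srv/pillar/ceph/stack/global.yml
# Illustrative override: change the scrape interval for the ceph
# target group on all nodes and leave every other default untouched.
monitoring:
  prometheus:
    scrape_interval:
      ceph: 30
```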

Prometheus

scrape_interval: Changes the scrape interval for the various scrape target groups (these groups map to the exporters that provide data).
target_partition: When multiple Prometheus instances are deployed, it can be desirable to partition the scrape targets so that each instance scrapes only part of all exporter instances (currently this is only implemented for node_exporter targets). For example, with two Prometheus instances, the available node_exporter targets can be divided between them: configure the pillar so that one instance sees monitoring:prometheus:target_partition:node_exporter:'1/2' while the other sees monitoring:prometheus:target_partition:node_exporter:'2/2'. A Prometheus instance that sees 0/X in its pillar removes all scrape targets of that kind.
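
The two-instance partitioning described above could be expressed with one minion-specific pillar file per Prometheus node. This is a sketch only: the host placeholders <host_a> and <host_b> are hypothetical, and the file paths follow the minion pillar location given under "Pillar variables".

```yaml
# /srv/pillar/ceph/stack/<cluster_name>/minions/<host_a>
# First Prometheus instance: scrapes the first half of the
# node_exporter targets.
monitoring:
  prometheus:
    target_partition:
      node_exporter: '1/2'
```

```yaml
# /srv/pillar/ceph/stack/<cluster_name>/minions/<host_b>
# Second Prometheus instance: scrapes the second half of the
# node_exporter targets.
monitoring:
  prometheus:
    target_partition:
      node_exporter: '2/2'
```

Setting the value to '0/2' on a third instance would, per the description above, leave that instance without any node_exporter targets.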

Alertmanager

config: Because the Alertmanager configuration contains only user-specific settings, DeepSea relies on the user to provide an Alertmanager config file via the pillar. The file must be reachable through Salt's salt:// file server URL. For instance, /srv/salt/ceph/monitoring/alertmanager/files/myconfig.yml translates to salt://ceph/monitoring/alertmanager/files/myconfig.yml as the pillar value. DeepSea then takes this file and deploys it. If the pillar variable is not set, DeepSea only ensures that a config file exists. That file can be either the default config file installed by the RPM or a user-managed file.
additional_flags: DeepSea creates the --cluster.peer flags needed for a highly available Alertmanager setup (if more than one node has the prometheus role). If you want to pass additional flags (see prometheus-alertmanager --help for the available flags), list them as a space-separated string in this pillar variable.
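
Putting both Alertmanager settings together, a pillar entry might look like the sketch below. The config path is the example from the text above. The two flags are illustrative choices rather than recommendations, so check them against prometheus-alertmanager --help for your installed version:

```yaml
# Assumed location: /srv/pillar/ceph/stack/global.yml
monitoring:
  alertmanager:
    # Served from /srv/salt/ceph/monitoring/alertmanager/files/myconfig.yml
    config: salt://ceph/monitoring/alertmanager/files/myconfig.yml
    # Extra flags as one space-separated string. DeepSea adds the
    # --cluster.peer flags itself, so do not list them here.
    additional_flags: '--log.level=debug --data.retention=48h'
```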