alerts: add replication not running alert
Part of #133
DifferentialOrange committed May 5, 2022
1 parent 449f46a commit ca45b23
Showing 3 changed files with 34 additions and 1 deletion.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -15,7 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Net memory and new binary connections panels
 - Vinyl index and bloom filter memory panels
 - Clock delta panel
-- Replication status panel
+- Replication status panel and alert example

 ### Changed
 - Rework "Tarantool memory memory miscellaneous" section to "Tarantool runtime overview"
12 changes: 12 additions & 0 deletions example_cluster/prometheus/alerts.yml
@@ -173,6 +173,18 @@ groups:
           description: "Instance '{{ $labels.alias }}' of job '{{ $labels.job }}' event loop has high cycle duration.
             Some high loaded fiber has too little yields. It may be the reason of 'Too long WAL write' warnings."

+      # Alert for Tarantool replication not running.
+      - alert: ReplicationNotRunning
+        expr: tnt_replication_status == 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') {{ $labels.stream }} (id {{ $labels.id }})
+            replication is not running"
+          description: "Instance '{{ $labels.alias }}' ('{{ $labels.job }}') {{ $labels.stream }} (id {{ $labels.id }})
+            replication is not running. Check Cartridge UI for details."
+
   - name: tarantool-business
     rules:
       # Warning for any endpoint of an instance in tarantool_app job that responds too long.
21 changes: 21 additions & 0 deletions example_cluster/prometheus/test_alerts.yml
@@ -387,6 +387,27 @@ tests:
                 Some high loaded fiber has too little yields. It may be the reason of 'Too long WAL write' warnings."


+  - interval: 15s
+    input_series:
+      - series: tnt_replication_status{job="tarantool_app", instance="app:8081", alias="tnt_storage_master", id="1", stream="upstream"}
+        values: '1+0x3 0+0x10'
+    alert_rule_test:
+      - eval_time: 2m
+        alertname: ReplicationNotRunning
+        exp_alerts:
+          - exp_labels:
+              severity: critical
+              instance: app:8081
+              alias: tnt_storage_master
+              job: tarantool_app
+              id: "1"
+              stream: upstream
+            exp_annotations:
+              summary: "Instance 'tnt_storage_master' ('tarantool_app') upstream (id 1) replication is not running"
+              description: "Instance 'tnt_storage_master' ('tarantool_app') upstream (id 1) replication is
+                not running. Check Cartridge UI for details."
+
+
   - interval: 15s
     input_series:
       - series: http_server_request_latency_count{job="tarantool_app",instance="app:8081",path="/hello",method="GET",status="200",alias="tnt_router"}
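
Note on the new test case: the `values: '1+0x3 0+0x10'` series, together with `interval: 15s` and `for: 1m`, is what makes `ReplicationNotRunning` expected at `eval_time: 2m`. The sketch below works through that arithmetic in Python. It is a simplified model, not Prometheus or promtool code: the helper names `expand` and `first_firing_time` are hypothetical, and it assumes the rule is evaluated at every sample timestamp.

```python
import re

# promtool's expanding notation 'a+bxn' stands for a, a+b, ..., a+n*b (n+1 samples).
TOKEN = re.compile(r"^(-?\d+(?:\.\d+)?)([+-])(\d+(?:\.\d+)?)x(\d+)$")


def expand(notation):
    """Expand a space-separated 'a+bxn' series into a flat list of floats."""
    samples = []
    for token in notation.split():
        match = TOKEN.match(token)
        if match is None:
            # Plain number, taken as a single sample.
            samples.append(float(token))
            continue
        start, sign, step, count = match.groups()
        delta = float(step) if sign == "+" else -float(step)
        samples.extend(float(start) + i * delta for i in range(int(count) + 1))
    return samples


def first_firing_time(samples, interval_s, hold_s):
    """Return the first time (seconds) the value has been 0 for hold_s, else None.

    Simplified model of `expr: ... == 0` with `for: hold_s`, assuming one
    rule evaluation per sample.
    """
    active_since = None
    for index, value in enumerate(samples):
        now = index * interval_s
        if value == 0:
            if active_since is None:
                active_since = now  # condition just became true: pending
            if now - active_since >= hold_s:
                return now  # held long enough: firing
        else:
            active_since = None  # condition cleared, reset the hold timer
    return None


samples = expand("1+0x3 0+0x10")
print(len(samples))                        # 15
print(first_firing_time(samples, 15, 60))  # 120
```

Under these assumptions the status is 1 for the first four samples (0s to 45s), drops to 0 at 60s, and has held for the full minute at 120s, which matches the `eval_time: 2m` expectation in the test.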