tiflash, metric: add alert for TiFlash down (#6590)
shichun-0415 authored Oct 9, 2021
1 parent 8efd451 commit 7534088
Showing 2 changed files with 22 additions and 2 deletions.
22 changes: 21 additions & 1 deletion alert-rules.md
@@ -8,7 +8,7 @@ aliases: ['/docs/dev/alert-rules/','/docs/dev/reference/alert-rules/']

# TiDB Cluster Alert Rules

- This document describes the alert rules for different components in a TiDB cluster, including the rule descriptions and solutions of the alert items in TiDB, TiKV, PD, TiDB Binlog, Node_exporter and Blackbox_exporter.
+ This document describes the alert rules for different components in a TiDB cluster, including the rule descriptions and solutions of the alert items in TiDB, TiKV, PD, TiFlash, TiDB Binlog, Node_exporter and Blackbox_exporter.

According to the severity level, alert rules are divided into three categories (from high to low): emergency-level, critical-level, and warning-level. This division of severity levels applies to all alert items of each component below.

@@ -781,6 +781,10 @@ This section gives the alert rules for the TiKV component.

The speed of splitting Regions is slower than the write speed. To alleviate this issue, you’d better update TiDB to a version that supports batch-split (>= 2.1.0-rc1). If it is not possible to update temporarily, you can use `pd-ctl operator add split-region <region_id> --policy=approximate` to manually split Regions.

## TiFlash alert rules

For the detailed descriptions of TiFlash alert rules, see [TiFlash Alert Rules](/tiflash/tiflash-alert-rules.md).

## TiDB Binlog alert rules

For the detailed descriptions of TiDB Binlog alert rules, see [TiDB Binlog monitoring document](/tidb-binlog/monitor-tidb-binlog-cluster.md#alert-rules).
@@ -954,6 +958,22 @@ This section gives the alert rules for the Blackbox_exporter TCP, ICMP, and HTTP
* Check whether the TiDB process exists.
* Check whether the network between the monitoring machine and the TiDB machine is normal.

#### `TiFlash_server_is_down`

* Alert rule:

`probe_success{group="tiflash"} == 0`

* Description:

Failure to probe the TiFlash service port.

* Solution:

* Check whether the machine that provides the TiFlash service is down.
* Check whether the TiFlash process exists.
* Check whether the network between the monitoring machine and the TiFlash machine is normal.
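Editorial illustration, not part of this commit: the rule above fires when `probe_success` is 0 for targets in the `tiflash` group. As a rough sketch of what that means, and assuming the TiFlash target is checked with a Blackbox_exporter TCP probe (the hunk header places this rule in the Blackbox_exporter section), the same check can be approximated by hand from the monitoring machine. The function name `tcp_probe` and any host or port passed to it are hypothetical and must be replaced with the values from your own deployment.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to (host, port) succeeds, else 0.

    This mirrors the 1/0 convention of the probe_success metric; it is
    an approximation for manual troubleshooting, not the exporter's code.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return 0
```

Calling `tcp_probe("<tiflash-host>", <tiflash-port>)` from the monitoring machine and getting 0 is consistent with the alert; the three checks in the solution list then help narrow down whether the machine, the process, or the network is at fault.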

#### `Pump_server_is_down`

* Alert rule:
2 changes: 1 addition & 1 deletion tiflash/tiflash-alert-rules.md
@@ -34,7 +34,7 @@ This document introduces the alert rules of the TiFlash cluster.

- Solution:

- It might be caused by the internal problems of the TiFlash TMT engine. Contact [TiFlash R&D](mailto:[email protected]) for support.
+ It might be caused by the internal problems of the TiFlash storage engine. Contact [TiFlash R&D](mailto:[email protected]) for support.

## `TiFlash_raft_read_index_duration`
