Commit
* Use uppercase
* remove outdated link
* help users by linking the two topics
* roma's feedback
* draft
* + list of templates and linked from related topics
* added topic to ToC
* improved descriptions
* capitalization in ToC
* bolded template names
* reorganized table
* reorganized table
* feedback from Roma
* Update docs/get-started/alert-templates.md

Co-authored-by: Roman Novikov <[email protected]>

* Update docs/get-started/alert-templates.md

Co-authored-by: Roman Novikov <[email protected]>

* moved templates info inside Alerting topic
* undo commit
* moved topic
* Updated ToC
* Fixed build
* fix build
* removed duplicate content

---------

Co-authored-by: Catalina A <[email protected]>
Co-authored-by: Roman Novikov <[email protected]>
1 parent 3651569, commit 0db9283

Showing 5 changed files with 93 additions and 21 deletions.
@@ -0,0 +1,61 @@
# PMM alert templates

Alert templates provide a set of common events and expressions for alerting, serving as a foundation for creating alert rules.

Percona Monitoring and Management (PMM) offers three categories of alert templates to enhance database performance monitoring:

1. **Built-in templates**: templates that ship with PMM out of the box and are available to all PMM users.
2. **Percona Platform templates**: additional templates dynamically delivered to PMM if the instance is [connected to Percona Platform](../how-to/integrate-platform.md) using a Percona Account.
When connected to the Platform, PMM automatically downloads these templates if the **Telemetry** option is enabled under **Configuration > Settings > Advanced Settings**.
3. **Custom templates**: user-created templates for specific needs not met by built-in or Percona Platform templates. These allow you to tailor alerts to your unique environment and requirements.
For details on creating custom templates, see [Percona Alerting](../get-started/alerting.md#configure-alert-templates).

## Accessing alert templates

To check the alert templates available for your PMM instance, go to the **Alerting > Alert Rule Templates** tab in PMM.

## Available alert templates

The table below lists all the alert templates available in Percona Monitoring and Management (PMM).

This list includes both built-in templates, which are available to all PMM users, and customer-only templates.

To access the customer-only templates, you must be a Percona customer and [connect PMM to Percona Platform](../how-to/integrate-platform.md) using a Percona Account.

| Template name | Description | Availability | Database technology |
| :------------ | :---------- | :----------- | :------------------ |
| **Node high CPU load** | Monitors node CPU usage and alerts when it surpasses 80% (default threshold). Provides details about specific nodes experiencing high CPU load, indicating potential performance issues or scaling needs. | All users | MySQL, MongoDB, PostgreSQL |
| **Memory available less than a threshold** | Tracks available memory on nodes and alerts when free memory drops below 20% (default threshold). Helps prevent system instability due to memory constraints. | All users | MySQL, MongoDB, PostgreSQL |
| **Node high swap filling up** | Monitors node swap usage and alerts when it exceeds 80% (default threshold). Indicates potential memory pressure and performance degradation, allowing for timely intervention. | All users | MySQL, MongoDB, PostgreSQL |
| **PMM agent down** | Monitors PMM Agent status and alerts when an agent becomes unreachable, indicating potential host or agent issues. | All users | MySQL, MongoDB, PostgreSQL, ProxySQL |
| **Backup failed [Technical Preview]** | Monitors backup processes and alerts on failures, providing details about the failed backup artifact and service. Helps maintain data safety and recovery readiness. This template is currently in Technical Preview status and should be used for testing purposes only as it is subject to change. | All users | MySQL, MongoDB, PostgreSQL, ProxySQL |
| **MongoDB down** | Detects when a MongoDB instance becomes unavailable, enabling rapid response to maintain database accessibility. | All users | MongoDB |
| **Memory used by MongoDB connections** | Tracks MongoDB connection memory usage and alerts when it exceeds configurable thresholds. Helps identify and address potential performance issues caused by high memory consumption. | All users | MongoDB |
| **Memory used by MongoDB** | Monitors overall MongoDB memory usage and alerts when it exceeds 80% of total system memory. Provides details about specific MongoDB services and nodes experiencing high memory consumption, aiding in resource optimization. | All users | MongoDB |
| **MongoDB restarted** | Detects recent MongoDB restarts, alerting if an instance has been restarted within the last 5 minutes (default threshold). Facilitates investigation of unexpected downtime and potential issues. | All users | MongoDB |
| **MongoDB DBPath disk space utilization** | Monitors disk space usage in MongoDB's data directory and alerts when it exceeds set thresholds. Helps prevent storage-related issues and ensures adequate space for database operations. | Customer-only | MongoDB |
| **MongoDB host SSL certificate expiry** | Tracks SSL certificate expiration dates for MongoDB hosts and alerts when certificates are approaching expiry. Enables timely certificate renewal to maintain secure connections. | Customer-only | MongoDB |
| **MongoDB oplog window** | Monitors the oplog window size and alerts when it falls below the recommended threshold (typically 24-48 hours). Ensures sufficient time for secondary nodes to replicate data and maintain cluster consistency. | Customer-only | MongoDB |
| **MongoDB read tickets** | Tracks read ticket availability in the WiredTiger storage engine and alerts when it falls below set thresholds. Helps optimize read performance and identify potential bottlenecks. | Customer-only | MongoDB |
| **MongoDB replication lag is high** | Monitors replication lag and alerts when it exceeds acceptable thresholds. Crucial for maintaining data consistency across replicas and identifying synchronization issues. | Customer-only | MongoDB |
| **MongoDB ReplicaSet has no primary** | Detects when a replica set loses its primary node and alerts users. Indicates that the cluster is in read-only mode, potentially affecting write operations and overall database functionality. | Customer-only | MongoDB |
| **MongoDB member is in unusual state** | Identifies and alerts when replica set members enter unusual states such as Recovering, Startup, or Rollback. Helps maintain cluster health and performance by enabling quick intervention. | Customer-only | MongoDB |
| **MongoDB write tickets** | Monitors write ticket availability in the WiredTiger storage engine and alerts when it falls below set thresholds. Aids in optimizing write performance and identifying potential bottlenecks. | Customer-only | MongoDB |
| **MySQL down** | Monitors MySQL instance availability and alerts when any MySQL service becomes unreachable. Enables quick response to maintain database services. | All users | MySQL |
| **MySQL replication running IO** | Tracks MySQL replication I/O thread status and alerts if it stops running on a replica. Crucial for ensuring data is being received from the primary server. | All users | MySQL |
| **MySQL replication running SQL** | Monitors MySQL replication SQL thread status and alerts if it stops running on a replica. Essential for verifying that received data is being applied correctly to maintain data consistency. | All users | MySQL |
| **MySQL restarted** | Detects recent MySQL restarts, alerting if an instance has been restarted within the last 5 minutes (default threshold). Aids in investigating unexpected downtime and potential issues. | All users | MySQL |
| **MySQL connections in use** | Tracks MySQL connection usage and alerts when the percentage of active connections exceeds 80% of the maximum allowed (default threshold). Helps prevent performance degradation due to connection overload. | All users | MySQL |
| **PostgreSQL down** | Detects when PostgreSQL instances become unavailable, enabling quick response to maintain database services. Provides details about affected services and nodes. | All users | PostgreSQL |
| **PostgreSQL restarted** | Identifies recent PostgreSQL restarts, alerting if an instance has been restarted within the last 5 minutes (default threshold). Aids in investigating unexpected downtime and potential issues. | All users | PostgreSQL |
| **PostgreSQL connections in use** | Monitors PostgreSQL connection usage and alerts when the percentage of active connections exceeds 80% of the maximum allowed (default threshold). Helps prevent performance degradation due to excessive connections. | All users | PostgreSQL |
| **PostgreSQL index bloat is high** | Detects excessive index bloat and alerts users. Helps identify performance degradation due to bloated indexes, enabling timely maintenance to improve query performance. | Customer-only | PostgreSQL |
| **PostgreSQL high number of dead tuples** | Monitors the accumulation of dead tuples in relations and alerts when they exceed set thresholds. Indicates potential issues with vacuum settings and helps optimize storage and query performance. | Customer-only | PostgreSQL |
| **PostgreSQL has a high number of statement timeouts** | Tracks and alerts on frequent query cancellations due to statement timeouts. Helps identify various issues such as high load, poorly written queries, or inadequate resource allocation. | Customer-only | PostgreSQL |
| **PostgreSQL table bloat is high** | Detects excessive table bloat and alerts users. Indicates a need to adjust vacuum settings for specific relations or globally, helping to maintain optimal query performance and storage efficiency. | Customer-only | PostgreSQL |
| **PostgreSQL high rate of transaction rollbacks** | Monitors the ratio of transaction rollbacks to commits and alerts on high rates. Helps identify potential application or database issues leading to frequent transaction failures. | Customer-only | PostgreSQL |
| **PostgreSQL tables not auto analyzed** | Identifies tables that are not being auto-analyzed and alerts users. Crucial for maintaining accurate statistics and generating proper query execution plans. | Customer-only | PostgreSQL |
| **PostgreSQL tables not auto vacuumed** | Detects tables that are not being auto-vacuumed and alerts users. Essential for managing bloat, optimizing storage, and maintaining overall database health. | Customer-only | PostgreSQL |
| **PostgreSQL unused replication slot** | Identifies and alerts on unused replication slots. Helps prevent excessive WAL retention and potential disk space issues, especially when replicas are offline. | Customer-only | PostgreSQL |
| **ProxySQL server status** | Tracks ProxySQL server status and alerts when a server's status becomes OFFLINE_SOFT (3) or OFFLINE_HARD (4). Provides details about the server's endpoint, hostgroup, and associated ProxySQL service. Crucial for maintaining high availability and preventing service disruptions. | All users | ProxySQL |
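Several templates above compare a percentage against a default threshold, for example 80% of the maximum allowed connections or 20% of free memory. The arithmetic behind such a check can be sketched in Python as follows. This is only an illustration: the helper names `percent` and `breaches` are hypothetical, and the actual templates evaluate query expressions over collected metrics inside PMM rather than running code like this.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def breaches(value_pct: float, threshold_pct: float, above: bool = True) -> bool:
    """Check a percentage against a threshold.

    above=True models "alert when usage exceeds the threshold"
    (e.g. connections in use > 80%); above=False models "alert when
    a resource drops below the threshold" (e.g. available memory < 20%).
    """
    return value_pct > threshold_pct if above else value_pct < threshold_pct


if __name__ == "__main__":
    # Hypothetical MySQL instance: 170 of 200 allowed connections in use.
    used = percent(170, 200)
    print(used, breaches(used, 80.0))  # 85.0 True

    # Hypothetical node: 3 GiB free out of 16 GiB.
    free = percent(3, 16)
    print(free, breaches(free, 20.0, above=False))  # 18.75 True
```

Note that a value exactly at the threshold does not breach it in this sketch, matching the "exceeds" and "drops below" wording in the table.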
@@ -1,8 +1,8 @@
# Percona Alerting
# About Percona Alerting

!!! alert alert-info ""
    Percona Alerting is the new Alerting feature introduced in PMM 2.31. It replaces the Integrated Alerting feature available in previous versions.

Alerting notifies you of important or unusual activity in your database environments so that you can identify and resolve problems quickly. When something needs your attention, Percona Alerting can be configured to automatically send you a notification through your specified contact points.

PMM 2.31 introduced Percona Alerting, which replaces Integrated Alerting in previous PMM versions. In addition to full feature parity, Percona Alerting includes additional benefits like Grafana-based alert rules and a unified, easy-to-use alerting command center on the **Alerting** page.
@@ -16,8 +16,11 @@ Percona Alerting is powered by Grafana infrastructure. It leverages Grafana's ad
Depending on the data sources you want to query and the complexity of your evaluation criteria, Percona Alerting enables you to create the following types of alerts:

- **Percona templated alerts**: alerts based on a set of Percona-supplied templates with common events and expressions for alerting.
If you need custom expressions on which to base your alert rules, you can also create your own templates.
If you need custom expressions on which to base your alert rules, you can also create your own templates. For the complete list of available templates, see [PMM alert templates](../get-started/alert-templates.md).

- **Grafana managed alerts**: alerts that handle complex conditions and can span multiple different data sources like SQL, Prometheus, InfluxDB, etc. These alerts are stored and executed by Grafana.

<!--- we don't support these for now, so commenting them out
- **Mimir or Loki alerts**: alerts that consist of one single query, written in PromQL or LogQL. The alert rules are stored and executed on the Mimir or Loki ruler and are completely decoupled from the PMM and Grafana runtime.
@@ -32,7 +35,7 @@ The Alerting page contains are split into eight tabs: Fired Alerts, Alert Rules,

## Alert rules

Alert rules describe the circumstances under which you want to be alerted. The evaluation criteria that you define determine whether an alert will fire.

An alert rule consists of one or more queries and expressions, a condition, the frequency of evaluation, and the duration for which the condition must be met. For example, you might configure an alert to fire and trigger a notification when MongoDB is down.
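To make the interplay between evaluation frequency and duration concrete, here is a simplified Python model of a rule moving from Normal to Pending to Firing. It is a sketch under stated assumptions, not Grafana's implementation: it ignores other rule states such as NoData and Error, and the names `RuleState` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleState:
    state: str = "Normal"                  # "Normal", "Pending" or "Firing"
    pending_since: Optional[float] = None  # time the condition first held


def evaluate(rule: RuleState, condition_met: bool, now: float, for_seconds: float) -> RuleState:
    """Apply one evaluation cycle and return the rule's next state."""
    if not condition_met:
        # Condition no longer holds: reset to Normal.
        return RuleState()
    if rule.state == "Normal":
        # With no duration, fire immediately; otherwise start the pending period.
        return RuleState("Firing" if for_seconds <= 0 else "Pending", now)
    if rule.state == "Pending" and now - rule.pending_since >= for_seconds:
        # Condition has held for the whole duration: fire.
        return RuleState("Firing", rule.pending_since)
    # Still pending, or already firing.
    return rule


if __name__ == "__main__":
    # Hypothetical rule: evaluated every 60 s, condition must hold for 120 s.
    rule = RuleState()
    for t in (0, 60, 120):
        rule = evaluate(rule, condition_met=True, now=t, for_seconds=120)
        print(t, rule.state)  # 0 Pending, then 60 Pending, then 120 Firing
```

Because state only changes inside `evaluate`, even a rule with no duration needs at least one evaluation cycle to transition, and a longer evaluation interval can delay firing beyond the configured duration.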
@@ -48,13 +51,19 @@ It takes at least one evaluation cycle for an alert rule to transition from one

## Alert rules templates

PMM provides a set of Alert Rule templates with common events and expressions for alerting. These templates can be used as a basis for creating Alert Rules. You can also create your own templates if you need custom expressions.
PMM provides a set of alert rule templates with common events and expressions for alerting. These templates can be used as a basis for creating alert rules.

You can check the alert templates available for your account under the **Alerting > Alert rule templates** tab. PMM lists the following types of templates here:
Percona Monitoring and Management (PMM) offers three categories of alert templates to enhance database performance monitoring:

- Built-in templates, available out-of-the-box with PMM.
- Templates downloaded from Percona Platform.
- Custom templates created or uploaded on the **Alerting page > Alert Templates** tab. You can also store your custom template files in your ``/srv/alerting/templates`` directory and PMM will load them during startup.
- Additional templates available after connecting PMM with Percona Platform. See [Integrate PMM with Percona Platform](../how-to/integrate-platform.md).
- Custom templates created or uploaded on the **Alerting** page > **Alert Templates** tab. You can also store your custom template files in the `/srv/alerting/templates` directory, and PMM loads them at startup.

### Accessing alert rule templates

For the full list of alert templates that PMM offers, see [PMM alert templates](../get-started/alert-templates.md).

You can check the alert templates available for your account under the **Alerting > Alert rule templates** tab.

### Create alert rules from alert rule templates

@@ -153,7 +162,7 @@ After provisioning the resources required for creating Percona templated alerts,
2. On the **Create alert rule** page, select the **Percona templated alert** option. If you want to learn about creating Grafana alerts instead, see [Grafana's documentation](https://grafana.com/docs/grafana/latest/alerting/).
3. In the **Template details** section, choose the template on which you want to base the new alert rule. This automatically populates the **Name**, **Duration**, and **Severity** fields with information from the template. You can change these values if you want to override the default specifications in the template.
4. In the **Filters** field, specify whether the alert rule should apply only to specific services or nodes, for example, `service_name=ps5.7`. When creating alert rule filters, consider the following:

    - Filters use conjunction (AND) semantics. If you add more than one filter, PMM combines their conditions when searching for matches: filter 1 AND filter 2 AND filter 3.
    - **Label** must be an exact match. You can find a complete list of labels using the <i class="uil uil-compass"></i> **Explore** menu in PMM.
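The two filter rules above, AND across filters and exact label matches, can be sketched in Python as follows. This assumes each filter is a plain `label=value` pair like the `service_name=ps5.7` example; the helper names and the `node_name` value are hypothetical, and PMM's own filter handling may support more than this sketch does.

```python
def parse_filter(item: str) -> tuple[str, str]:
    """Split a 'label=value' filter into its label and value."""
    label, sep, value = item.partition("=")
    if not sep or not label.strip():
        raise ValueError(f"expected label=value, got {item!r}")
    return label.strip(), value.strip()


def matches(labels: dict[str, str], filters: list[str]) -> bool:
    """True only if every filter matches a label exactly (AND semantics)."""
    return all(labels.get(label) == value
               for label, value in map(parse_filter, filters))


if __name__ == "__main__":
    # Hypothetical labels of one monitored service.
    labels = {"service_name": "ps5.7", "node_name": "db-node-1"}
    print(matches(labels, ["service_name=ps5.7"]))                        # True
    print(matches(labels, ["service_name=ps5.7", "node_name=db-node-2"]))  # False
    print(matches(labels, ["service_name=ps5"]))                          # False: not exact
```

With an empty filter list, `all()` returns `True`, which models a rule that applies to every service and node.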
@@ -190,8 +199,8 @@ To use SMTP with a PMM Docker installation:
[email protected]
GF_SMTP_FROM_NAME=Percona Alerting
```
Below is a summary of each environment variable above:
- `GF_SMTP_ENABLED`: When true, enables Grafana to send emails.
- `GF_SMTP_HOST`: Host address of your SMTP server.
- `GF_SMTP_USER`: Username for SMTP authentication.
- `GF_SMTP_PASSWORD`: Password for SMTP authentication.
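Before passing the file to Docker, it can help to check that it defines the variables listed above. The following Python sketch parses a simple `KEY=VALUE` file and reports missing SMTP settings. The function names and sample values are hypothetical, and the parser is deliberately simplified, so it does not reproduce Docker's `--env-file` handling exactly.

```python
REQUIRED_SMTP_VARS = ("GF_SMTP_ENABLED", "GF_SMTP_HOST", "GF_SMTP_USER", "GF_SMTP_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env


def smtp_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems with the SMTP settings (empty if none)."""
    problems = [f"{name} is missing or empty" for name in REQUIRED_SMTP_VARS if not env.get(name)]
    enabled = env.get("GF_SMTP_ENABLED")
    if enabled and enabled.lower() != "true":
        problems.append("GF_SMTP_ENABLED should be 'true' to send emails")
    return problems


if __name__ == "__main__":
    sample = """
    # Hypothetical values; the password is intentionally left out.
    GF_SMTP_ENABLED=true
    GF_SMTP_HOST=smtp.example.com:587
    GF_SMTP_USER=alerts@example.com
    """
    print(smtp_problems(parse_env(sample)))  # ['GF_SMTP_PASSWORD is missing or empty']
```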
@@ -201,7 +210,7 @@ To use SMTP with a PMM Docker installation:

*NB: If you use Gmail's SMTP server as shown above, you must generate an app password and use it as the value of the `GF_SMTP_PASSWORD` variable.*

2. Pass the `.env` file to `docker run` using the `--env-file` flag:
```
docker run --env-file=.env -p 443:443 -p 80:80 percona/pmm-server:2
```
@@ -278,13 +287,13 @@ This can be useful, for example, when you want to send notifications to a catch-
6. Toggle **Override grouping** if you do not want to use root policy grouping.
7. Toggle **Override general timings** to specify how long to wait before sending the initial notification for a new group. When this is disabled, PMM uses the root policy group timings instead.
8. Add a mute timing if you want to mute notifications for this policy during a specific, recurring interval. For example, you can create a mute timing to suppress trivial notifications during weekends. Mute timings differ from silences in that they recur, while silences have a fixed start and end time.

!!! caution alert alert-warning "Important"
    Times specified in a mute timing must be in UTC and in 24-hour format, for example, 14:00 rather than 2:00 PM.
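If your team thinks in local time, convert times before entering them in a mute timing. This Python sketch turns a timezone-aware local time into the 24-hour UTC `HH:MM` form required above. `to_mute_timing_utc` is a hypothetical helper, and the example assumes a fixed UTC-5 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone


def to_mute_timing_utc(local_time: datetime) -> str:
    """Return a timezone-aware time as a 24-hour UTC 'HH:MM' string."""
    if local_time.tzinfo is None or local_time.utcoffset() is None:
        raise ValueError("local_time must be timezone-aware")
    return local_time.astimezone(timezone.utc).strftime("%H:%M")


if __name__ == "__main__":
    # Hypothetical team working at a fixed UTC-5 offset (no daylight saving).
    utc_minus_5 = timezone(timedelta(hours=-5))
    print(to_mute_timing_utc(datetime(2024, 1, 6, 14, 0, tzinfo=utc_minus_5)))   # 19:00
    print(to_mute_timing_utc(datetime(2024, 1, 6, 21, 30, tzinfo=utc_minus_5)))  # 02:30 (next day in UTC)
```

As the second example shows, converting can move a time onto the next or previous UTC day, so check any weekday ranges in the mute timing as well.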
## Silence alerts
Create a silence when you want to suppress alerts and their associated notifications for a specific period of time.
Silences default to the current date and have a default duration of two hours.

You can also schedule a silence for a future date and time. This is referred to as a `Pending` silence, which appears on the **Silences** page.
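The default two-hour window and the `Pending` state can be modeled in a few lines of Python. This is a sketch, not Grafana's code: the helper names are hypothetical, and it assumes the pending, active, and expired silence states shown in Grafana's Alertmanager-based silences.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_SILENCE_DURATION = timedelta(hours=2)  # default duration described above


def default_silence_window(start: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) for a silence, starting now unless a start is given."""
    if start is None:
        start = datetime.now(timezone.utc)
    return start, start + DEFAULT_SILENCE_DURATION


def silence_state(start: datetime, end: datetime, now: datetime) -> str:
    """Classify a silence as pending, active, or expired at time `now`."""
    if now < start:
        return "pending"  # scheduled for the future
    if now < end:
        return "active"   # currently suppressing matching alerts
    return "expired"


if __name__ == "__main__":
    now = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
    start, end = default_silence_window(now)
    print(end - start, silence_state(start, end, now))  # 2:00:00 active

    # A silence scheduled for tomorrow stays pending until it starts.
    future_start, future_end = default_silence_window(now + timedelta(days=1))
    print(silence_state(future_start, future_end, now))  # pending
```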
@@ -340,7 +349,7 @@ After upgrading to PMM 2.31, make sure to manually migrate any alert rules that
##### Script commands

The default command for migrating rules is:
```sh
python3 ia_migration.py -u admin -p admin
```
To see all the available options, check the script help using `ia_migration.py -h`.