This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

AlertsDetails

github-actions edited this page Jul 10, 2023 · 14 revisions

Metric Alerts Details

The following metric alerts have been defined and can be deployed within your landing zones via Azure Policy.

The resources, metric alerts and their settings provide you with a starting point to help you address the following monitoring questions: "What should we monitor in Azure?" and "What alert settings should we use?" These settings are opinionated and are meant to cover the most common Azure Landing Zone components, so we encourage you to adjust them to suit your monitoring needs based on how you're using Azure.

If you have suggestions for other resources that should be included, please open an Issue on this page providing the Azure resource provider and the settings you'd like implemented. We can't promise to implement them all, but we will look into each one. If you'd like to contribute directly, follow the steps on how to contribute here.

Metric Alerts Settings

The values shown for Aggregation, Operator, Threshold, WindowSize, Frequency and Severity have been derived from field experience and from what customers have implemented themselves. Alerts are based on Microsoft public guidance where available (indicated by a 'Yes' in the Verified column), and on practical application experience where public guidance is not available (indicated by a 'No' in the Verified column). Links to Product Group guidance can be found in the References column; where no guidance exists, we've provided a link to the description of the metric on learn.microsoft.com.
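To make the meaning of these columns concrete, here is a minimal Python sketch of how a static-threshold rule could be evaluated. It is an illustration only, not the Azure Monitor implementation: the function names and the one-sample-per-minute input are assumptions made for this example, and rows with a `dynamic` threshold are not covered.

```python
import re
from statistics import mean

# Operators as they appear in the Operator column (static thresholds only).
OPERATORS = {
    "GreaterThan": lambda v, t: v > t,
    "GreaterThanOrEqual": lambda v, t: v >= t,
    "LessThan": lambda v, t: v < t,
    "LessThanOrEqual": lambda v, t: v <= t,
}

# Aggregations as they appear in the Aggregation column.
AGGREGATIONS = {
    "Average": mean,
    "Total": sum,
    "Maximum": max,
    "Minimum": min,
    "Count": len,
}


def iso_duration_minutes(value):
    """Convert a simple ISO 8601 duration such as PT5M or PT1H to minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def breaches(samples, aggregation, operator, threshold, window_size):
    """Return True if the aggregated last window breaches the threshold.

    samples: one metric value per minute, oldest first (hypothetical input).
    """
    window = samples[-iso_duration_minutes(window_size):]
    return OPERATORS[operator](AGGREGATIONS[aggregation](window), threshold)


# KeyVault Availability row: Average LessThan 90 over a PT5M window.
availability = [100, 100, 100, 95, 90, 85, 80, 75]
print(breaches(availability, "Average", "LessThan", 90, "PT5M"))
```

In this sketch the Frequency column would decide how often `breaches` is called (for example every minute for PT1M), while WindowSize decides how much history each call looks at.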

The Scope column details where we scoped the alerts as described in Introduction to deploying ALZ-Monitor.

Only a small number of resource types support metric alert rules scoped at the subscription level, and such alerts only apply to resources deployed within the same region. The Support for Multiple Resources column shows which resources support metric alerts being scoped at the subscription level. For a complete list of the resources that support metric alert rules scoped at the subscription level, click here.

NOTE: There are hidden columns within the table. To see them, scroll to the bottom of the table and use the horizontal scroll bar; this is a limitation of tables in GitHub. If you have any suggestions for improving this experience, please get in touch via a PR or raise an issue. Thank you.

AlertName Component Metric Aggregation Operator Threshold WindowSize Frequency Severity Scope Support for Multiple Resources Verified References
[DINE] Deploy ExpressRoute Circuits Bgp Availability Alert1 microsoft.network/expressroutecircuits BgpAvailability Average LessThan 90 PT5M PT1M 0 Resource No Yes Monitor ExpressRoute Alerts
ExpressRoute KQL Queries
[DINE] Deploy ExpressRoute Circuits Arp Availability Alert1 microsoft.network/expressroutecircuits ArpAvailability Average LessThan 90 PT5M PT1M 0 Resource No Yes Monitor ExpressRoute Alerts
ExpressRoute KQL Queries
[DINE] Deploy ExpressRoute Circuits QosDropBitsInPerSecond Alert microsoft.network/expressroutecircuits QosDropBitsInPerSecond Average GreaterThan 100 PT5M PT1M 2 Resource No No Monitor ExpressRoute Alerts
ExpressRoute KQL Queries
[DINE] Deploy ExpressRoute Circuits QosDropBitsOutPerSecond Alert microsoft.network/expressroutecircuits QosDropBitsOutPerSecond Average GreaterThan 100 PT5M PT1M 2 Resource No No Monitor ExpressRoute Alerts
ExpressRoute KQL Queries
[DINE] Deploy KeyVault Availability Alert1 Microsoft.KeyVault/vaults Availability Average LessThan 90 PT5M PT1M 1 Resource Yes Yes Monitoring KeyVault Reference
Monitoring KeyVault
KeyVault Insights Overview
[DINE] Deploy KeyVault Capacity Alert Microsoft.KeyVault/vaults SaturationShoebox Average GreaterThan 75 PT5M PT1M 1 Resource Yes Yes Monitoring KeyVault Reference
Monitoring KeyVault
KeyVault Insights Overview
[DINE] Deploy KeyVault Latency Alert Microsoft.KeyVault/vaults ServiceApiLatency Average GreaterThan 1000 PT5M PT1M 3 Resource Yes Yes Monitoring KeyVault Reference
Monitoring KeyVault
KeyVault Insights Overview
[DINE] Deploy KeyVault Requests Alert Microsoft.KeyVault/vaults ServiceApiResult Average GreaterThan dynamic PT5M PT1M 2 Resource Yes Yes Monitoring KeyVault Reference
Monitoring KeyVault
KeyVault Insights Overview
[DINE] Deploy Automation Account TotalJob Alert Microsoft.Automation/automationAccounts TotalJob Count GreaterThan 0 PT5M PT1M 2 Resource No No Azure Automation Azure Monitor Metrics
[DINE] Deploy AFW FirewallHealth Alert Microsoft.Network/azureFirewalls FirewallHealth Average LessThan 90 PT5M PT1M 0 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy AFW SNATPortUtilization Alert Microsoft.Network/azureFirewalls SNATPortUtilization Average GreaterThan 80 PT5M PT1M 1 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy SA Availability Alert1 Microsoft.Storage/storageAccounts Availability Average LessThan 90 PT5M PT5M 1 Resource No Yes Monitoring Availability
[DINE] Deploy VPNG BGP Peer Status Alert microsoft.network/vpngateways BgpPeerStatus Total LessThan 1 PT5M PT5M Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy VPNG Ingress Packet Drop Mismatch Alert microsoft.network/vpngateways TunnelIngressPacketDropTSMismatch Average GreaterThan dynamic PT5M PT5M 3 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy VPNG Egress Packet Drop Count Alert microsoft.network/vpngateways TunnelEgressPacketDropCount Total GreaterThan dynamic PT5M PT5M 3 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy VPNG Ingress Packet Drop Count Alert microsoft.network/vpngateways TunnelIngressPacketDropCount Total GreaterThan dynamic PT5M PT5M 3 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy VPNG Egress Packet Drop Mismatch Alert microsoft.network/vpngateways TunnelEgressPacketDropTSMismatch Total GreaterThan dynamic PT5M PT5M 3 Resource No No Overview of Azure Firewall logs and metrics
[DINE] Deploy VNetG ExpressRoute CPU Utilization Alert microsoft.network/virtualNetworkGateways ExpressRouteGatewayCpuUtilization Average GreaterThan 90 PT5M PT1M 1 Resource No Yes Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG ExpressRoute CPU Utilization Alert microsoft.network/expressroutegateways ExpressRouteGatewayCpuUtilization Average GreaterThan 80 PT5M PT1M 1 Resource No Yes ExpressRoute Monitoring Metrics Alerts for ExpressRoute Gateways
[DINE] Deploy PDNSZ Capacity Utilization Alert Microsoft.Network/privateDnsZones VirtualNetworkLinkCapacityUtilization Maximum GreaterThanOrEqual 80 PT1H PT1H 2 Resource No No Private DNS Alert Metrics
[DINE] Deploy PDNSZ Query Volume Alert Microsoft.Network/privateDnsZones QueryVolume Total GreaterThanOrEqual 500 PT1H PT1H 4 Resource No No Private DNS Alert Metrics
[DINE] Deploy PDNSZ Record Set Capacity Alert Microsoft.Network/privateDnsZones RecordSetCapacityUtilization Maximum GreaterThanOrEqual 75 PT1H PT1H 2 Resource No No Private DNS Alert Metrics
[DINE] Deploy PDNSZ Registration Capacity Utilization Alert Microsoft.Network/privateDnsZones VirtualNetworkWithRegistrationCapacityUtilization Maximum GreaterThan 90 PT1H PT1H 2 Resource No No Private DNS Alert Metrics
[DINE] Deploy PIP Bytes in DDoS Attack Alert Microsoft.Network/publicIPAddresses bytesinddos Maximum GreaterThan 8000000 PT5M PT5M 4 Resource No No Monitor Public IP Addresses
Public IP Addresses Supported Metrics
[DINE] Deploy PIP DDoS Attack Alert Microsoft.Network/publicIPAddresses ifunderddosattack Maximum GreaterThan 1 PT5M PT5M 1 Resource No Yes Monitor Public IP Addresses
Public IP Addresses Supported Metrics
[DINE] Deploy PIP Packets in DDoS Attack Alert Microsoft.Network/publicIPAddresses PacketsInDDoS Total GreaterThanOrEqual 40000 PT5M PT5M 4 Resource No No Monitor Public IP Addresses
Public IP Addresses Supported Metrics
[DINE] Deploy PIP VIP Availability Alert Microsoft.Network/publicIPAddresses VipAvailability Average LessThan 1 PT5M PT5M 1 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNet DDoS Attack Alert Microsoft.Network/virtualNetworks ifunderddosattack Maximum GreaterThanOrEqual 1 PT5M PT5M 1 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Tunnel Bandwidth Alert Microsoft.Network/virtualNetworkGateways TunnelAverageBandwidth Average LessThan 1 PT5M PT5M 0 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Tunnel Egress Alert Microsoft.Network/virtualNetworkGateways TunnelEgressBytes Average LessThanOrEqual 1 PT5M PT5M 0 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Tunnel Ingress Alert Microsoft.Network/virtualNetworkGateways TunnelIngressBytes Average LessThanOrEqual 1 PT5M PT5M 0 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VPNG Bandwidth Utilization Alert microsoft.network/vpngateways tunnelaveragebandwidth Average GreaterThan 1000000000 PT5M PT5M 0 Resource No No Monitor VPN Gateway
Monitor VPN Gateway Reference
Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VPNG Egress Alert microsoft.network/vpngateways tunnelegressbytes Total LessThanOrEqual 0 PT5M PT5M 0 Resource No No Monitor VPN Gateway
Monitor VPN Gateway Reference
Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VPNG Ingress Alert microsoft.network/vpngateways tunnelingressbytes Total LessThanOrEqual 0 PT5M PT5M 0 Resource No No Monitor VPN Gateway
Monitor VPN Gateway Reference
Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Ingress Packet Drop Mismatch Alert Microsoft.Network/virtualNetworkGateways TunnelIngressPacketDropTSMismatch Average GreaterThan 100 PT5M PT5M 3 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Egress Packet Drop Count Alert Microsoft.Network/virtualNetworkGateways TunnelEgressPacketDropCount Average GreaterThan 100 PT5M PT5M 3 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Ingress Packet Drop Count Alert Microsoft.Network/virtualNetworkGateways TunnelIngressPacketDropCount Average GreaterThan 100 PT5M PT5M 3 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG Egress Packet Drop Mismatch Alert Microsoft.Network/virtualNetworkGateways TunnelEgressPacketDropTSMismatch Average GreaterThan 100 PT5M PT5M 3 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy VNetG ExpressRoute Bits Per Second Alert Microsoft.Network/virtualNetworkGateways ExpressRouteGatewayBitsPerSecond Average LessThanOrEqual 1 PT5M PT5M 0 Resource No No Azure Monitor supported metrics by resource type - Azure Monitor
[DINE] Deploy ERG ExpressRoute Bits In Alert microsoft.network/expressroutegateways ERGatewayConnectionBitsInPerSecond Average LessThanOrEqual 1 PT5M PT5M 0 Resource No No ExpressRoute Monitoring Metrics Alerts - ExpressRoute-Gateways
[DINE] Deploy ERG ExpressRoute Bits Out Alert microsoft.network/expressroutegateways ERGatewayConnectionBitsOutPerSecond Average LessThanOrEqual 1 PT5M PT5M 0 Resource No No ExpressRoute Monitoring Metrics Alerts - ExpressRoute-Gateways

1 See "Why are the availability alert thresholds lower than 100% in this solution when the product group documentation recommends 100%?" in the FAQ for more details.

Activity Log Alerts

Activity Log Resource Health

Use the following two sections to find out quickly when there's a Service Health issue with an Azure resource. This saves you the effort of further troubleshooting and allows you to focus on communicating with your user base, or to use these alerts as part of your business continuity actions (remediations).

Alert Policy Name Alert Name targetScope Category properties.cause properties.currentHealthStatus Scope Verified References
[DINE] Deploy Resource Health Unhealthy Alert ResourceHealthUnhealthyAlert managementGroup ResourceHealth PlatformInitiated, UserInitiated Degraded, Unavailable Subscription Yes Resource Health
Best practices for setting up service health alerts
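As a rough illustration of what the Resource Health row above filters on, the following Python sketch checks whether an activity log event would match. The flat event dictionary and the helper names are hypothetical and simplified for this example; real activity log records carry many more fields and may be shaped differently.

```python
def get_path(event, dotted_path):
    """Look up a dotted path such as 'properties.cause' in a nested dict."""
    node = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Filter values taken from the Resource Health row above.
RESOURCE_HEALTH_FILTER = {
    "category": {"ResourceHealth"},
    "properties.cause": {"PlatformInitiated", "UserInitiated"},
    "properties.currentHealthStatus": {"Degraded", "Unavailable"},
}


def matches(event, alert_filter=RESOURCE_HEALTH_FILTER):
    """True when every filtered field holds one of its accepted values."""
    return all(get_path(event, path) in accepted
               for path, accepted in alert_filter.items())


# Hypothetical, simplified event used only to exercise the filter.
sample_event = {
    "category": "ResourceHealth",
    "properties": {
        "cause": "PlatformInitiated",
        "currentHealthStatus": "Unavailable",
    },
}
print(matches(sample_event))
```

Every field must match, but within a field any of the listed values is accepted, which is why each filter entry is a set.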

Service Health Alerts

Alert Policy Name Alert Name PolicyScope Category properties.incidentType Scope Documented References
[DINE] Deploy Service Health Advisory Alert ServiceHealthAdvisoryEvent managementGroup ServiceHealth ActionRequired Subscription Yes Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Service Health Incident Alert ServiceHealthIncident managementGroup ServiceHealth Incident Subscription Yes Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Service Health Maintenance Alert ServiceHealthPlannedMaintenance managementGroup ServiceHealth Maintenance Subscription Yes Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Service Health Security Advisory Alert ServiceHealthSecurityIncident managementGroup ServiceHealth Security Subscription Yes Activity Log Service Notifications
Best practices for setting up service health alerts

Activity Log Administrative

The following table lists a number of operational Activity Log alerts that notify your team when certain resources have been deleted or changed.

There isn't any per-resource-type guidance, so what's provided here is general guidance on alerting on the deletion of specific resources. The list may grow in the future, and you can create your own alerts following the pattern used for these Activity Log alerts.

Alert Policy Name Alert Name PolicyScope category operationName status Scope Documented References
[DINE] Deploy Activity Log Azure FireWall Delete Alert ActivityAzureFirewallDelete managementGroup Administrative Microsoft.Network/azurefirewalls/delete succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log Key Vault Delete Alert ActivityKeyVaultDelete managementGroup Administrative Microsoft.KeyVault/vaults/delete succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log LA Workspace Delete Alert ActivityLAWorkspaceDelete managementGroup Administrative Microsoft.OperationalInsights/workspaces/delete succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log LA Workspace Regenerate Key Alert ActivityLAWorkspaceRegenKey managementGroup Administrative Microsoft.OperationalInsights/workspaces/regeneratesharedkey/action succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log NSG Delete Alert ActivityNSGDelete managementGroup Administrative Microsoft.Network/networkSecurityGroups/delete succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log Route Table Update Alert ActivityUDRUpdate managementGroup Administrative Microsoft.Network/routeTables/routes/write succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
[DINE] Deploy Activity Log VPN Gateway Delete Alert ActivityVPNGatewayDelete managementGroup Administrative Microsoft.Network/vpnGateways/delete succeeded Subscription No Activity Log Service Notifications
Best practices for setting up service health alerts
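If you want to follow the same pattern for another resource type, the category, operationName and status columns can be combined into an activity log alert condition. The Python sketch below builds such a condition in the `allOf` shape used by `Microsoft.Insights/activityLogAlerts`; the helper name is hypothetical, and the templates deployed by the policies may contain additional settings that are not shown here.

```python
import json


def administrative_condition(operation_name, status="succeeded"):
    """Build an activity log alert condition for one table row."""
    return {
        "allOf": [
            {"field": "category", "equals": "Administrative"},
            {"field": "operationName", "equals": operation_name},
            {"field": "status", "equals": status},
        ]
    }


# Example using the Key Vault delete row from the table above.
condition = administrative_condition("Microsoft.KeyVault/vaults/delete")
print(json.dumps(condition, indent=2))
```

Swapping in a different operationName is the only change needed to cover another delete or write operation.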

VM Insights Log Alerts

Once VM Insights has been enabled in your environment, the following alert rules can be configured for use via the Baseline Alerts framework.

N/A: Not applicable, not used in the query or used as a parameter.

AlertName Component Aggregation Operator Threshold WindowSize Frequency ResolveTime EvaluationPeriods FailingPeriods ComputersToInclude Other Resources Severity Query Verified References
[DINE] Deploy VM Available Memory Alert Microsoft.Compute/virtualMachines Average LessThan 1000 PT15M PT5M 0:10:00 1 1 N/A N/A 2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "Memory" and Name == "AvailableMB"| extend TotalMemory = toreal(todynamic(Tags)["vm.azm.ms/memorySizeMB"]) | extend AvailableMemoryPercentage = (toreal(Val) / TotalMemory) * 100.0| summarize AggregatedValue = avg(AvailableMemoryPercentage) by bin(TimeGenerated, 15m), Computer, _ResourceId Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM CPU Alert Microsoft.Compute/virtualMachines Average GreaterThan 85 PT15M PT5M 0:10:00 N/A 1 N/A N/A 2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "Processor" and Name == "UtilizationPercentage"| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM Data Disk Write Latency Alert Microsoft.Compute/virtualMachines Average GreaterThan 50 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
*
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "WriteLatencyMs"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])|where Disk !in ('C:','/')| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk N Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM Data Disk Read Latency Alert Microsoft.Compute/virtualMachines Average GreaterThan 50 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
*
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "ReadLatencyMs"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])|where Disk !in ('C:','/')| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk N Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM OS Disk Write Latency Alert Microsoft.Compute/virtualMachines Average GreaterThan 50 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
C:
/
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "WriteLatencyMs"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk N Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM OS Disk Read Latency Alert Microsoft.Compute/virtualMachines Average GreaterThan 30 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
C:
/
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "ReadLatencyMs"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk N Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM Network Write Alert Microsoft.Compute/virtualMachines Average GreaterThan 10000000 PT15M PT5M 0:10:00 1 1 * NetworkInterfacetToInclude
*
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "Network" and Name == "WriteBytesPerSecond"| extend NetworkInterface=tostring(todynamic(Tags)["vm.azm.ms/networkDeviceId"])|summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, NetworkInterface Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM Network Read Alert Microsoft.Compute/virtualMachines Average GreaterThan 10000000 PT15M PT5M 0:10:00 1 1 * NetworkInterfacetToInclude
*
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "Network" and Name == "ReadBytesPerSecond"| extend NetworkInterface=tostring(todynamic(Tags)["vm.azm.ms/networkDeviceId"])|summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, NetworkInterface Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM OS Disk Space Alert Microsoft.Compute/virtualMachines Average LessThan 10 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
C:
/
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "FreeSpacePercentage"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM Data Disk Space Alert Microsoft.Compute/virtualMachines Average LessThan 10 PT15M PT5M 0:10:00 1 1 * parDisksToInclude
*
2 InsightsMetrics| where Origin == "vm.azm.ms"| where Namespace == "LogicalDisk" and Name == "FreeSpacePercentage"| extend Disk=tostring(todynamic(Tags)["vm.azm.ms/mountId"])|where Disk !in ('C:','/')| summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, Disk Y Monitor virtual machines with Azure Monitor: Alerts
[DINE] Deploy VM HeartBeat Alert Microsoft.Compute/virtualMachines Average GreaterThan 10 PT15M PT5M 0:10:00 1 1 N/A N/A 1 Heartbeat| summarize TimeGenerated=max(TimeGenerated) by Computer, _ResourceId| extend Duration = datetime_diff('minute',now(),TimeGenerated)| summarize AggregatedValue = min(Duration) by Computer, bin(TimeGenerated,5m), _ResourceId Y Monitor virtual machines with Azure Monitor: Alerts
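To show what the queries in the table compute, here is a small Python sketch that mirrors the arithmetic of the Available Memory query: it converts each sample to a percentage of total memory and averages the result per 15-minute bin and computer. The record layout and function names are assumptions for illustration, and the binning helper only approximates KQL `bin()` for bin sizes that divide an hour evenly.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_start(timestamp, minutes=15):
    """Floor a timestamp to the start of its N-minute bin within the hour."""
    return timestamp - timedelta(minutes=timestamp.minute % minutes,
                                 seconds=timestamp.second,
                                 microseconds=timestamp.microsecond)


def available_memory_percentage(records):
    """Average available-memory percentage per (bin, computer).

    records: iterable of (timestamp, computer, available_mb, total_mb),
    a hypothetical stand-in for rows of the InsightsMetrics table.
    """
    groups = defaultdict(list)
    for timestamp, computer, available_mb, total_mb in records:
        groups[(bin_start(timestamp), computer)].append(
            available_mb / total_mb * 100.0)
    return {key: sum(values) / len(values) for key, values in groups.items()}


records = [
    (datetime(2023, 7, 10, 12, 1), "vm1", 2048, 8192),
    (datetime(2023, 7, 10, 12, 14), "vm1", 4096, 8192),
    (datetime(2023, 7, 10, 12, 16), "vm1", 1024, 8192),
]
for (start, computer), value in sorted(available_memory_percentage(records).items()):
    print(start.isoformat(), computer, value)
```

Each resulting value plays the role of AggregatedValue in the query, which the alert rule then compares against the Threshold column using the Operator column.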

Recovery Vault Alerts

The following policy disables the classic alerts that are available in Azure Backup and enables the Azure Monitor alerts.

Security Alerts and Job Failure alerts are summarized in the "Using Backup Center" documentation.

PolicyName Component Category Scope Support for Multiple Resources Verified References
Deploy_RecoveryVault_BackupHealthMonitor_Alert Microsoft.RecoveryServices/Vaults Microsoft.RecoveryServices/vaults/monitoringSettings.classicAlertSettings.alertsForCriticalOperations Resource No Y Azure Monitor Alerts for Azure Backup
Move to Azure Monitor Alerts