DHCP uses Grafana to display metrics. The Grafana dashboard can be found here.
The JSON used to configure the dashboard is stored in the IMA Grafana dashboard configuration repo. Whenever the dashboard is updated, the JSON must be saved and tracked in version control. More information about this can be found in the documentation in the IMA repo.
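One way to keep dashboard changes easy to review in version control is to normalize the exported JSON before committing it. The sketch below is an assumption, not the process documented in the IMA repo: the `normalize_dashboard` helper and the trimmed example export are hypothetical, and the function only re-serializes the JSON with sorted keys and fixed indentation.

```python
import json


def normalize_dashboard(raw_json: str) -> str:
    """Re-serialize exported dashboard JSON with sorted keys and a 2-space indent.

    A stable key order and indentation keep version-control diffs limited
    to real changes in the dashboard.
    """
    dashboard = json.loads(raw_json)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily trimmed export used only for illustration.
    exported = '{"title": "DHCP", "panels": [{"title": "Alarms", "id": 1}]}'
    print(normalize_dashboard(exported), end="")
```

Normalizing is idempotent, so running it on an already normalized file produces no diff.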
The DHCP dashboard is separated into four sections:
- Alarms
- AWS Service metrics
- Kea Network metrics
- Kea Subnet metrics
The alarms section summarizes the state of the system and categorizes it as OK, Pending or Alerting.
- OK is a sign that the system is operating normally
- Pending indicates that the system may be either recovering or erroring
- Alerting shows that the system needs attention
The AWS Service metrics section displays all the relevant AWS metrics. These include:
- ECS Task Count
- NLB ProcessedBytes
- UnHealthyHostCount
- ECS MemoryUtilization and CPUUtilization
- RDS ReadIOPS, WriteIOPS and CPUUtilization
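As a rough illustration of how these metrics are addressed in CloudWatch, the sketch below builds query definitions in the shape accepted by the CloudWatch `GetMetricData` API. It is a sketch under stated assumptions, not the dashboard's actual configuration: the `build_queries` helper, the chosen statistics, the 60-second period and the dimension values are all hypothetical. ECS task count is left out because the source does not say which namespace provides it.

```python
from typing import Dict, List

# (namespace, metric name, statistic) for a subset of the metrics listed above.
# The namespaces are the standard CloudWatch ones; the statistics are assumptions.
METRICS = [
    ("AWS/NetworkELB", "ProcessedBytes", "Sum"),
    ("AWS/NetworkELB", "UnHealthyHostCount", "Maximum"),
    ("AWS/ECS", "MemoryUtilization", "Average"),
    ("AWS/ECS", "CPUUtilization", "Average"),
    ("AWS/RDS", "ReadIOPS", "Average"),
    ("AWS/RDS", "WriteIOPS", "Average"),
    ("AWS/RDS", "CPUUtilization", "Average"),
]


def build_queries(dimensions: Dict[str, List[dict]], period: int = 60) -> List[dict]:
    """Build one metric query per entry in METRICS.

    `dimensions` maps a namespace to the CloudWatch dimensions that identify
    the resource, for example a DBInstanceIdentifier for AWS/RDS.
    """
    queries = []
    for index, (namespace, name, stat) in enumerate(METRICS):
        queries.append({
            "Id": f"m{index}",  # must be unique within one request
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": name,
                    "Dimensions": dimensions.get(namespace, []),
                },
                "Period": period,  # seconds per data point
                "Stat": stat,
            },
        })
    return queries


if __name__ == "__main__":
    # Hypothetical resource identifier, for illustration only.
    example = {"AWS/RDS": [{"Name": "DBInstanceIdentifier", "Value": "example-db"}]}
    for query in build_queries(example):
        metric = query["MetricStat"]["Metric"]
        print(query["Id"], metric["Namespace"], metric["MetricName"])
```

The resulting list could be passed as `MetricDataQueries` to boto3's `get_metric_data` together with a start and end time, assuming credentials and real dimension values are available.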
The custom section displays all DHCP metrics produced by Kea, as well as a subset of error and debug messages. These include:
- CONFIGURATION RELOAD SUCCESSFUL
- CONFIGURATION RELOAD FAILED
- ALLOC_ENGINE_V4_ALLOC_ERROR
- ALLOC_ENGINE_V4_ALLOC_FAIL
- ALLOC_ENGINE_V4_ALLOC_FAIL_CLASSES
The dashboards monitor subnet and network metrics, including the rate of change for DORA operations (Discover, Offer, Request and Acknowledge) and Not Acknowledge (NAK) operations.
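The rate of change for these operations can be thought of as the difference between two counter samples divided by the time between them. The following is a minimal sketch of that calculation, not the dashboard's actual query: the `rates_per_second` helper and the sample values are hypothetical, and the counter names follow Kea's DHCPv4 statistics naming, although which counters the dashboard reads is an assumption.

```python
from typing import Dict


def rates_per_second(
    previous: Dict[str, int],
    current: Dict[str, int],
    interval_seconds: float,
) -> Dict[str, float]:
    """Return the per-second rate of change for each counter in both samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    rates = {}
    for name, value in current.items():
        if name not in previous:
            continue  # no earlier sample to compare against
        delta = value - previous[name]
        if delta < 0:
            # Assumption: a counter that decreased was reset (for example by a
            # server restart), so count only what accumulated since the reset.
            delta = value
        rates[name] = delta / interval_seconds
    return rates


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    earlier = {
        "pkt4-discover-received": 100,
        "pkt4-offer-sent": 95,
        "pkt4-request-received": 90,
        "pkt4-ack-sent": 88,
        "pkt4-nak-sent": 2,
    }
    later = {
        "pkt4-discover-received": 160,
        "pkt4-offer-sent": 155,
        "pkt4-request-received": 150,
        "pkt4-ack-sent": 148,
        "pkt4-nak-sent": 5,
    }
    for name, rate in rates_per_second(earlier, later, 60).items():
        print(f"{name}: {rate:.2f}/s")
```

A sustained rise in the NAK rate relative to the ACK rate is the kind of change this view makes visible.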