datadog-monitors

Summary

Using templates for certain monitors has the advantage that they are easy to extend and modify. Keeping those monitors in Terraform instead would require adding parameters, extending the corresponding map variable and adjusting the parameters of the templatefile function calls.
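The following is a minimal sketch of how a module could render a monitor from the query_tpl and msg_tpl keys with Terraform's templatefile function. The template layout (templates/query/<name>.tpl, templates/message/<name>.tpl) and the placeholder names critical and recipient are assumptions for illustration. They are not taken from this repository.

```hcl
variable "notification_recipient" {
  type = string
}

variable "monitor_custom" {
  type = map(object({
    query_tpl = string
    msg_tpl   = string
    critical  = string
    warning   = string
  }))
}

resource "datadog_monitor" "custom" {
  for_each = var.monitor_custom

  name = each.key
  type = "metric alert"

  # Hypothetical layout: templates/query/<query_tpl>.tpl contains the query text
  # with a ${critical} placeholder. trimspace drops the file's trailing newline.
  query = trimspace(templatefile("${path.module}/templates/query/${each.value.query_tpl}.tpl", {
    critical = each.value.critical
  }))

  # Hypothetical layout: templates/message/<msg_tpl>.tpl contains the message
  # with a ${recipient} placeholder for the notification handle.
  message = templatefile("${path.module}/templates/message/${each.value.msg_tpl}.tpl", {
    recipient = var.notification_recipient
  })

  monitor_thresholds {
    critical = each.value.critical
    warning  = each.value.warning
  }
}
```

With this layout, adding another custom monitor only needs a new .tpl file and a new map entry. The resource code itself stays unchanged.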

TODO:

dashboards
integrations
tagging enhancements
network checks
...

Usage:

See also https://www.terraform.io/docs/providers/datadog/index.html
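The module takes api_key, app_key and api_url as inputs. If the module configures the Datadog provider itself, the wiring could look roughly like the sketch below. The variable declarations are illustrative.

```hcl
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
    }
  }
}

variable "api_key" {
  type      = string
  sensitive = true
}

variable "app_key" {
  type      = string
  sensitive = true
}

variable "api_url" {
  # e.g. "https://api.datadoghq.eu/" for the EU site
  type = string
}

provider "datadog" {
  api_key = var.api_key
  app_key = var.app_key
  api_url = var.api_url
}
```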

Set the required Datadog variables and define the monitors as maps:

module "datadog_monitor" {
  source = "../modules/datadog_monitor"

  api_key = "<DD API KEY>"
  app_key = "<DD APP KEY>"
  api_url = "<DD API URL>"

  notification_recipient = "@dd-contact"

  monitor_custom = {
    "NTP clock drift" = {
      query_tpl = "ntp",
      msg_tpl   = "default",
      critical  = "2",
      warning   = "1"
    }
  }

  monitor_processes = {
    "CPU utilization (%)" = {
      query    = "avg(last_5m):avg:system.cpu.user{*} by {host} + avg:system.cpu.guest{*} by {host} + avg:system.cpu.system{*} by {host} > 90"
      msg_tpl  = "default",
      critical = "90",
      warning  = "80"
    },
    "Memory used (%)" = {
      query    = "avg(last_5m):100 - ( avg:system.mem.usable{*} by {host} / avg:system.mem.total{*} by {host} * 100 ) > 90"
      msg_tpl  = "default",
      critical = "90",
      warning  = "80"
    },
    "Disk used (%)" = {
      query     = "avg(last_5m):avg:system.disk.in_use{*} by {host} * 100 >= 90"
      query_tpl = "disk",
      msg_tpl   = "default",
      critical  = "90",
      warning   = "80"
    },
    "Inodes used (%)" = {
      query    = "avg(last_5m):avg:system.fs.inodes.used{*} by {host} / ( avg:system.fs.inodes.total{*} by {host} / 100 ) > 90"
      msg_tpl  = "default",
      critical = "90",
      warning  = "80"
    },
    "IO utilization (%)" = {
      query    = "avg(last_5m):avg:system.io.util{*} by {host} > 90"
      msg_tpl  = "default",
      critical = "90",
      warning  = "80"
    },
    "Load 1min" = {
      query    = "avg(last_5m):avg:system.load.norm.1{*} by {host} > 2.0"
      msg_tpl  = "load",
      critical = "2.0",
      warning  = "1.75"
    },
    "Load 5min" = {
      query    = "avg(last_5m):avg:system.load.norm.5{*} by {host} > 1.75"
      msg_tpl  = "load",
      critical = "1.75",
      warning  = "1.50"
    },
    "Load 15min" = {
      query    = "avg(last_5m):avg:system.load.norm.15{*} by {host} > 1.50"
      msg_tpl  = "load",
      critical = "1.50",
      warning  = "1.25"
    }
  }
}
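The sketch below shows how the monitor_processes map might be turned into monitors with for_each. It assumes the message templates live under templates/message/, which is a hypothetical path. For simplicity it always uses the inline query and ignores the optional query_tpl key.

```hcl
variable "notification_recipient" {
  type = string
}

# map(map(string)) lets individual entries carry extra keys such as query_tpl.
variable "monitor_processes" {
  type = map(map(string))
}

resource "datadog_monitor" "processes" {
  for_each = var.monitor_processes

  name  = each.key
  type  = "metric alert"
  query = each.value.query

  # Hypothetical template path, selected by the msg_tpl key ("default", "load", ...).
  message = templatefile("${path.module}/templates/message/${each.value.msg_tpl}.tpl", {
    recipient = var.notification_recipient
  })

  monitor_thresholds {
    critical = each.value.critical
    warning  = each.value.warning
  }
}
```

Typing the variable as map(map(string)) instead of map(object(...)) means optional keys such as query_tpl do not have to be declared for every monitor.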

Monitors

Datadog provides different monitor types depending on the service, use case and integration.

Process

To verify whether a certain process, e.g. httpd, ssh or tomcat, is running. The process must also be configured on the DD agent.
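On the agent side, the process check is typically enabled in process.d/conf.yaml, with a name and a search_string for each process. The agent then reports the process.up service check, tagged process:<name>. The sketch below shows a matching service check monitor. It assumes an agent instance named httpd. The resource name and the message are illustrative.

```hcl
resource "datadog_monitor" "process_httpd" {
  name = "Process httpd not running"
  type = "service check"

  # Service check query form: "check".over(tags).last(count).by(group).count_by_status()
  query = "\"process.up\".over(\"process:httpd\").last(2).by(\"host\").count_by_status()"

  # {{host.name}} is filled in by Datadog for each alerting host.
  message = "httpd is not running on {{host.name}}. @dd-contact"

  monitor_thresholds {
    critical = 1
    ok       = 1
  }
}
```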

Metrics

To alert on usage metrics such as CPU, memory and disk.
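Datadog rejects a metric monitor whose critical threshold differs from the comparison value in its query. One way to keep the two from drifting apart is to build the query string from the threshold. The following sketch uses hypothetical locals for the IO utilization entry.

```hcl
locals {
  # Hypothetical locals: define the thresholds once...
  io_critical = "90"
  io_warning  = "80"

  # ...and interpolate the critical value into the query.
  io_monitor = {
    "IO utilization (%)" = {
      query    = "avg(last_5m):avg:system.io.util{*} by {host} > ${local.io_critical}"
      msg_tpl  = "default"
      critical = local.io_critical
      warning  = local.io_warning
    }
  }
}
```

You can then pass the result to the module and combine it with other entries, e.g. monitor_processes = merge(local.io_monitor, { ... }).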

Custom

Forecast

To predict metric usage
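The sketch below shows a forecast monitor that predicts disk usage one week ahead with the linear algorithm. The forecast() parameters are one plausible configuration and should be checked against the Datadog forecast documentation. The resource name and the message are illustrative.

```hcl
resource "datadog_monitor" "disk_forecast" {
  name = "Disk forecast to exceed 90% within a week"
  type = "query alert"

  # system.disk.in_use is reported as a fraction (0-1), so 0.9 means 90 %.
  query = "max(next_1w):forecast(avg:system.disk.in_use{*} by {host}, 'linear', 1, interval='60m', history='1w', model='default') >= 0.9"

  message = "Disk usage on {{host.name}} is forecast to exceed 90%. @dd-contact"

  monitor_thresholds {
    critical = 0.9
  }
}
```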

Integration

To collect data from resources and services, e.g. GCP, Azure or AWS resources.
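Once an integration is enabled, you can monitor its metrics like any other metric. The sketch below alerts on EC2 CPU utilization. It assumes the AWS integration is collecting the aws.ec2.cpuutilization metric, which is reported in percent. The thresholds are illustrative.

```hcl
resource "datadog_monitor" "aws_ec2_cpu" {
  name = "AWS EC2 CPU utilization (%)"
  type = "metric alert"

  # Metric provided by the Datadog AWS integration.
  query = "avg(last_15m):avg:aws.ec2.cpuutilization{*} by {host} > 90"

  message = "EC2 CPU utilization above 90% on {{host.name}}. @dd-contact"

  monitor_thresholds {
    critical = 90
    warning  = 80
  }
}
```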

Network

To monitor periodic checks that a DD agent runs against a destination configured on the agent side.
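The TCP check (tcp_check.d/conf.yaml on the agent, with a name, host and port for each instance) reports the tcp.can_connect service check. The sketch below monitors it. It assumes an agent instance named my_service whose service check is tagged instance:my_service. Both names are hypothetical.

```hcl
resource "datadog_monitor" "tcp_my_service" {
  name = "TCP connection to my_service failing"
  type = "service check"

  # Hypothetical instance tag; it must match the name configured in tcp_check.d.
  query = "\"tcp.can_connect\".over(\"instance:my_service\").last(2).by(\"host\").count_by_status()"

  message = "{{host.name}} cannot connect to my_service. @dd-contact"

  monitor_thresholds {
    critical = 1
    ok       = 1
  }
}
```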

ps-xaf/datadog-monitor
